Wednesday, 19 September 2012

Neural Net Classifier Architecture Testing

Having successfully integrated the FANN library, I've changed my mind about the design of my neural net classifier. Previously I had planned to train a series of classifiers, each "tuned" to a specific measured period in the data and each discriminating between 5 distinct market types: my normal cyclic, uwr, unr, dwr and dnr, as I've discussed in earlier posts.

Now, however, I'm working on a revised design that incorporates elements of a decision tree, with neural nets at the interior nodes of the tree. A simple schematic is shown below.
The advantage of this model is that I only have to train 5 neural nets, represented by the 5 colours in the above schematic. Each net is a binary classifier with targets of -1 and 1 and a single node in its output layer, and the decision rule is simply whether the output activation is > 0 or < 0. Rather than having one neural net per period, the period is now one of the features in the input feature vector.
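A minimal Python sketch of how such a tree of binary nets could be evaluated, assuming each net is simply a function returning one real-valued output and that an output of exactly 0 falls to the negative branch. The `Node` class, the `classify` function and the four-node layout in the example are hypothetical illustrations; the actual arrangement of the 5 nets is only given in the schematic. The period would just be one entry of the feature vector passed to each net.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Union

# A "net" is any function mapping a feature vector to one real output
# (assumed: a tanh-style output unit trained on -1/+1 targets).
Net = Callable[[Sequence[float]], float]


@dataclass
class Node:
    net: Net                          # binary classifier at this interior node
    if_negative: Union["Node", str]   # subtree or leaf label when output <= 0
    if_positive: Union["Node", str]   # subtree or leaf label when output > 0


def classify(tree: Union[Node, str], features: Sequence[float]) -> str:
    """Walk from the root to a leaf, applying the > 0 rule at each node."""
    node = tree
    while isinstance(node, Node):
        node = node.if_positive if node.net(features) > 0 else node.if_negative
    return node


# Hypothetical layout with dummy "nets" that just read one feature each:
# trend vs cyclic, then up vs down, then with vs without retracement.
up = Node(lambda f: f[2], if_negative="unr", if_positive="uwr")
down = Node(lambda f: f[2], if_negative="dnr", if_positive="dwr")
direction = Node(lambda f: f[1], if_negative=down, if_positive=up)
root = Node(lambda f: f[0], if_negative="cyc", if_positive=direction)

print(classify(root, [0.8, -0.3, 0.5]))  # dwr
```

Each interior node only ever has to make one binary decision, which is what allows every net to have a single output node.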

At the moment the feature vector has a length of 48, and as I write this my computer is churning through a cross validation test (using the FANN library) to determine the optimum number of nodes for the single hidden layer of the neural nets. Once this is complete, I plan to run another series of cross validation tests to determine the optimum number of training epochs. When these are complete I shall run one final cross validation test to determine the optimum lambda for a regularisation term. This last set of tests will use my previous Octave code because, as far as I can see, the FANN library functions do not have this capability.
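The hidden-node search (and the later epoch and lambda searches) follows the usual k-fold cross validation pattern, sketched here in Python. `kfold_indices`, `select_by_cv` and the `fit_and_score` callback are hypothetical names; the callback is assumed to train a fresh net with the candidate setting on the training indices and return its accuracy on the held-out indices. This is not the FANN cross validation code itself.

```python
import random
from typing import Callable, List, Sequence, TypeVar

P = TypeVar("P")


def kfold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle 0..n-1 and split it into k nearly equal, disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def select_by_cv(candidates: Sequence[P], n: int, k: int,
                 fit_and_score: Callable[[P, List[int], List[int]], float]) -> P:
    """Return the candidate with the highest mean validation score over k folds."""
    folds = kfold_indices(n, k)
    best, best_score = None, float("-inf")
    for cand in candidates:
        scores = []
        for i, val in enumerate(folds):
            # train on every fold except the i-th, validate on the i-th
            train = [j for m, fold in enumerate(folds) if m != i for j in fold]
            scores.append(fit_and_score(cand, train, val))
        mean = sum(scores) / k
        if mean > best_score:  # ties keep the earlier (smaller) candidate
            best, best_score = cand, mean
    return best


# Toy scorer standing in for "train a net with h hidden nodes, return accuracy"
print(select_by_cv([12, 24, 48, 96], 100, 5, lambda h, tr, va: -abs(h - 24)))  # 24
```

The same loop serves for epochs or lambda by changing the candidate list and what the callback varies.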

One slight change to this Octave code will be to use the hyperbolic tangent as the activation function (see Y. LeCun 1998, available as paper 86 here). I may also try to implement some of the other tricks recommended in this paper.
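For reference, the scaled hyperbolic tangent usually associated with LeCun's 1998 recommendations is f(x) = 1.7159 tanh(2x/3), scaled so that f(±1) ≈ ±1 when the training targets are ±1. A short Python sketch of it and of the derivative needed for backpropagation follows; the function names are illustrative and not part of the Octave code.

```python
import math

A = 1.7159      # output scaling, so that f(1) is approximately 1
S = 2.0 / 3.0   # slope applied inside tanh


def scaled_tanh(x: float) -> float:
    """Symmetric activation: outputs lie in (-A, A) and are centred on 0."""
    return A * math.tanh(S * x)


def scaled_tanh_grad(x: float) -> float:
    """Derivative A * S * (1 - tanh(S x)^2), used when backpropagating."""
    t = math.tanh(S * x)
    return A * S * (1.0 - t * t)


print(round(scaled_tanh(1.0), 4))  # 1.0
```

Because the output is symmetric about zero, the > 0 / < 0 decision rule for the binary nets maps directly onto it.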

Wednesday, 5 September 2012

Successful Integration of FANN

I am pleased to say that all my recent work seems to have borne fruit, and I have now managed to code up a training and testing routine in Octave that uses the FANN library and its Octave bindings. I think that this has been some of my most challenging coding work up to now, and required many hours of research on the web and forum help to complete.

One frustration I find with open source software is its sparse and sometimes non-existent documentation, so this blog post is partly intended as a guide for readers who may also wish to use FANN in Octave. The code in the code box below is roughly divided into these sections:
  • Octave code to index into and extract the relevant data from previously saved files
  • a section that uses Perl to format this data
  • the Octave binding code that actually implements the FANN library functions to set up and train a NN
  • a short bit of code to save and then test the NN on the training data
As the code itself is heavily commented, no further explanation is required.
% load training_data_1.mat on command line before running this script.

clear -exclusive X accurate_period y % clear everything except X, accurate_period and y

yy = eye(5)(y,:) ; % using training labels y, create an output vector suitable for NN training

period = input('Enter period of interest: ') ;

%for period = 10:50

fprintf('\nTraining for ANN period: %f\n', period ) ;

% This first switch control block creates the training data by indexing, by period, into them
% data loaded from training_data_1.mat
switch (period)

case 10

% index using input period
[i_X j_X] = find( accurate_period(:,1) == period ) ;
% extract the relevant part of X using above i_X index
X_train = X( [i_X] , : ) ;
% and same for market labels vector y
y_train = yy( [i_X] , : ) ;

% now index using input period plus 1 for test set
[i_X j_X] = find( accurate_period(:,1) == period+1 ) ;
% extract the relevant part of X using above i_X index
X_test = X( [i_X] , : ) ;
y_test = yy( [i_X] , : ) ;

train_data = [ X_train y_train ] ;
test_data = [ X_test y_test ] ;
detect_optima = train_data( (60:60:9000) , : ) ;

case 50

% index using input period
[i_X j_X] = find( accurate_period(:,1) == period ) ;
% extract the relevant part of X using above i_X index
X_train = X( [i_X] , : ) ;
% and same for market labels vector y
y_train = yy( [i_X] , : ) ;

% now index using input period minus 1 for test set
[i_X j_X] = find( accurate_period(:,1) == period-1 ) ;
% extract the relevant part of X using above i_X index
X_test = X( [i_X] , : ) ;
y_test = yy( [i_X] , : ) ;

train_data = [ X_train y_train ] ;
test_data = [ X_test y_test ] ;
detect_optima = train_data( (60:60:9000) , : ) ;

otherwise

% index using input period
[i_X j_X] = find( accurate_period(:,1) == period ) ;
% extract the relevant part of X using above i_X index
X_train = X( [i_X] , : ) ;
% and same for market labels vector y
y_train = yy( [i_X] , : ) ;

% now index using input period minus 1 for test set
[i_X j_X] = find( accurate_period(:,1) == period-1 ) ;
% extract the relevant part of X using above i_X index
X_test_1 = X( [i_X] , : ) ;
% and take every other value
X_test_1 = X_test_1( (2:2:9000) , : ) ;
% and same for market labels vector y
y_test_1 = yy( [i_X] , : ) ;
% and take every other value
y_test_1 = y_test_1( (2:2:9000) , : ) ;

% now index using input period plus 1 for test set
[i_X j_X] = find( accurate_period(:,1) == period+1 ) ;
% extract the relevant part of X using above i_X index
X_test_2 = X( [i_X] , : ) ;
% and take every other value
X_test_2 = X_test_2( (2:2:9000) , : ) ;
% and same for market labels vector y
y_test_2 = yy( [i_X] , : ) ;
% and take every other value
y_test_2 = y_test_2( (2:2:9000) , : ) ;

train_data = [ X_train y_train ] ;
test_data = [ [ X_test_1 y_test_1 ] ; [ X_test_2 y_test_2 ] ] ;
detect_optima = train_data( (60:60:9000) , : ) ;

endswitch % end of training data indexing switch

% now write this selected period data to -ascii files
save data_for_training -ascii train_data
save data_for_testing -ascii test_data
save detect_optima -ascii detect_optima % for use in Fanntool software

%************************************************************************
% Now the FANN training code !                                          *
%************************************************************************

% First set the parameters for the FANN structure
No_of_input_layer_nodes = 102 
No_of_hidden_layer_nodes = 102 
No_of_output_layer_nodes = 5 
Total_no_of_layers = length( [ No_of_input_layer_nodes No_of_hidden_layer_nodes No_of_output_layer_nodes ] )

% save and write this FANN structure info and length of training data file into an -ascii file - "train_nn_from_this_file"
fid = fopen( 'train_nn_from_this_file' , 'w' ) ;
fprintf( fid , '%i %i %i\n' , rows(train_data) , No_of_input_layer_nodes , No_of_output_layer_nodes ) ; % number of training pairs, inputs, outputs
fclose(fid) ;

% now create the FANN formatted training file - "train_nn_from_this_file"
system( "perl perl_file_manipulate.pl <data_for_training >>train_nn_from_this_file" ) ;

%{
The above call to "system" pauses Octave at this point. The shell (e.g. bash) takes over and runs the Perl
script "perl_file_manipulate.pl" with the redirections "<data_for_training >>train_nn_from_this_file",
where < indicates that the file "data_for_training" is to be read by the Perl script and >> indicates that
the script's output is to be appended to the file "train_nn_from_this_file". After the fopen and fclose
operations above, the file to be appended contains only the FANN structure info, e.g. 9000 102 5 on one line,
and the file to be read contains the NN training features and outputs extracted by the switch control
structure above and written to -ascii files. The contents of the Perl
script file are:

#!/usr/bin/env perl

while (<>) { 
   my @f = split ;
   print("@f[0..$#f-5]\n@f[-5..-1]\n") ;
}

After these Perl operations the file "train_nn_from_this_file" is correctly formatted for the FANN library calls
that are to come, e.g. the file looks like this:

9000 102 5
-2.50350699e-09 -2.52301858e-09 -2.50273727e-09 -2.44301942e-09 -2.34482961e-09 -2.20974520e-09
0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 1.00000000e+00
etc.

When all this Perl script stuff is finished control returns to Octave.
%}

%***************************************************************************
% Begin FANN training ! Hurrah !                                           *
%***************************************************************************
% create the FANN
ANN = fann_create( [ No_of_input_layer_nodes No_of_hidden_layer_nodes No_of_output_layer_nodes ] ) ;

% create the parameters for training the FANN in an Octave "struct". All parameters are explicitly stated and set to
% the default values. If not explicitly stated they would take these values anyway; they are stated here just to show
% how this is done
NN_PARAMS = struct( "TrainingAlgorithm", 'rprop', "LearningRate", 0.7, "ActivationHidden", 'Sigmoid', "ActivationOutput", 'Sigmoid',...
"ActivationSteepnessHidden", 0.5, "ActivationSteepnessOutput", 0.5, "TrainErrorFunction", 'TanH', "QuickPropDecay", -0.0001,...
"QuickPropMu", 1.75, "RPropIncreaseFactor", 1.2, "RPropDecreaseFactor", 0.5, "RPropDeltaMin", 0.0, "RPropDeltaMax", 50.0 )

% and then set the parameters
fann_set_parameters( ANN , NN_PARAMS ) ;

% now train the FANN on data contained in file "train_nn_from_this_file"
fann_train( ANN, 'train_nn_from_this_file', 'MaxIterations', 200, 'DesiredError', 0.001, 'IterationsBetweenReports', 10 )

% save the trained FANN in a file e.g. "ann_25.net"
fann_save( ANN , [ "ann_" num2str(period) ".net" ] )

% Now test the ANN on the test_data set
% create ANN from saved fann_save file
ANN = fann_create( [ "ann_" num2str(period) ".net" ] ) ;

% run the trained ANN on the original feature training set, X_train
X_train_FANN_results = fann_run( ANN , X_train ) ;

% convert the X_train_FANN_results matrix to a single prediction vector
[dummy, prediction] = max( X_train_FANN_results, [], 2 ) ;

% compare accuracy of this NN prediction vector with the known labels in y for this period and display 
[i_X j_X] = find( accurate_period(:,1) == period ) ;
fprintf('\nTraining Set Accuracy: %f\n', mean( double( prediction == y([i_X],:) ) ) * 100 ) ;
fprintf('End of training for ANN period: %f\n', period ) ;

%end % end of period for loop
Typical terminal output during the running of this code looks like this:

octave:143> net_train_octave
Enter period of interest: 25
Max epochs      200. Desired error: 0.0010000000.
Epochs            1. Current error: 0.2537834346. Bit fail 45000.
Epochs           10. Current error: 0.1802092344. Bit fail 20947.
Epochs           20. Current error: 0.0793143436. Bit fail 7380.
Epochs           30. Current error: 0.0403240845. Bit fail 5215.
Epochs           40. Current error: 0.0254898760. Bit fail 2853.
Epochs           50. Current error: 0.0180807728. Bit fail 1611.
Epochs           60. Current error: 0.0150692556. Bit fail 1414.
Epochs           70. Current error: 0.0119200321. Bit fail 1187.
Epochs           80. Current error: 0.0091521516. Bit fail 937.
Epochs           90. Current error: 0.0073408978. Bit fail 670.
Epochs          100. Current error: 0.0060765576. Bit fail 492.
Epochs          110. Current error: 0.0051601632. Bit fail 446.
Epochs          120. Current error: 0.0041675218. Bit fail 386.
Epochs          130. Current error: 0.0036309268. Bit fail 374.
Epochs          140. Current error: 0.0032380833. Bit fail 343.
Epochs          150. Current error: 0.0028855132. Bit fail 302.
Epochs          160. Current error: 0.0025165526. Bit fail 280.
Epochs          170. Current error: 0.0022868335. Bit fail 253.
Epochs          180. Current error: 0.0021089041. Bit fail 220.
Epochs          190. Current error: 0.0019043182. Bit fail 197.
Epochs          200. Current error: 0.0017739790. Bit fail 169.

Training for ANN period: 25.000000
No_of_input_layer_nodes =  102
No_of_hidden_layer_nodes =  102
No_of_output_layer_nodes =  5
Total_no_of_layers =  3
NN_PARAMS =

  scalar structure containing the fields:

    TrainingAlgorithm = rprop
    LearningRate =  0.70000
    ActivationHidden = Sigmoid
    ActivationOutput = Sigmoid
    ActivationSteepnessHidden =  0.50000
    ActivationSteepnessOutput =  0.50000
    TrainErrorFunction = TanH
    QuickPropDecay = -1.0000e-04
    QuickPropMu =  1.7500
    RPropIncreaseFactor =  1.2000
    RPropDecreaseFactor =  0.50000
    RPropDeltaMin = 0
    RPropDeltaMax =  50

Training Set Accuracy: 100.000000
End of training for ANN period: 25.000000

The training set accuracy obtained for all periods from 10 to 50 is at least 98 %, with about two thirds being 100 %. However, the point of this post is not to show the results of any one set of NN features or training parameters, but rather to show that I can now be more productive by using the speed and flexibility of FANN in the development of my NN market classifier.
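For readers who would rather skip the Perl step, the same FANN training-file layout (a header line giving the number of pairs, inputs and outputs, then alternating input and output lines, as in the example in the code comment) can be produced directly. The Python sketch below also mirrors the one-hot encoding `eye(5)(y,:)` and the `max( ..., [], 2 )` decoding used in the Octave code. All function names are hypothetical, and the number formatting simply copies the example file.

```python
from typing import List, Sequence

N_CLASSES = 5  # 1=uwr 2=unr 3=dwr 4=dnr 5=cyc, as in the Octave code


def one_hot(label: int, n: int = N_CLASSES) -> List[int]:
    """Equivalent of eye(5)(y,:) for a single 1-based label."""
    return [1 if i == label else 0 for i in range(1, n + 1)]


def fann_training_text(features: Sequence[Sequence[float]],
                       labels: Sequence[int]) -> str:
    """Header 'pairs inputs outputs', then alternating input/output lines."""
    lines = [f"{len(features)} {len(features[0])} {N_CLASSES}"]
    for x, y in zip(features, labels):
        lines.append(" ".join(f"{v:.8e}" for v in x))
        lines.append(" ".join(f"{float(v):.8e}" for v in one_hot(y)))
    return "\n".join(lines) + "\n"


def predict_label(outputs: Sequence[float]) -> int:
    """Equivalent of max(outputs, [], 2) for one row: 1-based index of the max."""
    return max(range(len(outputs)), key=lambda i: outputs[i]) + 1


print(fann_training_text([[0.1, -0.2], [0.3, 0.4]], [5, 1]), end="")
print(predict_label([0.1, 0.9, 0.2, 0.0, 0.3]))  # 2
```

Like Octave's `max`, `predict_label` returns the first index when two outputs tie.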

Wednesday, 15 August 2012

Progress Report on Neural Net

Well, reducing the number of nodes in the hidden layer didn't help much; if anything, it made things look slightly more erratic. As a result I decided to increase the number of input features to 102, which gave much more pleasing results. A screenshot of this newer NN, in the bottom pane, is shown below.
Comparing this with the earlier version shown in my previous post (look, for example, at the smooth uptrend in the middle), there are far fewer "false" market types indicated, a definite improvement. The moral seems to be that adding more informative features is the way to go.

However, this raises the problem of training time: it took about 30 hours to train this model using my current Octave scripts, which is far too long for me. For this reason I have decided to use the FANN library, fanntool and the octave-fann bindings for my future development of NNs. I've recently been playing around with these and I think that, in the long run, a lot of time will be saved, even though I will have to write a certain amount of "glue code" to achieve what I want. The above 102 input feature NN will be my reserve NN in the event that I can't get the FANN library, fanntool and octave-fann to work to my satisfaction.

Thursday, 2 August 2012

Results of Comparative Cross Validation Tests

As expected, the NN achieved 100 % accuracy, and my prediction of 20 % to 30 % accuracy for my current Naive Bayesian Classifier was more or less right: in various runs with sample sizes of up to 50,000 it achieved accuracy rates of 30 % to 33 %. A screenshot of both classifiers applied to the last 200 days' worth of S&P futures prices is shown below, with the Naive Bayesian Classifier in the upper pane and the NN in the lower pane.
However, despite its vastly superior performance in the tests, I don't really like the look of the NN on real data: it appears to be more erratic, or noisier, than the Bayesian classifier. I suspect that the NN may be overly complex, with 54 nodes in its single hidden layer. I shall try to improve the NN by reducing the number of hidden layer nodes to 25 and then seeing how that looks on real data.

Tuesday, 31 July 2012

Successful Completion of Neural Net Cross Validation Tests

In my last post I said that I was unsure whether I had coded the cross validation test correctly, so I have taken a new coding approach and completely rewritten the test, which I'm happy to say has been very successful. Using this newly coded implementation, the out of sample accuracy of the trained neural nets is 100 %. As before, these tests were run overnight, but thanks to increased code efficiency this time they covered a total of 2,400,000 separate test examples.

The next test I'm going to code, more out of curiosity than anything else, is a concurrent cross validation test of both my new neural net classifier and my Naive Bayesian Classifier. I expect the NN again to obtain results similar to the above, but anticipate that the Naive Bayesian Classifier will perform quite poorly, achieving between 20 % and 30 % accuracy. I expect such low performance simply because the Naive Bayesian Classifier was developed using just 5 exemplar market types, compared to 25 for the NN.

Friday, 20 July 2012

Neural Net Cross Validation Tests Completed

These tests were conducted by looping over a series of replicated "idealised" market types. In each iteration the cyclic component amplitudes were randomly chosen from the range 1 to 25, and the phase shifts were randomly chosen such that phase shifts appearing in the training set markets do not also appear in these cross validation markets. For each such combination, one of 25 possible market type changes was also randomly applied, and the relevant feature vector for each iteration was then extracted. These tests were run overnight for a total of 1,200,000 separate, iterated test examples. The results are shown below.

Complete Accuracy percentage: 33.574500

"Acceptable" mis-classifications percentages
Predicted = uwr & actual = unr: 5.083417
Predicted = unr & actual = uwr: 7.230083
Predicted = dwr & actual = dnr: 5.170667
Predicted = dnr & actual = dwr: 7.180167
Predicted = uwr & actual = cyc: 3.287917
Predicted = dwr & actual = cyc: 7.180167
Predicted = cyc & actual = uwr: 3.623167
Predicted = cyc & actual = dwr: 3.554333

Dubious, difficult to trade mis-classification percentages
Predicted = uwr & actual = dwr: 2.432667
Predicted = unr & actual = dwr: 2.432667
Predicted = dwr & actual = uwr: 2.351500
Predicted = dnr & actual = uwr: 2.351500

Completely wrong classifications percentages
Predicted = unr & actual = dnr: 0.210083
Predicted = dnr & actual = unr: 0.207333

The complete accuracy percentage requires no comment. The "acceptable" mis-classifications are situations in which the erroneous prediction would not lead one to trade in a manner inconsistent with the actual state of the market. For example, a predicted uwr with an actual cyc is a situation where the market is predicted to be trending upwards with 50 % retracements, but is in fact moving sideways in a cyclic manner. In either case one might be tempted to trade the swings of the market, so the mis-classification is acceptable because the erroneous prediction would still have you trading in a manner suitable to the "true" situation.

The "Dubious, difficult to trade" mis-classifications are those where the above does not apply, e.g. attempting to swing trade in a bullish manner when in fact the market is trending down. One might get lucky and extract some profit, but in all probability the net expectation would be a loss. The completely wrong classifications again require no comment. The percentages above do not add up to 100 because some combinations of mis-classifications are not included in this summary.
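The grouping of (predicted, actual) pairs into the categories above can be sketched in Python as follows. The category sets copy the pairs listed in the summary; `summarise` is a hypothetical helper, and its `other` entry collects every pair not listed, which is why the listed percentages need not sum to 100.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# (predicted, actual) pairs, exactly as grouped in the summary above
ACCEPTABLE = {("uwr", "unr"), ("unr", "uwr"), ("dwr", "dnr"), ("dnr", "dwr"),
              ("uwr", "cyc"), ("dwr", "cyc"), ("cyc", "uwr"), ("cyc", "dwr")}
DUBIOUS = {("uwr", "dwr"), ("unr", "dwr"), ("dwr", "uwr"), ("dnr", "uwr")}
WRONG = {("unr", "dnr"), ("dnr", "unr")}


def summarise(pairs: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Percentage of test examples falling into each category."""
    counts = Counter(pairs)
    total = sum(counts.values())

    def pct(keys):
        # Counter returns 0 for pairs that never occurred
        return 100.0 * sum(counts[k] for k in keys) / total

    correct = 100.0 * sum(c for (p, a), c in counts.items() if p == a) / total
    result = {"correct": correct, "acceptable": pct(ACCEPTABLE),
              "dubious": pct(DUBIOUS), "wrong": pct(WRONG)}
    result["other"] = 100.0 - sum(result.values())  # unlisted mis-classifications
    return result


pairs = [("uwr", "uwr")] * 6 + [("uwr", "unr")] * 2 + [("unr", "dnr"), ("cyc", "unr")]
print(summarise(pairs))
```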

I'm not overwhelmed by these results, and so I shall continue to extend the features vector with more informative features to hopefully improve future cross validation test results. Also, I'm not 100% sure that my test implementation code is doing what I think it is doing, so that needs checking too.
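The randomised test generation described at the start of this post might look something like this Python sketch for a single cyclic component. It assumes a pure sine wave with an integer phase shift measured in bars, and that the phase shifts used in training are known; the 25 market type changes and the feature extraction are omitted, and all names are hypothetical.

```python
import math
import random
from typing import List, Set, Tuple


def cv_phase_pool(period: int, train_phases: Set[int]) -> List[int]:
    """Integer phase shifts 0..period-1 (in bars) that were not used in training."""
    return [ph for ph in range(period) if ph not in train_phases]


def random_cyclic_series(period: int, length: int, train_phases: Set[int],
                         rng: random.Random) -> Tuple[List[float], float, int]:
    """One cross validation cycle: random amplitude in [1, 25], unseen phase."""
    amplitude = rng.uniform(1.0, 25.0)
    phase = rng.choice(cv_phase_pool(period, train_phases))
    series = [amplitude * math.sin(2.0 * math.pi * (t + phase) / period)
              for t in range(length)]
    return series, amplitude, phase


rng = random.Random(42)
series, amp, ph = random_cyclic_series(20, 100, {0, 5, 10, 15}, rng)
```

Drawing phases only from the complement of the training phases is what keeps these examples genuinely out of sample.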

On a related note, I've just enrolled in another online course, this time devoted entirely to neural nets.

Wednesday, 18 July 2012

Neural Net Training Completed

I am pleased to say that I have now completed the training of my NN market type classifier.

In an earlier post I mentioned that I had constructed a training set of 324,000 examples to train the NN on. However, my first attempt at using this in its entirety wasn't successful, with an accuracy on the training set of between 52 % and 58 %. What's more, one training "session" lasted approximately 24 hours, with only 50 calls to the fmincg.m function (a Java implementation is available from here), and this would need to be repeated many times. This wasn't a practical proposition, so I began to think about ways to speed up the training process. One possible solution was to use other software, and in my search of the internet I discovered the FANN library and the Fanntool GUI. After a close reading of the manuals I decided that for my purposes this wasn't the route I wanted to take, but I may come back to it in the future, particularly since the library has bindings to Octave.

After some consideration I decided to split the training set into smaller sets, with the intention of training numerous NNs, each trained to classify a market with a given period, and then to index into the relevant NN in a manner similar to that used in my brute force similarity classifier. The code for this training session is shown below.
% first, training data "training_data.mat" should be loaded in command line

clear -exclusive X y accurate_period % clear everything except X, y and accurate_period, previously loaded from the command line

% ************************************************************************
% Comment out the non relevant preprocessing step for the test in question
% ************************************************************************
% use X as it is for X_train
X_train = X ;
% ************************************************************************
% change zeros in X into -1 for X_train
%X_train = X ;
%change = X_train(:,4:end) ;
%change( change == 0 ) = -1 ;
%X_train(:,4:end) = change ;
%*************************************************************************
% train on just one period's features in X
% index into training set based on period measurement

% create final matrices for storing all unrolled Theta1 and Theta2 and cost record
all_ur_Theta1 = zeros(2862,288) ;
all_ur_Theta2 = zeros(270,288) ;
cost_record = zeros(288,4) ;
col_count = 1 ;

for period = 15:50 
[i_X j_X] = find( accurate_period(:,1) == period ) ;
% extract the relevant part of X using above i_X index
X_train = X( [i_X] , 2:54 ) ;
% and same for market labels vector y
y_train = y( [i_X] , 1 ) ;
% ************************************************************************

%% Setup the parameter sizes 
input_layer_size = size(X_train,2) ;   % the number of features ( columns ) in X_train
hidden_layer_size = size(X_train,2) ;  % original was 25 hidden units
num_labels = 5 ;                 % 5 labels, from 1 to 5  
                                 % 1=uwr 2=unr 3=dwr 4=dnr 5=cyc

for lambda = [ 0.01 0.03 0.1 0.3 1 3 10 30 ]

% Initializing Neural Network Parameters
initial_Theta1 = randInitializeWeights( input_layer_size , hidden_layer_size ) ;
initial_Theta2 = randInitializeWeights( hidden_layer_size , num_labels ) ;

% Unroll parameters
initial_nn_params = [ initial_Theta1(:) ; initial_Theta2(:) ] ;

%% =================== Training NN ===================
%  To train the neural network, we will now use "fmincg", which
%  is a function which works similarly to "fminunc". Recall that these
%  advanced optimizers are able to train our cost functions efficiently as
%  long as we provide them with the gradient computations.
%
fprintf( '\nTraining Neural Network... \n' )

%  MaxIter sets the number of fmincg iterations; larger values allow
%  more training at the cost of a longer run.
options = optimset( 'MaxIter' , 200 ) ; % original was 50

% try different values of lambda
%lambda = 0.1 ;

% Create "short hand" for the cost function to be minimized
costFunction = @(p) nnCostFunction( p, ...
                                   input_layer_size, ...
                                   hidden_layer_size, ...
                                   num_labels, X_train, y_train, lambda ) ;

% Now, costFunction is a function that takes in only one argument (the
% neural network parameters)
[ nn_params , cost ] = fmincg( costFunction , initial_nn_params , options ) ;

% Obtain Theta1 and Theta2 back from nn_params
Theta1 = reshape( nn_params( 1:hidden_layer_size * (input_layer_size + 1) ) , ...
                 hidden_layer_size , (input_layer_size + 1) ) ;

Theta2 = reshape( nn_params( (1 + (hidden_layer_size * (input_layer_size + 1))):end ) , ...
                 num_labels , (hidden_layer_size + 1) ) ;

%% ================= Implement Predict =================
%  After training the neural network, use it to predict the labels
%  of the training set with the "predict" function. This lets us
%  compute the training set accuracy.

pred = predict( Theta1 , Theta2 , X_train ) ;
training_set_accuracy = mean( double(pred == y_train) ) * 100.0 ;

fprintf( 'Training Set Accuracy: %f\n' , training_set_accuracy ) ;
fprintf( 'for lambda value of: %f\n' , lambda ) ;
fprintf( 'and period: %f\n' , period ) ;

% write to all_ur_Theta1 & all_ur_Theta2 & cost record
all_ur_Theta1(:,col_count) = Theta1(:) ;
all_ur_Theta2(:,col_count) = Theta2(:) ;
cost_record(col_count,1) = period ;
cost_record(col_count,2) = lambda ;
cost_record(col_count,3) = training_set_accuracy ;
cost_record(col_count,4) = cost(end) ;
col_count = col_count + 1 ;

end % lambda loop

end % period loop

save -binary all_ur_Thetas.mat all_ur_Theta1 all_ur_Theta2 cost_record
With 200 calls to the fmincg.m function this took an overnight run to complete, but in the morning I had extremely good results: for every period there was a trained NN that obtained 100 % accuracy on the training set. In fact, for most periods several values of lambda (a regularisation term to avoid over-fitting) gave 100 % accuracy, in which case I chose the NN with the lowest final cost among those with 100 % accuracy.
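The selection rule just described (per period, keep only the nets with 100 % training accuracy and take the one with the lowest final cost) can be sketched in Python over rows laid out like `cost_record`: period, lambda, accuracy, cost. The returned row index identifies the matching column of `all_ur_Theta1` and `all_ur_Theta2` (0-based here, 1-based in Octave), which is how the relevant NN for a measured period would then be indexed. `best_per_period` is a hypothetical helper, not part of the original code.

```python
from typing import Dict, Iterable, Tuple

Row = Tuple[float, float, float, float]  # (period, lambda, accuracy %, final cost)


def best_per_period(rows: Iterable[Row]) -> Dict[int, Tuple[int, float, float]]:
    """Map period -> (row index, lambda, cost) of the lowest-cost 100 % net."""
    best: Dict[int, Tuple[int, float, float]] = {}
    for idx, (period, lam, acc, cost) in enumerate(rows):
        if acc < 100.0:          # only nets with perfect training accuracy qualify
            continue
        p = int(period)
        if p not in best or cost < best[p][2]:
            best[p] = (idx, lam, cost)
    return best


rows = [(15, 0.01, 100.0, 0.05), (15, 0.1, 100.0, 0.02),
        (15, 1.0, 98.0, 0.01), (16, 0.3, 99.5, 0.04)]
print(best_per_period(rows))  # {15: (1, 0.1, 0.02)}
```

A period with no 100 % net (period 16 in the toy rows) simply has no entry, so that case would need separate handling.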

So now I have a set of trained NNs, and the next step will be to test them on a cross validation set of my normal "ideal" market types, which will be the subject of my next post.