## Wednesday, 18 July 2012

### Neural Net Training Completed

I am pleased to say that I have now completed the training of my NN market type classifier.

In an earlier post I mentioned that I had constructed a training set of 324,000 training examples to train the NN on. However, my first attempt at using this in its entirety wasn't successful, with an accuracy on the training set of between 52 % to 58 %. What's more, one training "session" lasted approximately 24 hours, with only 50 calls to the fmincg.m function ( a Java implementation is available from here ), and this would need to be repeated many times. This wasn't a practical proposition and I began to think about ways in which I could speed up the training process. One possible solution was to use other software and in my search of the internet I discovered the FANN library and the Fanntool GUI. After a close reading of the manuals I decided that for my purposes this wasn't the route I wanted to take, but in the future I may come back to this, particularly since the library has bindings to Octave.

After some consideration I decided to split the training set into smaller sets, with the intention of training numerous NNs, each trained to classify a market with a given period, and then to index into the relevant NN in a manner similar to that used in my brute force similarity classifier. The code for this training session is shown below.
% first, training data "training_data.mat" should be loaded in command line

clear -exclusive X y accurate_period % clear everything except y and X, previously loaded from the command line

% ************************************************************************
% Comment out the non relevant preprocessing step for the test in question
% ************************************************************************
% use X as it is for X_train
X_train = X ;
% ************************************************************************
% change zeros in X into -1 for X_train
%X_train = X ;
%change = X_train(:,4:end) ;
%change( change == 0 ) = -1 ;
%X_train(:,4:end) = change ;
%*************************************************************************
% train on just one period's features in X
% index into training set based on period measurement

% create final matrices for storing all unrolled Theta1 and Theta2 and cost record
all_ur_Theta1 = zeros(2862,288) ;
all_ur_Theta2 = zeros(270,288) ;
cost_record = zeros(288,4) ;
col_count = 1 ;

for period = 15:50
[i_X j_X] = find( accurate_period(:,1) == period ) ;
% extract the relevant part of X using above i_X index
X_train = X( [i_X] , 2:54 ) ;
% and same for market labels vector y
y_train = y( [i_X] , 1 ) ;
% ************************************************************************

%% Setup the parameter sizes
input_layer_size = size(X_train,2) ;   % the number of features ( columns ) in X_train
hidden_layer_size = size(X_train,2) ;  % original was 25 hidden units
num_labels = 5 ;                 % 5 labels, from 1 to 5
% 1=uwr 2=unr 3=dwr 4=dnr 5=cyc

for lambda = [ 0.01 0.03 0.1 0.3 1 3 10 30 ]

% Initializing Neural Network Parameters
initial_Theta1 = randInitializeWeights( input_layer_size , hidden_layer_size ) ;
initial_Theta2 = randInitializeWeights( hidden_layer_size , num_labels ) ;

% Unroll parameters
initial_nn_params = [ initial_Theta1(:) ; initial_Theta2(:) ] ;

%% =================== Training NN ===================
%  To train the neural network, we will now use "fmincg", which
%  is a function which works similarly to "fminunc". Recall that these
%  advanced optimizers are able to train our cost functions efficiently as
%  long as we provide them with the gradient computations.
%
fprintf( '\nTraining Neural Network... \n' )

%  After you have completed the assignment, change the MaxIter to a larger
%  value to see how more training helps.
options = optimset( 'MaxIter' , 200 ) ; % original was 50

% try different values of lambda
%lambda = 0.1 ;

% Create "short hand" for the cost function to be minimized
costFunction = @(p) nnCostFunction( p, ...
input_layer_size, ...
hidden_layer_size, ...
num_labels, X_train, y_train, lambda ) ;

% Now, costFunction is a function that takes in only one argument (the
% neural network parameters)
[ nn_params , cost ] = fmincg( costFunction , initial_nn_params , options ) ;

% Obtain Theta1 and Theta2 back from nn_params
Theta1 = reshape( nn_params( 1:hidden_layer_size * (input_layer_size + 1) ) , ...
hidden_layer_size , (input_layer_size + 1) ) ;

Theta2 = reshape( nn_params( (1 + (hidden_layer_size * (input_layer_size + 1))):end ) , ...
num_labels , (hidden_layer_size + 1) ) ;

%% ================= Implement Predict =================
%  After training the neural network, we would like to use it to predict
%  the labels. You will now implement the "predict" function to use the
%  neural network to predict the labels of the training set. This lets
%  you compute the training set accuracy.

pred = predict( Theta1 , Theta2 , X_train ) ;
training_set_accuracy = mean( double(pred == y_train) ) * 100.0 ;

fprintf( 'Training Set Accuracy: %f\n' , training_set_accuracy ) ;
fprintf( 'for lambda value of: %f\n' , lambda ) ;
fprintf( 'and period: %f\n' , period ) ;

% write to all_ur_Theta1 & all_ur_Theta2 & cost record
all_ur_Theta1(:,col_count) = Theta1(:) ;
all_ur_Theta2(:,col_count) = Theta2(:) ;
cost_record(col_count,1) = period ;
cost_record(col_count,2) = lambda ;
cost_record(col_count,3) = training_set_accuracy ;
cost_record(col_count,4) = cost(end) ;
col_count = col_count + 1 ;

end % lambda loop

end % period loop

save -binary all_ur_Thetas.mat all_ur_Theta1 all_ur_Theta2 cost_record

With 200 calls to the fmincg.m function this took an overnight run to complete, but in the morning I had extremely good results. For every period there was a trained NN that obtained 100 % accuracy. In fact for most periods there were several values for lambda ( a regularisation term to avoid over-fitting ) that gave 100 % accuracy, in which case I took the NN that had the lowest cost for 100 % accuracy.

So now I have a set of trained NNs, and the next step will be to test them on a cross validation set of my normal "ideal" market types, which will be the subject of my next post.