"Trading is statistics and time series analysis." This blog details my progress in developing a systematic trading system for use on the futures and forex markets, with discussion of the various indicators and other inputs used in the creation of the system. Also discussed are some of the issues/problems encountered during this development process. Within the blog posts there are links to other web pages that are/have been useful to me.
Tuesday, 31 July 2012
In my last post I said I was unsure of the coding of my cross validation test, so I have taken a new coding approach and completely rewritten it, which I'm happy to say has been very successful. Using this newly coded implementation the out of sample accuracy of the trained neural nets is 100 %. As before, these tests were run overnight, but this time for a total of 2,400,000 separate test examples due to increased code efficiency.
The next test I'm going to code, more out of curiosity than anything else, is a concurrent cross validation test of both my new neural net classifier algorithm and my Naive Bayesian Classifier. I expect the NN to again obtain results similar to the above, but I anticipate that the Naive Bayesian Classifier will perform quite poorly, achieving between 20 % and 30 % accuracy. I expect such low performance simply because the Naive Bayesian Classifier was developed using just 5 exemplar market type examples, compared with 25 for the NN.
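As a rough outline, this concurrent test will simply run both classifiers over the same set of cross validation examples and tally their accuracies, along the lines of the minimal sketch below. Note that nn_market_type() and naive_bayes_market_type() are hypothetical wrappers standing in for my actual classifier code, and cv_features/cv_labels are assumed to have been built in the same way as in the tests already described.
% minimal sketch of the planned concurrent cross validation test
% nn_market_type() and naive_bayes_market_type() are hypothetical wrappers for the two classifiers
% cv_features and cv_labels are assumed to hold the cross validation feature vectors and true market type labels
n_tests = size( cv_features , 1 ) ;
nn_correct = 0 ;
nb_correct = 0 ;
for ii = 1 : n_tests
  features = cv_features( ii , : ) ;
  nn_correct = nn_correct + ( nn_market_type( features ) == cv_labels( ii , 1 ) ) ;
  nb_correct = nb_correct + ( naive_bayes_market_type( features ) == cv_labels( ii , 1 ) ) ;
end
fprintf( 'NN accuracy: %f\n' , 100 * nn_correct / n_tests ) ;
fprintf( 'Naive Bayes accuracy: %f\n' , 100 * nb_correct / n_tests ) ;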
Friday, 20 July 2012
Neural Net Cross Validation Tests Completed
These tests were conducted by looping over a series of replicated "idealised" market types; in each iteration the cyclic component amplitudes were randomly chosen to range between 1 and 25, and the phase shifts were randomly chosen so that none of the phase shifts that appear in the training set markets also appear in these cross validation markets. For each such combination one of 25 possible market type changes was also randomly applied, and then the relevant feature vector for the iteration was extracted. These tests were run overnight for a total of 1,200,000 separate test examples. The results are shown below.
Complete Accuracy percentage: 33.574500
"Acceptable" mis-classifications percentages
Predicted = uwr & actual = unr: 5.083417
Predicted = unr & actual = uwr: 7.230083
Predicted = dwr & actual = dnr: 5.170667
Predicted = dnr & actual = dwr: 7.180167
Predicted = uwr & actual = cyc: 3.287917
Predicted = dwr & actual = cyc: 7.180167
Predicted = cyc & actual = uwr: 3.623167
Predicted = cyc & actual = dwr: 3.554333
Dubious, difficult to trade mis-classification percentages
Predicted = uwr & actual = dwr: 2.432667
Predicted = unr & actual = dwr: 2.432667
Predicted = dwr & actual = uwr: 2.351500
Predicted = dnr & actual = uwr: 2.351500
Completely wrong classification percentages
Predicted = unr & actual = dnr: 0.210083
Predicted = dnr & actual = unr: 0.207333
The complete accuracy percentage requires no comment. The "acceptable" mis-classifications are situations in which the erroneous prediction would not have one trading in a manner inconsistent with the actual state of the market. For example, a predicted uwr with an actual cyc is a situation where the market is predicted to be trending upwards with 50% retracements but is in fact trending sideways in a cyclic manner; in either case one might be tempted to trade the swings of the market, so the mis-classification is acceptable because the erroneous prediction would still have one trading in a manner suited to the "true" situation.
The "Dubious, difficult to trade" mis-classifications are those where the above does not apply, for example attempting to swing trade in a bullish manner when in fact the market is trending down. One might get lucky and extract some profit, but in all probability the net expectation would be a loss. The completely wrong classifications again require no comment. The percentages above do not sum to 100 because some combinations of mis-classifications are not included in this summary.
I'm not overwhelmed by these results, and so I shall continue to extend the feature vector with more informative features in the hope of improving future cross validation test results. Also, I'm not 100% sure that my test implementation code is doing what I think it is doing, so that needs checking too.
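For what it's worth, the intended shape of each test iteration is roughly as outlined below. This is only a sketch of the logic described above: make_ideal_market(), extract_features() and nn_market_type() are hypothetical stand-ins for the actual market construction, feature extraction and classifier code, and cv_phases is assumed to hold the phase shifts held out of the training set.
% rough outline of one cross validation iteration as described above
% make_ideal_market(), extract_features() and nn_market_type() are hypothetical stand-ins
n_tests = 1200000 ;                                    % 1,200,000 test examples as in the overnight run
predictions = zeros( n_tests , 1 ) ;
actuals = zeros( n_tests , 1 ) ;
for ii = 1 : n_tests
  period = randi( [ 15 50 ] ) ;                        % cyclic period of the idealised market
  amplitude = randi( [ 1 25 ] ) ;                      % randomly chosen cyclic component amplitude
  phase = cv_phases( randi( numel( cv_phases ) ) ) ;   % phase shift held out of the training set
  change = randi( 25 ) ;                               % one of 25 possible market type changes
  [ prices , true_type ] = make_ideal_market( period , amplitude , phase , change ) ;
  predictions( ii , 1 ) = nn_market_type( extract_features( prices ) ) ;
  actuals( ii , 1 ) = true_type ;
end
accuracy = 100 * mean( predictions == actuals ) ;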
On a related note, I've just enrolled in another online course, this time devoted entirely to neural nets.
Wednesday, 18 July 2012
Neural Net Training Completed
I am pleased to say that I have now completed the training of my NN market type classifier.
In an earlier post I mentioned that I had constructed a training set of 324,000 training examples to train the NN on. However, my first attempt at using this in its entirety wasn't successful, with an accuracy on the training set of between 52 % and 58 %. What's more, one training "session" lasted approximately 24 hours, with only 50 calls to the fmincg.m function ( a Java implementation is available from here ), and this would need to be repeated many times. This wasn't a practical proposition and I began to think about ways in which I could speed up the training process. One possible solution was to use other software, and in my search of the internet I discovered the FANN library and the Fanntool GUI. After a close reading of the manuals I decided that this wasn't the route I wanted to take for my purposes, but I may come back to it in the future, particularly since the library has bindings to Octave.
After some consideration I decided to split the training set into smaller sets, with the intention of training numerous NNs, each trained to classify a market with a given period, and then to index into the relevant NN in a manner similar to that used in my brute force similarity classifier. The code for this training session is shown below.
% first, training data "training_data.mat" should be loaded in command line
clear -exclusive X y accurate_period % clear everything except X, y and accurate_period, previously loaded from the command line
% ************************************************************************
% Comment out the non relevant preprocessing step for the test in question
% ************************************************************************
% use X as it is for X_train
X_train = X ;
% ************************************************************************
% change zeros in X into -1 for X_train
%X_train = X ;
%change = X_train(:,4:end) ;
%change( change == 0 ) = -1 ;
%X_train(:,4:end) = change ;
%*************************************************************************
% train on just one period's features in X
% index into training set based on period measurement
% create final matrices for storing all unrolled Theta1 and Theta2 and cost record
all_ur_Theta1 = zeros(2862,288) ;
all_ur_Theta2 = zeros(270,288) ;
cost_record = zeros(288,4) ;
col_count = 1 ;
for period = 15:50
[i_X j_X] = find( accurate_period(:,1) == period ) ;
% extract the relevant part of X using above i_X index
X_train = X( [i_X] , 2:54 ) ;
% and same for market labels vector y
y_train = y( [i_X] , 1 ) ;
% ************************************************************************
%% Setup the parameter sizes
input_layer_size = size(X_train,2) ; % the number of features ( columns ) in X_train
hidden_layer_size = size(X_train,2) ; % original was 25 hidden units
num_labels = 5 ; % 5 labels, from 1 to 5
% 1=uwr 2=unr 3=dwr 4=dnr 5=cyc
for lambda = [ 0.01 0.03 0.1 0.3 1 3 10 30 ]
% Initializing Neural Network Parameters
initial_Theta1 = randInitializeWeights( input_layer_size , hidden_layer_size ) ;
initial_Theta2 = randInitializeWeights( hidden_layer_size , num_labels ) ;
% Unroll parameters
initial_nn_params = [ initial_Theta1(:) ; initial_Theta2(:) ] ;
%% =================== Training NN ===================
% To train the neural network, we will now use "fmincg", which
% is a function which works similarly to "fminunc". Recall that these
% advanced optimizers are able to train our cost functions efficiently as
% long as we provide them with the gradient computations.
%
fprintf( '\nTraining Neural Network... \n' )
% After you have completed the assignment, change the MaxIter to a larger
% value to see how more training helps.
options = optimset( 'MaxIter' , 200 ) ; % original was 50
% try different values of lambda
%lambda = 0.1 ;
% Create "short hand" for the cost function to be minimized
costFunction = @(p) nnCostFunction( p, ...
input_layer_size, ...
hidden_layer_size, ...
num_labels, X_train, y_train, lambda ) ;
% Now, costFunction is a function that takes in only one argument (the
% neural network parameters)
[ nn_params , cost ] = fmincg( costFunction , initial_nn_params , options ) ;
% Obtain Theta1 and Theta2 back from nn_params
Theta1 = reshape( nn_params( 1:hidden_layer_size * (input_layer_size + 1) ) , ...
hidden_layer_size , (input_layer_size + 1) ) ;
Theta2 = reshape( nn_params( (1 + (hidden_layer_size * (input_layer_size + 1))):end ) , ...
num_labels , (hidden_layer_size + 1) ) ;
%% ================= Implement Predict =================
% After training the neural network, we would like to use it to predict
% the labels. You will now implement the "predict" function to use the
% neural network to predict the labels of the training set. This lets
% you compute the training set accuracy.
pred = predict( Theta1 , Theta2 , X_train ) ;
training_set_accuracy = mean( double(pred == y_train) ) * 100.0 ;
fprintf( 'Training Set Accuracy: %f\n' , training_set_accuracy ) ;
fprintf( 'for lambda value of: %f\n' , lambda ) ;
fprintf( 'and period: %f\n' , period ) ;
% write to all_ur_Theta1 & all_ur_Theta2 & cost record
all_ur_Theta1(:,col_count) = Theta1(:) ;
all_ur_Theta2(:,col_count) = Theta2(:) ;
cost_record(col_count,1) = period ;
cost_record(col_count,2) = lambda ;
cost_record(col_count,3) = training_set_accuracy ;
cost_record(col_count,4) = cost(end) ;
col_count = col_count + 1 ;
end % lambda loop
end % period loop
save -binary all_ur_Thetas.mat all_ur_Theta1 all_ur_Theta2 cost_record
With 200 calls to the fmincg.m function this took an overnight run to complete, but in the morning I had extremely good results. For every period there was a trained NN that obtained 100 % accuracy. In fact, for most periods there were several values of lambda ( a regularisation term to avoid over-fitting ) that gave 100 % accuracy, in which case I took the NN that had the lowest cost for 100 % accuracy. So now I have a set of trained NNs, and the next step will be to test them on a cross validation set of my normal "ideal" market types, which will be the subject of my next post.
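Selecting the best NN per period can be done directly from cost_record, since its rows line up with the columns of all_ur_Theta1 and all_ur_Theta2. The sketch below shows one way this selection and the subsequent prediction-time indexing might look. It assumes, as reported above, that every period has at least one 100 % accuracy column, and it uses the layer sizes implied by the training code ( 53 inputs, 53 hidden units, 5 output labels ); measured_period and new_features are hypothetical placeholders for the current bar's measured period and feature vector, and predict() is the same function used in the training code.
% minimal sketch: for each period keep the column whose lambda gave
% 100 % training accuracy at the lowest final cost
best_cols = zeros( 36 , 1 ) ;
row = 1 ;
for period = 15 : 50
  ix = find( cost_record(:,1) == period & cost_record(:,3) == 100 ) ;
  [ lowest_cost , min_ix ] = min( cost_record( ix , 4 ) ) ;
  best_cols( row , 1 ) = ix( min_ix ) ;
  row = row + 1 ;
end
% at prediction time, index into the NN trained for the measured period
% ( new_features is a hypothetical 1 x 53 feature vector for the current bar )
col = best_cols( measured_period - 14 , 1 ) ;               % measured_period assumed to be in 15:50
Theta1 = reshape( all_ur_Theta1( : , col ) , 53 , 54 ) ;    % hidden_layer_size x ( input_layer_size + 1 )
Theta2 = reshape( all_ur_Theta2( : , col ) , 5 , 54 ) ;     % num_labels x ( hidden_layer_size + 1 )
predicted_market_type = predict( Theta1 , Theta2 , new_features ) ;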
Monday, 16 July 2012
Jack Schwager on YouTube
I have just watched a very interesting YouTube video of Jack Schwager, of Market Wizards fame, giving a presentation. Well worth watching.
Update on Neural Network:- as I write this I have a NN training session running in Octave, which looks very promising. More in a new post in a day or so.
Thursday, 12 July 2012
Brute Force Classifier in Action
As an update to my recent post, here is a short video of the brute force similarity search classifier in action.
Non-embedded view here.
The colour-coded candlestick bars are coloured thus: purple = a cyclic market classification; green = up with retracement; blue = up with no retracement; yellow = down with retracement; red = down with no retracement. The upper price series is the classification according to the brute force algorithm and the lower is the classification according to my Naive Bayesian classifier, shown for comparison. The cyan trend line is my implementation of a Kalman filter; where this trend line extends out at the hard right-hand edge of the chart it becomes the Kalman filter's prediction for the next 10 bars, based on extending the pattern that was selected during the run of the brute force algorithm.
I will leave it up to readers to judge for themselves the efficacy of this new indicator, but I think it shows some promise, and I have some ideas about how it can be improved. This, however, is work for the future. For now I intend to crack on with working on my neural net classification algorithm.
Saturday, 7 July 2012
A Possible Brute Force Similarity Classifier in Octave Code
As part of the development of my neural net classifier it has been necessary to use training data, and as usual I have been using my model market types. To increase the amount of such training data I have extended the range of the data to include a change in market type half way through the cycle of one measured cyclic period. I have done this in increments of 1 degree, from 1 degree to 360 degrees of a sine wave, for periods 15 to 50 and for all possible combinations of market type, giving a total database of 324,000 possible market model patterns ( 360 phase increments x 36 periods x 25 market type combinations ). However, it struck me after reading this pdf that I could use this database as the basis of what the pdf calls a "brute force similarity search" classifier. Below is my proof of concept Octave code implementation of such a classifier,
% first, training data "training_data.mat" should be loaded in command line
clear -exclusive X y % clear everything except y and X, previously loaded from the command line
lookup_value = input( 'Enter a number from 1 to 324,000 to choose a lookup candidate row from X: ' ) ;
fprintf( 'Based on this choice the market type to look up is :- ' ) ;
y( lookup_value , 1 )
tic() ;
% index into training set based on period measurement
[i_X j_X] = find( X(:,1) == X( lookup_value , 1 ) ) ;
% keep a record of all i_X indexes
all_i_X = i_X ;
% extract the relevant part of X using above index
X_look_up_matrix = X( [i_X] , 4:54 ) ;
% and same for market labels vector y
y_look_up_vector = y( [i_X] , 1 ) ;
% find pattern in X_look_up_matrix that minimises Euclidean distance between itself and the training example randomly taken from X
[ euc_dist_min i_euc_dist_min ] = min( sum( ( repmat( X(lookup_value,4:54), size(X_look_up_matrix,1), 1) .- X_look_up_matrix ) .^ 2.0 , 2 , 'extra' ) ) ;
fprintf( 'and the algo returns a market type of :- ' ) ;
% take this minimum distance vector index to get predicted market type
y_look_up_vector( i_euc_dist_min , 1 )
fprintf( '\nwith a calculated Euclidean distance of :- ' ) ;
double(euc_dist_min)
fprintf( 'which ideally should be 0.0 on this X test set.\n' ) ;
fprintf( '\nOriginal lookup row check.\n' ) ;
original_i_X_check = all_i_X( i_euc_dist_min , 1 )
fprintf( 'which ideally should be the same as row choice entered.\n' ) ;
fprintf( '\nTime for algo to run.\n' ) ;
toc() ;
where X is the database already mentioned and y is a vector containing the market type labels. Typical terminal output of this code is:
octave:1> bf_pattern_recognition
Enter a number from 1 to 324,000 to choose a lookup candidate row from X: 100235
Based on this choice the market type to look up is :- ans = 3
and the algo returns a market type of :- ans = 3
with a calculated Euclidean distance of :- ans = 0
which ideally should be 0.0 on this X test set.
Original lookup row check.
original_i_X_check = 100235
which ideally should be the same as row choice entered.
Time for algo to run.
Elapsed time is 0.1130519 seconds.
octave:2>
Of course it obtains 100 % accuracy on the test set X because the original choice of pattern to be matched comes from X so there is always an exact match to be found. The important thing is that this is a workable algorithm which, making allowances for all the print statements included in the above code, runs in hundredths of a second.
This speed, despite having such a large database to search through, is achieved by indexing into the database by the measured period of the pattern to be matched, which is the first entry on each line. This reduces the search base down to a more manageable 9000 row matrix, and then one line of vectorised code is used to perform the actual Euclidean distance search and classification.
Another possible advantage of this approach on real market data is that, having hopefully accurately classified the data, the matched pattern in the database can be extrapolated, under the assumption that the market model will persist for the next 5 to 10 bars, to make a prediction of near future prices. I shall certainly be doing more work with this classifying algorithm!
Saturday, 30 June 2012
Machine Learning Course Completed
I'm pleased to say that I have now completed Andrew Ng's machine learning course, which is offered through Coursera. This post is not intended to be a review of the course, which in my opinion is extremely good and very useful, but more of a reflection of my thoughts and what I think will be useful for me personally.
Firstly, I was pleasantly surprised that the software/programming language of instruction was Octave, which regular readers of this blog will know is my main software of choice. Apart from learning the concepts of ML, I also picked up some handy tips for Octave programming, and more importantly for me I now have a set of working Octave ML functions that I can use immediately in my system development.
In my previous post I mentioned that my first attempt at using ML will be to use a Neural Net to classify market types. As background to this, readers might be interested in a pdf file of the video lectures, available from here, which was put together and posted on the course discussion forum by another student - I think this is very good and all credit to said student, José Soares Augusto.
Due to the honour code ( or honor code for American readers ) of the course I will be unable to post the code that I wrote for the programming assignments. However, I do feel that I can post the code shown in the code box below, as the copyright notice allows it. A few slight changes I made are noted in the copyright notice. This is a minimisation function that was used in the training of the Neural Net assignment and was provided in the assignment download.
function [X, fX, i] = fmincg(f, X, options, P1, P2, P3, P4, P5)
% Minimize a continuous differentialble multivariate function. Starting point
% is given by "X" (D by 1), and the function named in the string "f", must
% return a function value and a vector of partial derivatives. The Polack-
% Ribiere flavour of conjugate gradients is used to compute search directions,
% and a line search using quadratic and cubic polynomial approximations and the
% Wolfe-Powell stopping criteria is used together with the slope ratio method
% for guessing initial step sizes. Additionally a bunch of checks are made to
% make sure that exploration is taking place and that extrapolation will not
% be unboundedly large. The "length" gives the length of the run: if it is
% positive, it gives the maximum number of line searches, if negative its
% absolute gives the maximum allowed number of function evaluations. You can
% (optionally) give "length" a second component, which will indicate the
% reduction in function value to be expected in the first line-search (defaults
% to 1.0). The function returns when either its length is up, or if no further
% progress can be made (ie, we are at a minimum, or so close that due to
% numerical problems, we cannot get any closer). If the function terminates
% within a few iterations, it could be an indication that the function value
% and derivatives are not consistent (ie, there may be a bug in the
% implementation of your "f" function). The function returns the found
% solution "X", a vector of function values "fX" indicating the progress made
% and "i" the number of iterations (line searches or function evaluations,
% depending on the sign of "length") used.
%
% Usage: [X, fX, i] = fmincg(f, X, options, P1, P2, P3, P4, P5)
%
% See also: checkgrad
%
% Copyright (C) 2001 and 2002 by Carl Edward Rasmussen. Date 2002-02-13
%
% (C) Copyright 1999, 2000 & 2001, Carl Edward Rasmussen
%
% Permission is granted for anyone to copy, use, or modify these
% programs and accompanying documents for purposes of research or
% education, provided this copyright notice is retained, and note is
% made of any changes that have been made.
%
% These programs and documents are distributed without any warranty,
% express or implied. As the programs were written for research
% purposes only, they have not been tested to the degree that would be
% advisable in any important application. All use of these programs is
% entirely at the user's own risk.
%
% [ml-class] Changes Made:
% 1) Function name and argument specifications
% 2) Output display
%
% Dekalog Changes Made:
% Some lines have been altered, changing | to || and & to &&.
% This is to avoid "possible Matlab-style short-circuit operator" warnings
% being given when code is run under Octave. The lines where these changes
% have been made are indicated by comments at the end of each respective line.
% Read options
if exist('options', 'var') && ~isempty(options) && isfield(options, 'MaxIter')
length = options.MaxIter;
else
length = 100;
end
RHO = 0.01; % a bunch of constants for line searches
SIG = 0.5; % RHO and SIG are the constants in the Wolfe-Powell conditions
INT = 0.1; % don't reevaluate within 0.1 of the limit of the current bracket
EXT = 3.0; % extrapolate maximum 3 times the current bracket
MAX = 20; % max 20 function evaluations per line search
RATIO = 100; % maximum allowed slope ratio
argstr = ['feval(f, X']; % compose string used to call function
for i = 1:(nargin - 3)
argstr = [argstr, ',P', int2str(i)];
end
argstr = [argstr, ')'];
if max(size(length)) == 2, red=length(2); length=length(1); else red=1; end
S=['Iteration '];
i = 0; % zero the run length counter
ls_failed = 0; % no previous line search has failed
fX = [];
[f1 df1] = eval(argstr); % get function value and gradient
i = i + (length<0); % count epochs?!
s = -df1; % search direction is steepest
d1 = -s'*s; % this is the slope
z1 = red/(1-d1); % initial step is red/(|s|+1)
while i < abs(length) % while not finished
i = i + (length>0); % count iterations?!
X0 = X; f0 = f1; df0 = df1; % make a copy of current values
X = X + z1*s; % begin line search
[f2 df2] = eval(argstr);
i = i + (length<0); % count epochs?!
d2 = df2'*s;
f3 = f1; d3 = d1; z3 = -z1; % initialize point 3 equal to point 1
if length>0, M = MAX; else M = min(MAX, -length-i); end
success = 0; limit = -1; % initialize quanteties
while 1
while ((f2 > f1+z1*RHO*d1) || (d2 > -SIG*d1)) && (M > 0) % | and & changed to || and && to avoid "possible Matlab-style short-circuit operator" warning
limit = z1; % tighten the bracket
if f2 > f1
z2 = z3 - (0.5*d3*z3*z3)/(d3*z3+f2-f3); % quadratic fit
else
A = 6*(f2-f3)/z3+3*(d2+d3); % cubic fit
B = 3*(f3-f2)-z3*(d3+2*d2);
z2 = (sqrt(B*B-A*d2*z3*z3)-B)/A; % numerical error possible - ok!
end
if isnan(z2) || isinf(z2) % | changed to || to avoid "possible Matlab-style short-circuit operator" warning
z2 = z3/2; % if we had a numerical problem then bisect
end
z2 = max(min(z2, INT*z3),(1-INT)*z3); % don't accept too close to limits
z1 = z1 + z2; % update the step
X = X + z2*s;
[f2 df2] = eval(argstr);
M = M - 1; i = i + (length<0); % count epochs?!
d2 = df2'*s;
z3 = z3-z2; % z3 is now relative to the location of z2
end
if f2 > f1+z1*RHO*d1 || d2 > -SIG*d1 % | changed to || to avoid "possible Matlab-style short-circuit operator" warning
break; % this is a failure
elseif d2 > SIG*d1
success = 1; break; % success
elseif M == 0
break; % failure
end
A = 6*(f2-f3)/z3+3*(d2+d3); % make cubic extrapolation
B = 3*(f3-f2)-z3*(d3+2*d2);
z2 = -d2*z3*z3/(B+sqrt(B*B-A*d2*z3*z3)); % num. error possible - ok!
if ~isreal(z2) || isnan(z2) || isinf(z2) || z2 < 0 % num prob or wrong sign? % | changed to || to avoid "possible Matlab-style short-circuit operator" warning
if limit < -0.5 % if we have no upper limit
z2 = z1 * (EXT-1); % the extrapolate the maximum amount
else
z2 = (limit-z1)/2; % otherwise bisect
end
elseif (limit > -0.5) && (z2+z1 > limit) % extraplation beyond max? % & changed to && to avoid "possible Matlab-style short-circuit operator" warning
z2 = (limit-z1)/2; % bisect
elseif (limit < -0.5) && (z2+z1 > z1*EXT) % extrapolation beyond limit % & changed to && to avoid "possible Matlab-style short-circuit operator" warning
z2 = z1*(EXT-1.0); % set to extrapolation limit
elseif z2 < -z3*INT
z2 = -z3*INT;
elseif (limit > -0.5) && (z2 < (limit-z1)*(1.0-INT)) % too close to limit? % & changed to && to avoid "possible Matlab-style short-circuit operator" warning
z2 = (limit-z1)*(1.0-INT);
end
f3 = f2; d3 = d2; z3 = -z2; % set point 3 equal to point 2
z1 = z1 + z2; X = X + z2*s; % update current estimates
[f2 df2] = eval(argstr);
M = M - 1; i = i + (length<0); % count epochs?!
d2 = df2'*s;
end % end of line search
if success % if line search succeeded
f1 = f2; fX = [fX' f1]';
fprintf('%s %4i | Cost: %4.6e\r', S, i, f1);
s = (df2'*df2-df1'*df2)/(df1'*df1)*s - df2; % Polack-Ribiere direction
tmp = df1; df1 = df2; df2 = tmp; % swap derivatives
d2 = df1'*s;
if d2 > 0 % new slope must be negative
s = -df1; % otherwise use steepest direction
d2 = -s'*s;
end
z1 = z1 * min(RATIO, d1/(d2-realmin)); % slope ratio but max RATIO
d1 = d2;
ls_failed = 0; % this line search did not fail
else
X = X0; f1 = f0; df1 = df0; % restore point from before failed line search
if ls_failed || i > abs(length) % line search failed twice in a row % | changed to || to avoid "possible Matlab-style short-circuit operator" warning
break; % or we ran out of time, so we give up
end
tmp = df1; df1 = df2; df2 = tmp; % swap derivatives
s = -df1; % try steepest
d1 = -s'*s;
z1 = 1/(1-d1);
ls_failed = 1; % this line search failed
end
if exist('OCTAVE_VERSION')
fflush(stdout);
end
end
fprintf('\n');
Finally, the last set of videos talked about "Artificial Data Synthesis," otherwise known as creating your own data for training purposes. This is basically what I had planned to do anyway ( see previous post ), but it is nice to learn that it is standard, accepted practice in the ML world. The first such way of creating data, in the context of Photo OCR, uses various font libraries rendered against random backgrounds, which very much mirrors my planned approach of training on repeated sets of my "ideal time series" construct. However, another approach which could be used is "data distortion," an approach for which my idea of creating synthetic data using an FFT might be useful, or alternatively a correlation and cointegration approach as shown in R code in this Quantitative Finance thread.
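As an illustration of the FFT idea, the sketch below generates a phase-randomised surrogate of a series: the amplitude spectrum of the original is kept but the phases are scrambled, so the surrogate shares the original's autocorrelation structure while being a "new" series. This is just one generic way of creating FFT-based synthetic data, and is not necessarily the exact method I will end up using.
% minimal sketch: phase-randomised surrogate of a series via the FFT
% keeps the amplitude spectrum, randomises the phases, returns a real series
function surrogate = fft_surrogate( x )
  x = x(:) ;
  n = numel( x ) ;
  X = fft( x ) ;
  half = floor( n / 2 ) ;
  rotation = ones( n , 1 ) ;
  rotation( 2 : half ) = exp( 2 * pi * 1i * rand( half - 1 , 1 ) ) ;  % random phase rotations
  rotation( n : -1 : n - half + 2 ) = conj( rotation( 2 : half ) ) ;  % conjugate symmetry keeps the result real
  surrogate = real( ifft( X .* rotation ) ) ;
endfunction
A call such as synthetic = fft_surrogate( prices ), with prices being whatever real series is to be mimicked, would then give one synthetic series per call.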
All in all, I'm quite excited by the possibilities of my new found knowledge, and I fully expect that in time, after development and testing, any Neural Net I develop will in fact replace my current Naive Bayesian classifier.