```octave
% First load the training data "training_data.mat" at the command line
% (e.g. load training_data.mat) so that X and y are in the workspace
clear -exclusive X y % clear everything except X and y
lookup_value = input( 'Enter a number from 1 to 324,000 to choose a lookup candidate row from X: ' ) ;
fprintf( 'Based on this choice the market type to look up is :- ' ) ;
y( lookup_value , 1 )
tic() ;
% index into the training set by period measurement (first column of X)
i_X = find( X(:,1) == X( lookup_value , 1 ) ) ;
% keep a record of all i_X indexes
all_i_X = i_X ;
% extract the relevant part of X (feature columns 4 to 54) using the above index
X_look_up_matrix = X( i_X , 4:54 ) ;
% and the same for the market labels vector y
y_look_up_vector = y( i_X , 1 ) ;
% find the row of X_look_up_matrix with the smallest squared Euclidean distance
% to the chosen row of X (the same row also minimises the plain distance);
% 'extra' asks sum for extra-precision accumulation
[ euc_dist_min i_euc_dist_min ] = min( sum( ( repmat( X(lookup_value,4:54), size(X_look_up_matrix,1), 1 ) - X_look_up_matrix ) .^ 2 , 2 , 'extra' ) ) ;
fprintf( 'and the algo returns a market type of :- ' ) ;
% use the index of this minimum-distance row to get the predicted market type
y_look_up_vector( i_euc_dist_min , 1 )
fprintf( '\nwith a calculated Euclidean distance of :- ' ) ;
sqrt( double( euc_dist_min ) ) % min returned the squared distance
fprintf( 'which ideally should be 0.0 on this X test set.\n' ) ;
fprintf( '\nOriginal lookup row check.\n' ) ;
original_i_X_check = all_i_X( i_euc_dist_min , 1 )
fprintf( 'which ideally should be the same as row choice entered.\n' ) ;
fprintf( '\nTime for algo to run.\n' ) ;
toc() ;
```
where X is the database already mentioned and y is a vector containing the market type labels. Typical terminal output of this code is:

```
octave:1> bf_pattern_recognition
Enter a number from 1 to 324,000 to choose a lookup candidate row from X: 100235
Based on this choice the market type to look up is :- ans = 3
and the algo returns a market type of :- ans = 3
with a calculated Euclidean distance of :- ans = 0
which ideally should be 0.0 on this X test set.
Original lookup row check.
original_i_X_check = 100235
which ideally should be the same as row choice entered.
Time for algo to run.
Elapsed time is 0.1130519 seconds.
octave:2>
```
Of course it obtains 100% accuracy on the test set X, because the pattern to be matched is itself taken from X, so there is always an exact match to be found. The important thing is that this is a workable algorithm: the timed section took about 0.11 seconds in the run above, and making allowances for the print statements it includes, the search itself runs in hundredths of a second.
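The 100% figure rests on one quiet assumption: no two rows of X share the same period and features while carrying different labels. The short sketch below is a hypothetical Python illustration, not the original Octave code; the names `nearest_row`, `patterns` and `labels` and the toy data are invented. It uses a brute-force squared-distance search that, like Octave's two-output `min`, keeps the first of any tied rows. A query taken from the database always finds a distance of 0, but if an identical pattern appears earlier with a different label, the earlier row wins, so both the predicted label and the row check would disagree with the row entered.

```python
def nearest_row(rows, query):
    """Return (index, squared_distance) of the row closest to query.

    Uses a strict '<' comparison, so among tied rows the lowest index wins,
    mirroring the first-index behaviour of Octave's two-output min().
    """
    best_i, best_d2 = -1, float("inf")
    for i, row in enumerate(rows):
        d2 = sum((a - b) ** 2 for a, b in zip(row, query))
        if d2 < best_d2:
            best_i, best_d2 = i, d2
    return best_i, best_d2


# Hypothetical toy data: rows 1 and 3 hold the same pattern with different labels.
patterns = [[0.0, 1.0, 2.0], [5.0, 5.0, 5.0], [1.0, 1.0, 1.0], [5.0, 5.0, 5.0]]
labels = [1, 3, 2, 4]

for i, query in enumerate(patterns):
    j, d2 = nearest_row(patterns, query)
    # Every self-lookup finds distance 0.0, but row 3 is matched to row 1.
    print(f"row {i}: matched row {j}, distance {d2}, label correct: {labels[i] == labels[j]}")
```

On real data the caveat only matters if exact duplicate patterns with conflicting labels actually occur in X.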
The speed of the search, despite the size of the database, comes from indexing into the database by the measured period of the pattern to be matched, which is the first entry on each row. This cuts the search space down to a more manageable 9,000-row matrix, and a single line of vectorised code then performs the actual Euclidean distance search and classification.
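A minimal Python sketch of the same period-indexed search follows. It assumes the layout described above (period in the first column, features in Octave columns 4 to 54, i.e. the 0-based slice 3:54) and, as a variation on the original rather than the author's method, builds a period-to-rows dictionary once instead of calling `find` over the whole first column on every query. The names `build_period_index` and `classify` are hypothetical, `math.fsum` stands in loosely for the `'extra'` precision option of Octave's `sum`, and the toy data are invented for illustration.

```python
import math
from collections import defaultdict


def build_period_index(X):
    """Map each period value (column 0) to the row numbers that share it."""
    index = defaultdict(list)
    for r, row in enumerate(X):
        index[row[0]].append(r)
    return dict(index)


def classify(X, y, index, query, feat=slice(3, 54)):
    """Nearest-neighbour lookup restricted to rows with the query's period.

    Returns (label, euclidean_distance, row) or None if the period is unseen.
    feat=slice(3, 54) is the 0-based equivalent of Octave's columns 4:54.
    """
    q = query[feat]
    best_r, best_d2 = None, math.inf
    for r in index.get(query[0], []):
        d2 = math.fsum((a - b) ** 2 for a, b in zip(X[r][feat], q))
        if d2 < best_d2:  # strict '<': the first of any tied rows is kept
            best_r, best_d2 = r, d2
    if best_r is None:
        return None
    return y[best_r], math.sqrt(best_d2), best_r


# Hypothetical toy database: period, two columns the search ignores, three features.
X_demo = [
    [10, 0, 0, 1.0, 2.0, 3.0],
    [12, 0, 0, 1.0, 2.0, 3.0],
    [10, 0, 0, 4.0, 4.0, 4.0],
    [10, 0, 0, 1.0, 2.0, 3.5],
]
y_demo = [1, 2, 3, 4]
idx = build_period_index(X_demo)  # {10: [0, 2, 3], 12: [1]}

print(classify(X_demo, y_demo, idx, [10, 0, 0, 1.0, 2.0, 3.4], feat=slice(3, 6)))
print(classify(X_demo, y_demo, idx, X_demo[1], feat=slice(3, 6)))
```

With the index built once, each query touches only the rows in its own period bucket (about 9,000 in the post's database rather than all 324,000). The Octave version's vectorised `find` over one column is already fast, so the gain from the dictionary is probably modest.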
Another possible advantage of this approach on real market data is that, once the data has (hopefully accurately) been classified, the matched pattern in the database can be extrapolated over the next 5 to 10 bars, under the assumption that the market model will persist, to make a prediction of near-future prices. I shall certainly be doing more work with this classifying algorithm!