The inputs are channel normalised prices, with the length of the channel being adaptive to the dominant cycle period. This function is called as part of a rolling neural net training regime to select the top n (n = 100 in this case) matches in the historical record as training data. The actual NN training code is a close adaptation of the code in my neural net walkforward training post, but with a couple of important caveats which are discussed below.
function [ top_matches ] = rolling_cauchy_schwarz_matching_algo_2( open_ch, high_ch, low_ch, close_ch, period ) % pre-allocate vectors in memory cauchy_schwarz_values = zeros( size(close_ch,1) , 1 ) ; top_matches = zeros( size(close_ch,1), 100 ) ; % select price bar to train nn on for jj = size(close_ch,1)-250 : size(close_ch,1) lookback = period( jj ) ; sample_to_match = [ close_ch( jj-lookback : jj )' high_ch( jj-9 : jj )' low_ch( jj-9 : jj )' ( close_ch( jj-4 : jj ).-open_ch( jj-4 : jj ) )' ] ; norm_sample_to_match = norm( sample_to_match ) ; % for this jj train_bar, calculate cauchy_schwarz matching values in the historical record up to index jj-2 for ii = 50 : jj - 2 cauchy_schwarz_values(ii) = abs( sample_to_match * [ close_ch( ii-lookback : ii ) ; high_ch( ii-9 : ii ) ; low_ch( ii-9 : ii ) ; ( close_ch( ii-4 : ii ).-open_ch( ii-4 : ii ) ) ] ) / ( norm_sample_to_match * norm( [ close_ch( ii-lookback : ii , 1 ) ; high_ch( ii-9 : ii ) ; low_ch( ii-9 : ii ) ; ( close_ch( ii-4 : ii ).-open_ch( ii-4 : ii ) ) ] ) ) ; end % end of ii loop % get the top 100 matches for this price bar [ s, sort_index ] = sort( cauchy_schwarz_values ) ; top_matches( jj, : ) = sort_index( end-99 : end )' ; end % end of jj loop end % end of function
Firstly, when training a feedforward neural network it is normal that a certain number of samples are held out of the training set for use as a cross validation set. The point of this is to ensure that the trained NN will generalise well to as yet unseen data. In the case of my rolling training regime this does not apply. The NN that is being trained for the "current bar" will be used once to classify the "current bar" and then thrown away. The "next bar" will have a completely new NN trained specifically for it, which in its turn will be discarded, and so on and so on along the whole price history. There is no need to ensure generalisation of any specifically trained NN. This being the case, all the training set examples are used in the training and early stopping is implemented by a crude heuristic of classification accuracy on the training set: training stops when the classification error rate on the whole training set is <= 5%. Further experience with this in the future may lead me to make some adjustments, but for now this is what I am going with.
A second reason for adopting this approach stems from my reading of this book wherein it is stated that on financial time series the "traditional" machine learning error metrics can be misleading. It cites a (theoretical?) example of a profitable trading system that has been trained/optimised for maximum profit but has a counter-intuitive, negative R-squared. The explanation for this lies in the heavy tails of price distribution(s). It is in these tails that the extreme returns reside and where the big profits/losses are to be made. However, by using a more traditional error metric such as least squares a ML algorithm might concentrate on the central area of a price distribution in order to reduce the error metric on the majority of price instances and thereby ignore the tails, producing a nice, low error but a useless system. The converse can be true for a good system, in that the ML least squares metric can be rubbish but the relevant performance metric (max profit, min draw down, risk adjusted return etc.) of the system great.
It is for these reasons that I have adopted my current approach.