Monday, 15 July 2013

My NN Input Tweak

As I said in my previous post, I wanted to come up with a "principled" method of transforming market data into inputs more suitable for my classification NN, and this post was going to be about how I have investigated various bandpass filters, Fast Fourier transform filters set to eliminate cycles less than the dominant cycle in the data, FIR filters, and a form of the Hilbert-Huang transform. However, while looking to link to another site I happened upon a so-called "Super smoother" filter, but more on this later.

Firstly, I would like to explain how I added realistic "noise" to my normal, idealised market data. Some time ago I looked at, and then abandoned, the idea of using an AFIRMA smoothing algorithm, an example of which is shown below:
The cyan line is the actual data to be smoothed and the red line is an 11 bar, Blackman window function AFIRMA smooth of this data. If one imagines this red smooth to be one realisation of my idealised market data, i.e. the real underlying signal, then the blue data is the signal plus noise. I have taken the measure of the amount of noise present to be the difference between the red and blue lines normalised by the standard deviation of Bollinger bands applied to the red AFIRMA smooth. I did this over all the historical data I have and combined it all into one noise distribution file. When creating my idealised data it is then a simple matter to apply a Bollinger band to it and add un-normalised random samples from this file as noise to create a realistically noisy, idealised time series for testing purposes.

Now for the "super smoother" filter. Details are freely available in the white paper available from here and my Octave C++ .oct file implementation is given in the code box below.
#include octave/oct.h
#include octave/dcolvector.h
#include math.h
#define PI 3.14159265

DEFUN_DLD ( super_smoother, args, nargout,
  "-*- texinfo -*-\n\
     @deftypefn {Function File} {} super_smoother (@var{n})\n\
     This function smooths the input column vector by using Elher's\n\
     super smoother algorithm set for a 10 bar critical period.\n\
     @end deftypefn" )
  octave_value_list retval_list ;
  int nargin = args.length () ;

// check the input arguments
    if ( nargin != 1 )
        error ("Invalid argument. Argument is a column vector of price") ;
        return retval_list ;

    if (args(0).length () < 7 )
        error ("Invalid argument. Argument is a column vector of price") ;
        return retval_list ;

    if (error_state)
        error ("Invalid argument. Argument is a column vector of price") ;
        return retval_list ;
// end of input checking

  ColumnVector input = args(0).column_vector_value () ;
  ColumnVector output = args(0).column_vector_value () ;
  // Declare variables
  double a1 = exp( -1.414 * PI / 10.0 ) ;
  double b1 = 2.0 * a1 * cos( (1.414*180.0/10.0) * PI / 180.0 ) ;
  double c2 = b1 ;
  double c3 = -a1 * a1 ;
  double c1 = 1.0 - c2 - c3 ;
  // main loop
  for ( octave_idx_type ii (2) ; ii < args(0).length () ; ii++ )
      output(ii) = c1 * ( input(ii) + input(ii-1) ) / 2.0 + c2 * output(ii-1) + c3 * output(ii-2) ; 
      } // end of for loop
  retval_list(0) = output ;

  return retval_list ;
} // end of function
I like this filter because it smooths away all cycles below the 10 bar critical period, and long time readers of this blog may recall from this post that there are no dominant cycle periods in my EOD data that are this short. The filter produces very nice, smooth curves which act as a good model for the idealised market, smooth curves my NN system was trained on. A typical result on synthetic data is shown below
where the red line is a typical example of "true underlying" NN training data, the cyan is the red plus realistic noise as described above, and the yellow is the final "super smoother" filter of the cyan "price". There is a bit of lag in this filter but I'm not unduly concerned with this as it is intended as input to the classification NN, not the market timing NN. Below is a short film of the performance of this filter on the last two years or so of the ES mini S&P contract. The cyan line is the raw input and the red line is the super smoother output.
Non-embedded view here.