
Monday, 30 December 2024

A "New" Way to Smooth Price

Below is the code for an Octave-compiled C++ .oct function to smooth price data.
#include "octave oct.h"
#include "octave dcolvector.h"
#include "octave dmatrix.h"
#include "GenFact.h"
#include "GramPoly.h"
#include "Weight.h"

DEFUN_DLD ( double_smooth_proj_2_5, args, nargout,
"-*- texinfo -*-\n\
@deftypefn {Function File} {} double_smooth_proj_2_5 (@var{input_vector})\n\
This function takes an input series and smooths it by projecting a 5 bar rolling linear fit\n\
3 bars into the future and using these 3 bars plus the last 3 bars of the rolling input\n\
to fit a FIR filter with a 2.5 bar lag from the last projected point, i.e. a 0.5 bar\n\
lead over the last actual rolling point in the series. This is averaged with the previously\n\
calculated such smoothed point for a theoretical zero-lagged smooth. This smooth is\n\
again smoothed as above for a smooth of the smooth, i.e. a double-smooth of the\n\
original input series. The double_smooth and smooth are the function outputs.\n\
@end deftypefn" )

{
octave_value_list retval_list ;
int nargin = args.length () ;

// check the input arguments
if ( nargin != 1 ) // there must be a single input price vector
   {
   error ("Invalid arguments. Input must be a single price vector.") ;
   return retval_list ;
   }

if ( args(0).length () < 5 )
   {
   error ("Invalid 1st argument length. Input is a price vector of length >= 5.") ;
   return retval_list ;
   }
// end of input checking

ColumnVector input = args(0).column_vector_value () ;
// initialise both outputs to the input so the first p - 1 values default to the raw price
ColumnVector smooth = args(0).column_vector_value () ;
ColumnVector double_smooth = args(0).column_vector_value () ;

// create the fit coefficients matrix
int p = 5 ;             // the number of points in calculations
int m = ( p - 1 ) / 2 ; // half-width of the window, passed to the Weight() routine
int n = 1 ;             // polynomial order (a linear fit), passed to the Weight() routine
int s = 0 ;             // derivative order (0 = smoothed value), passed to the Weight() routine

  // create matrix for fit coefficients
  Matrix fit_coefficients_matrix ( 2 * m + 1 , 2 * m + 1 ) ;
  // and assign values in loop using the Weight.h recursive Gram Polynomial C++ headers
  for ( int tt = -m ; tt < (m+1) ; tt ++ )
  {
        for ( int ii = -m ; ii < (m+1) ; ii ++ )
        {
        fit_coefficients_matrix ( ii + m , tt + m ) = Weight( ii , tt , m , n , s ) ;
        }
  }

  // create matrix for slope coefficients
  Matrix slope_coefficients_matrix ( 2 * m + 1 , 2 * m + 1 ) ;
  s = 1 ;
  // and assign values in loop using the Weight.h recursive Gram Polynomial C++ headers
  for ( int tt = -m ; tt < (m+1) ; tt ++ )
  {
        for ( int ii = -m ; ii < (m+1) ; ii ++ )
        {
        slope_coefficients_matrix ( ii + m , tt + m ) = Weight( ii , tt , m , n , s ) ;
        }
  }

  Matrix smooth_coefficients ( 1 , 5 ) ;
  // fill the smooth_coefficients matrix, smooth_coeffs = ( 9/24 ) .* fit_coeffs + ( 7/12 ) .* slope_coeffs + [ 0 ; 1/24 ; 1/8 ; 5/24 ; 1/4 ] ;
  smooth_coefficients( 0 , 0 ) = ( 9.0 / 24.0 ) * fit_coefficients_matrix( 0 , 4 ) + ( 7.0 / 12.0 ) * slope_coefficients_matrix( 0 , 4 ) ;
  smooth_coefficients( 0 , 1 ) = ( 9.0 / 24.0 ) * fit_coefficients_matrix( 1 , 4 ) + ( 7.0 / 12.0 ) * slope_coefficients_matrix( 1 , 4 ) + ( 1.0 / 24.0 ) ;
  smooth_coefficients( 0 , 2 ) = ( 9.0 / 24.0 ) * fit_coefficients_matrix( 2 , 4 ) + ( 7.0 / 12.0 ) * slope_coefficients_matrix( 2 , 4 ) + ( 1.0 / 8.0 ) ;
  smooth_coefficients( 0 , 3 ) = ( 9.0 / 24.0 ) * fit_coefficients_matrix( 3 , 4 ) + ( 7.0 / 12.0 ) * slope_coefficients_matrix( 3 , 4 ) + ( 5.0 / 24.0 ) ;
  smooth_coefficients( 0 , 4 ) = ( 9.0 / 24.0 ) * fit_coefficients_matrix( 4 , 4 ) + ( 7.0 / 12.0 ) * slope_coefficients_matrix( 4 , 4 ) + ( 1.0 / 4.0 ) ;

    for ( octave_idx_type ii (4) ; ii < args(0).length () ; ii++ )
        {

         smooth( ii ) = smooth_coefficients( 0 , 0 ) * input( ii - 4 ) + smooth_coefficients( 0 , 1 ) * input( ii - 3 ) + smooth_coefficients( 0 , 2 ) * input( ii - 2 ) +
                        smooth_coefficients( 0 , 3 ) * input( ii - 1 ) + smooth_coefficients( 0 , 4 ) * input( ii ) ;

         double_smooth( ii ) = smooth_coefficients( 0 , 0 ) * smooth( ii - 4 ) + smooth_coefficients( 0 , 1 ) * smooth( ii - 3 ) + smooth_coefficients( 0 , 2 ) * smooth( ii - 2 ) +
                               smooth_coefficients( 0 , 3 ) * smooth( ii - 1 ) + smooth_coefficients( 0 , 4 ) * smooth( ii ) ;

        }

retval_list( 0 ) = double_smooth ;
retval_list( 1 ) = smooth ;

return retval_list ;

} // end of function
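For anyone who wants to try the function, it can be compiled and called from within Octave as sketched below. This assumes the source file is named double_smooth_proj_2_5.cc and that the GenFact.h, GramPoly.h and Weight.h headers (see the Savitzky-Golay Filter Convolution Coefficients post below) are in the working directory.
mkoctfile double_smooth_proj_2_5.cc                % compile once
price = cumsum( randn( 500 , 1 ) ) ;               % synthetic random walk stand-in for price
[ double_smooth , smooth ] = double_smooth_proj_2_5( price ) ;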

Rather than describe it, I'll just paste the "help" description below:-

"This function takes an input series and smooths it by projecting a 5 bar rolling linear fit 3 bars into the future and using these 3 bars plus the last 3 bars of the rolling input to fit a FIR filter with a 2.5 bar lag from the last projected point, i.e.  a 0.5 bar lead over the last actual rolling point in the series.  This is averaged with the previously calculated such smoothed point for a theoretical zero-lagged smooth.  This smooth is again smoothed as above for a smooth of the smooth, i.e.  a double-smooth of the original input series.  The double_smooth and smooth are the function outputs."

The above function calls previous code of mine to calculate Savitzky-Golay filter convolution coefficients for its internal calculations, but for the benefit of readers I will simply post the final coefficients for the 5 tap FIR filter.

-0.191667  -0.016667   0.200000   0.416667   0.591667

These coefficients are for a "single" smooth. Run the output from a function using these through the same function to get the "double" smooth.
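As a quick sanity check, these coefficients can be reproduced in a few lines of Octave by a direct least squares calculation, and then applied with the built-in filter function. The snippet below is only an illustrative sketch, with a random walk standing in for real price data:
% reproduce the 5 tap coefficients by direct least squares
p = 5 ; m = ( p - 1 ) / 2 ;
A = ( -m : m )' .^ ( 0 : 1 ) ;     % design matrix for a rolling linear fit
C = pinv( A ) ;                    % least squares solution matrix
fit_coeffs = A( end , : ) * C ;    % endpoint fit weights, oldest to newest
slope_coeffs = [ 0 1 ] * C ;       % endpoint slope weights
smooth_coeffs = ( 9 / 24 ) .* fit_coeffs + ( 7 / 12 ) .* slope_coeffs + [ 0 , 1/24 , 1/8 , 5/24 , 1/4 ] ;
disp( smooth_coeffs ) ;            % -0.191667 -0.016667 0.200000 0.416667 0.591667

% filter() expects newest-first coefficients, hence the fliplr()
price = cumsum( randn( 500 , 1 ) ) ;
smooth = filter( fliplr( smooth_coeffs ) , 1 , price ) ;
double_smooth = filter( fliplr( smooth_coeffs ) , 1 , smooth ) ;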

Testing on a sinewave looks like this:-

where the cyan, green and blue filters are the original FIR filter with 2.5 bar lag, a 5 bar SMA and Ehlers' super smoother (see code here) applied to sinewave "price" for comparative purposes. The red filter is my "double_smooth" and the magenta is a Jurik Moving Average, using an Octave adaptation of code that is/was freely available on the Tradingview website. The parameters for this Jurik average (length, phase and power) were chosen to match, as closely as possible, those of the double_smooth for an apples-to-apples comparison.

I will not discuss this any further in this post other than to say I intend to combine this with the ideas contained in my "new use for Kalman filter" post.

More in due course.

Monday, 1 April 2019

Test of Constant Velocity Model Kalman Filter

Following on from my previous post, this post is a more detailed description of the testing methodology to test kinematic motion models on financial time series. The rationale behind the test(s) which are described below is different from the usual backtesting in that the test(s) are to determine whether the Kalman filter model is mismatched or not, i.e. whether the model innovations match the assumption that the residuals are Gaussian.

If readers have read the estimation lecture notes linked in my earlier post, you'll know that one problem with estimating the above is that it is often difficult or impossible to know the covariance matrices for the process noise and the measurement noise, and getting these wrong affects the performance of the Kalman filter even if the underlying model is accurately specified. My "solution" to this is to use the Savitzky-Golay smoothing filter to smooth the time series and, at the same time, to get estimates for the velocity, acceleration etc. Of course this cannot be done in real time because the Savitzky-Golay filter is a "centred" smoother and thus requires knowledge of future data. Using this information, a crude estimate of the covariance matrices is derived, as shown in the Octave code below:
% Estimate measurements for matrix R and covariance matrix Q by using the best
% fit quadratic using above sgolay filter coeffs
R = zeros( 2 , 2 ) ; Q = R ; % position and velocity model

% variance of position measurement
R( 1 , 1 ) = var( measured_pos( 1 , 7 : end - 6 ) .- real_pos( 1 , 7 : end - 6 ) ) ;

% variance of velocity measurement
R( 2 , 2 ) = var( measured_vel( 1 , 7 : end - 6 ) .- real_vel( 1 , 7 : end - 6 ) ) ;

% covariance of position process noise
Q( 1 , 1 ) = cov( ( real_pos( 1 , 7 : end - 7 ) .+ real_vel( 1 , 7 : end - 7 ) ) .- real_pos( 1 , 8 : end - 6 ) ) ;

% covariance of velocity process noise
Q( 2 , 2 ) = cov( diff( real_vel( 1 , 7 : end - 6 ) ) ) ;

% covariance of position process noise and velocity process noise
Q( 1 , 2 ) = cov( ( real_pos( 1 , 7 : end - 7 ) .+ real_vel( 1 , 7 : end - 7 ) ) .- real_pos( 1 , 8 : end - 6 ) , ...
diff( real_vel( 1 , 7 : end - 6 ) ) ) ;
Q( 2 , 1 ) = Q( 1 , 2 ) ;
where real_pos and real_vel are calculated using the Savitzky-Golay filter and represent the "true" position and velocity of a Constant Velocity model compared to the measured_pos and measured_vel measurements that are taken in real time. These crude estimates of the covariance matrices are used as inputs to the Autocovariance Least Squares Method implemented by means of the Octave ALS package to get a hopefully much more accurate estimate of the noise covariances. Armed with these estimates, the test code as outlined in the above mentioned estimation lecture notes paper and my previous post can then be run.
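For reference, a minimal Octave sketch of the Constant Velocity model and a single Kalman predict/update step is given below. The numerical values are illustrative stand-ins only; in the real tests Q and R are the covariances estimated above.
% Constant Velocity model: state x = [ position ; velocity ], unit time step
F = [ 1 1 ; 0 1 ] ;                 % state transition: position += velocity
H = eye( 2 ) ;                      % both position and velocity are measured
x = [ 0 ; 0 ] ; P = eye( 2 ) ;      % illustrative initial state and covariance
Q = 0.01 .* eye( 2 ) ; R = 0.1 .* eye( 2 ) ; z = [ 0.1 ; 0.05 ] ; % stand-ins

% one predict/update step
x_pred = F * x ;
P_pred = F * P * F' + Q ;
nu = z - H * x_pred ;               % innovation, as tested in the code below
S = H * P_pred * H' + R ;           % innovation covariance
K = P_pred * H' / S ;               % Kalman gain
x = x_pred + K * nu ;
P = ( eye( 2 ) - K * H ) * P_pred ;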

To avoid having to manually judge success or failure by looking at numerous plots such as this one, I run a few simple statistical tests
% Performance Tests for an Unmatched Kalman Filter

% Test 1 ---> Innovation Magnitude Bound Test
% check the innovation is bounded by +/- 2 * sqrt( S )
test_1 = sum( ( nu( 1 , 7 : end - 3 ) < 2 * s( 1 , 7 : end - 3 ) ) .* ...
( nu( 1 , 7 : end - 3 ) > -2 * s( 1 , 7 : end - 3 ) ) ) / length( s( 1 , 7 : end - 3 ) ) ;
% greater than 0.95 is a pass, i.e. less than 5 % of values out of bounds
% record result 
all_test_results( jj , 1 ) = test_1 ; 

%  Perform a T-test of the null hypothesis 'mean (X) == M' for a
%  sample X from a normal distribution with unknown mean and unknown
%  std deviation.  Under the null, the test statistic T has a
%  Student's t distribution.  The default value of M is 0.
%  If H is 0 the null hypothesis is accepted, if it is 1 the null
%  hypothesis is rejected.
zero_mean_innovation_pos = ttest( nu( 1 , : ) , 0 ) ; % innovations are centred around zero?
% H = 0 is a pass, i.e. centred around zero
% record result 
all_test_results( jj , 2 ) = zero_mean_innovation_pos ; 

% Test 2 ---> Normalised Innovation Squared Test
%  > qchisq( c( 0.05 , 0.95 ) , df = 1 , lower.tail = FALSE )
%  [1] 3.84145882 0.00393214
%  > qchisq( c( 0.05 , 0.95 ) , df = 2 , lower.tail = FALSE )
%  [1] 5.9914645 0.1025866

d_of_freedom = sum( sum( H ) ) ;
switch ( d_of_freedom )
  case 1
    lower_CI = 0.00393214 ;
    upper_CI = 3.84145882 ;
  case 2
    lower_CI = 0.1025866 ;
    upper_CI = 5.9914645 ;
endswitch

normalised_innovation_squared_chi2_test_pos = ( sum_Q_pos > lower_CI ) * ( sum_Q_pos < upper_CI ) ;
% value of 1 is a pass
% record result 
all_test_results( jj , 3 ) = normalised_innovation_squared_chi2_test_pos ;  

% Test 3 ---> Innovation Whiteness (Autocorrelation) Test
% check the innovation is bounded by +/- 2 * std
%level = 2 * std( r_pos( N : 2 * N - 1 ) / r_pos( N ) ) ;
level = 1.96 / sqrt( N - burn_in - 2 ) ;
##test_3 = sum( ( r_pos( N : 2 * N - 1 ) / r_pos( N ) < level ) .* ( r_pos( N : 2 * N - 1 ) / r_pos( N ) > -level ) ) / ...
##length( r_pos( N : 2 * N - 1 ) / r_pos( N ) ) ;

test_3 = sum( ( r_pos( no_lags + 1 : end ) < level ) .* ( r_pos( no_lags + 1 : end ) > -level ) ) ...
/ length( r_pos( no_lags + 1 : end) ) ;

% greater than 0.95 is a pass, i.e. less than 5 % of values out of bounds
% record result 
all_test_results( jj , 4 ) = test_3 ; 

zero_mean_autocorrelation_plot_pos = ttest( r_pos( no_lags + 1 : end ) , 0 ) ; % centred around zero
% H = 0 is a pass, i.e. centred around zero
% record result 
all_test_results( jj , 5 ) = zero_mean_autocorrelation_plot_pos ; 
across a range of forex crosses and my currency indices and see if, say, 95% of them pass these tests.

As a result of my first tests, outlined above, on the Constant Velocity Kinematic Model, I think I can confidently say that this model is "mismatched," despite it sometimes being described as the simplest "useful" model, and therefore readers should not use this model in a Kalman filter on financial time series. It is worth noting that this model is named Model One in the Trend Without Hiccups paper, and in turn is very closely related to a local linear trend model, which the authors name Model Two. Given that the underlying model(s) for both are mismatched, it is not surprising that the authors report disappointing results from these models.

In my next post I shall report results on the Constant Acceleration Model.

Thursday, 20 April 2017

Using the BayesOpt Library to Optimise my Planned Neural Net

Following on from my last post, I have recently been using the BayesOpt library to optimise my planned neural net, and this post is a brief outline, with code, of what I have been doing.

My intent was to design a Nonlinear autoregressive exogenous model using my currency strength indicator as the main exogenous input, along with other features derived from the use of Savitzky-Golay filter convolution to model velocity, acceleration etc. I decided that rather than model prices directly, I would model the 20 period simple moving average because it would seem reasonable to assume that modelling a smooth function would be easier, and from this average it is a trivial matter to reverse engineer to get to the underlying price.
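The reverse engineering mentioned above follows directly from the definition of the average: given a predicted next-bar SMA20 and the last 19 known prices, the implied next price drops out with simple algebra, as in this sketch:
price = cumsum( randn( 100 , 1 ) ) ;                 % synthetic stand-in for real prices
sma20 = filter( ones( 20 , 1 ) ./ 20 , 1 , price ) ; % 20 period simple moving average
predicted_sma20 = sma20( end ) ;                     % stand-in for a NN prediction
% sma20( t+1 ) = ( price( t+1 ) + sum( price( t-18 : t ) ) ) / 20 , so
implied_next_price = 20 * predicted_sma20 - sum( price( end - 18 : end ) ) ;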

Given that my projected feature space/lookback length/number of nodes combination is/was a discrete optimisation problem with a triple-digit number of dimensions, I used the "bayesoptdisc" function from the BayesOpt library to perform a discrete Bayesian optimisation over these parameters, the main Octave script for this being shown below.
clear all ;

% load the data
% load eurusd_daily_bin_bars ;
% load gbpusd_daily_bin_bars ;
% load usdchf_daily_bin_bars ;
load usdjpy_daily_bin_bars ;
load all_rel_strengths_non_smooth ;
% all_rel_strengths_non_smooth = [ usd_rel_strength_non_smooth eur_rel_strength_non_smooth gbp_rel_strength_non_smooth chf_rel_strength_non_smooth ...
% jpy_rel_strength_non_smooth aud_rel_strength_non_smooth cad_rel_strength_non_smooth ] ;

% extract relevant data
% price = ( eurusd_daily_bars( : , 3 ) .+ eurusd_daily_bars( : , 4 ) ) ./ 2 ; % midprice 
% price = ( gbpusd_daily_bars( : , 3 ) .+ gbpusd_daily_bars( : , 4 ) ) ./ 2 ; % midprice
% price = ( usdchf_daily_bars( : , 3 ) .+ usdchf_daily_bars( : , 4 ) ) ./ 2 ; % midprice  
price = ( usdjpy_daily_bars( : , 3 ) .+ usdjpy_daily_bars( : , 4 ) ) ./ 2 ; % midprice
base_strength = all_rel_strengths_non_smooth( : , 1 ) .- 0.5 ;
term_strength = all_rel_strengths_non_smooth( : , 5 ) .- 0.5 ; 

% clear unwanted data
% clear eurusd_daily_bars all_rel_strengths_non_smooth ;
% clear gbpusd_daily_bars all_rel_strengths_non_smooth ;
% clear usdchf_daily_bars all_rel_strengths_non_smooth ;
clear usdjpy_daily_bars all_rel_strengths_non_smooth ;

global start_opt_line_no = 200 ;
global stop_opt_line_no = 7545 ;

% get matrix coeffs
slope_coeffs = generalised_sgolay_filter_coeffs( 5 , 2 , 1 ) ;
accel_coeffs = generalised_sgolay_filter_coeffs( 5 , 2 , 2 ) ;
jerk_coeffs = generalised_sgolay_filter_coeffs( 5 , 3 , 3 ) ;

% create features
sma20 = sma( price , 20 ) ;
global targets = sma20 ;
[ sma_max , sma_min ] = adjustable_lookback_max_min( sma20 , 20 ) ;
global sma20r = zeros( size(sma20,1) , 5 ) ;
global sma20slope = zeros( size(sma20,1) , 5 ) ;
global sma20accel = zeros( size(sma20,1) , 5 ) ;
global sma20jerk = zeros( size(sma20,1) , 5 ) ;

global sma20diffs = zeros( size(sma20,1) , 5 ) ;
global sma20diffslope = zeros( size(sma20,1) , 5 ) ;
global sma20diffaccel = zeros( size(sma20,1) , 5 ) ;
global sma20diffjerk = zeros( size(sma20,1) , 5 ) ;

global base_strength_f = zeros( size(sma20,1) , 5 ) ;
global term_strength_f = zeros( size(sma20,1) , 5 ) ;

base_term_osc = base_strength .- term_strength ;
global base_term_osc_f = zeros( size(sma20,1) , 5 ) ;
slope_bt_osc = rolling_endpoint_gen_poly_output( base_term_osc , 5 , 2 , 1 ) ; % no_of_points(p),filter_order(n),derivative(s)
global slope_bt_osc_f = zeros( size(sma20,1) , 5 ) ;
accel_bt_osc = rolling_endpoint_gen_poly_output( base_term_osc , 5 , 2 , 2 ) ; % no_of_points(p),filter_order(n),derivative(s)
global accel_bt_osc_f = zeros( size(sma20,1) , 5 ) ;
jerk_bt_osc = rolling_endpoint_gen_poly_output( base_term_osc , 5 , 3 , 3 ) ; % no_of_points(p),filter_order(n),derivative(s)
global jerk_bt_osc_f = zeros( size(sma20,1) , 5 ) ;

slope_base_strength = rolling_endpoint_gen_poly_output( base_strength , 5 , 2 , 1 ) ; % no_of_points(p),filter_order(n),derivative(s)
global slope_base_strength_f = zeros( size(sma20,1) , 5 ) ;
accel_base_strength = rolling_endpoint_gen_poly_output( base_strength , 5 , 2 , 2 ) ; % no_of_points(p),filter_order(n),derivative(s)
global accel_base_strength_f = zeros( size(sma20,1) , 5 ) ;
jerk_base_strength = rolling_endpoint_gen_poly_output( base_strength , 5 , 3 , 3 ) ; % no_of_points(p),filter_order(n),derivative(s)
global jerk_base_strength_f = zeros( size(sma20,1) , 5 ) ;

slope_term_strength = rolling_endpoint_gen_poly_output( term_strength , 5 , 2 , 1 ) ; % no_of_points(p),filter_order(n),derivative(s)
global slope_term_strength_f = zeros( size(sma20,1) , 5 ) ;
accel_term_strength = rolling_endpoint_gen_poly_output( term_strength , 5 , 2 , 2 ) ; % no_of_points(p),filter_order(n),derivative(s)
global accel_term_strength_f = zeros( size(sma20,1) , 5 ) ;
jerk_term_strength = rolling_endpoint_gen_poly_output( term_strength , 5 , 3 , 3 ) ; % no_of_points(p),filter_order(n),derivative(s)
global jerk_term_strength_f = zeros( size(sma20,1) , 5 ) ;

min_max_range = sma_max .- sma_min ;

for ii = 51 : size( sma20 , 1 ) - 1 % one step ahead is target
  
  targets(ii) = 2 * ( ( sma20(ii+1) - sma_min(ii) ) / min_max_range(ii) - 0.5 ) ;
  
  % scaled sma20
  sma20r(ii,:) = 2 .* ( ( flipud( sma20(ii-4:ii,1) )' .- sma_min(ii) ) ./ min_max_range(ii) .- 0.5 ) ;
  sma20slope(ii,:) = fliplr( ( 2 .* ( ( sma20(ii-4:ii,1)' .- sma_min(ii) ) ./ min_max_range(ii) .- 0.5 ) ) * slope_coeffs ) ;
  sma20accel(ii,:) = fliplr( ( 2 .* ( ( sma20(ii-4:ii,1)' .- sma_min(ii) ) ./ min_max_range(ii) .- 0.5 ) ) * accel_coeffs ) ;
  sma20jerk(ii,:) = fliplr( ( 2 .* ( ( sma20(ii-4:ii,1)' .- sma_min(ii) ) ./ min_max_range(ii) .- 0.5 ) ) * jerk_coeffs ) ;
  
  % scaled diffs of sma20
  sma20diffs(ii,:) = fliplr( diff( 2.* ( ( sma20(ii-5:ii,1) .- sma_min(ii) ) ./ min_max_range(ii) .- 0.5 ) )' ) ;
  sma20diffslope(ii,:) = fliplr( diff( 2.* ( ( sma20(ii-5:ii,1) .- sma_min(ii) ) ./ min_max_range(ii) .- 0.5 ) )' * slope_coeffs ) ;
  sma20diffaccel(ii,:) = fliplr( diff( 2.* ( ( sma20(ii-5:ii,1) .- sma_min(ii) ) ./ min_max_range(ii) .- 0.5 ) )' * accel_coeffs ) ;
  sma20diffjerk(ii,:) = fliplr( diff( 2.* ( ( sma20(ii-5:ii,1) .- sma_min(ii) ) ./ min_max_range(ii) .- 0.5 ) )' * jerk_coeffs ) ;
  
  % base strength
  base_strength_f(ii,:) = fliplr( base_strength(ii-4:ii)' ) ;
  slope_base_strength_f(ii,:) = fliplr( slope_base_strength(ii-4:ii)' ) ;
  accel_base_strength_f(ii,:) = fliplr( accel_base_strength(ii-4:ii)' ) ;
  jerk_base_strength_f(ii,:) = fliplr( jerk_base_strength(ii-4:ii)' ) ;
  
  % term strength
  term_strength_f(ii,:) = fliplr( term_strength(ii-4:ii)' ) ;
  slope_term_strength_f(ii,:) = fliplr( slope_term_strength(ii-4:ii)' ) ;
  accel_term_strength_f(ii,:) = fliplr( accel_term_strength(ii-4:ii)' ) ;
  jerk_term_strength_f(ii,:) = fliplr( jerk_term_strength(ii-4:ii)' ) ;
  
  % base term oscillator
  base_term_osc_f(ii,:) = fliplr( base_term_osc(ii-4:ii)' ) ;
  slope_bt_osc_f(ii,:) = fliplr( slope_bt_osc(ii-4:ii)' ) ;
  accel_bt_osc_f(ii,:) = fliplr( accel_bt_osc(ii-4:ii)' ) ;
  jerk_bt_osc_f(ii,:) = fliplr( jerk_bt_osc(ii-4:ii)' ) ;
  
endfor

% create xset for bayes routine
% raw indicator
xset = zeros( 4 , 5 ) ; xset( 1 , : ) = 1 : 5 ;
% add the slopes
to_add = zeros( 4 , 15 ) ; 
to_add( 1 , : ) = [ 1 2 2 3 3 3 4 4 4 4 5 5 5 5 5 ] ; 
to_add( 2 , : ) = [ 1 1 2 1 2 3 1 2 3 4 1 2 3 4 5 ] ; 
xset = [ xset to_add ] ; 
% add accels
to_add = zeros( 4 , 21 ) ; 
to_add( 1 , : ) = [ 1 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 ] ; 
to_add( 2 , : ) = [ 1 1 2 2 1 2 2 3 3 3 1 2 3 4 2 3 3 4 4 4 4 ] ; 
to_add( 3 , : ) = [ 1 1 1 2 1 1 2 1 2 3 1 1 1 1 2 2 3 1 2 3 4 ] ;
xset = [ xset to_add ] ; 
% add jerks
to_add = zeros( 4 , 70 ) ; 
to_add( 1 , : ) = [ 1 2 2 2 2 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 ] ; 
to_add( 2 , : ) = [ 1 1 2 2 2 1 2 2 2 3 3 3 3 3 3 1 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 1 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 ] ; 
to_add( 3 , : ) = [ 1 1 1 2 2 1 1 2 2 1 2 2 3 3 3 1 1 2 2 1 2 2 3 3 3 1 2 2 3 3 3 4 4 4 4 1 1 2 2 1 2 2 3 3 3 1 2 2 3 3 3 4 4 4 4 1 2 2 3 3 3 4 4 4 4 5 5 5 5 5 ] ;
to_add( 4 , : ) = [ 1 1 1 1 2 1 1 1 2 1 1 2 1 2 3 1 1 1 2 1 1 2 1 2 3 1 1 2 1 2 3 1 2 3 4 1 1 1 2 1 1 2 1 2 3 1 1 2 1 2 3 1 2 3 4 1 1 2 1 2 3 1 2 3 4 1 2 3 4 5 ] ;
xset = [ xset to_add ] ;
% construct all_xset for combinations of indicators and look back lengths
all_zeros = zeros( size( xset ) ) ;
all_xset = [ xset ; repmat( all_zeros , 3 , 1 ) ] ;
all_xset = [ all_xset [ xset ; xset ; all_zeros ; all_zeros ] ] ;
all_xset = [ all_xset [ xset ; all_zeros ; xset ; all_zeros ] ] ;
all_xset = [ all_xset [ xset ; all_zeros ; all_zeros ; xset ] ] ;
all_xset = [ all_xset [ xset ; xset ; xset ; all_zeros ] ] ;
all_xset = [ all_xset [ xset ; xset ; all_zeros ; xset ] ] ;
all_xset = [ all_xset [ xset ; all_zeros ; xset ; xset ] ] ;
all_xset = [ all_xset repmat( xset , 4 , 1 ) ] ;

ones_all_xset = ones( 1 , size( all_xset , 2 ) ) ;

% now add layer for number of neurons and extend as necessary
max_number_of_neurons_in_layer = 20 ;

parameter_matrix = [] ;

for ii = 2 : max_number_of_neurons_in_layer % min no. of neurons is 2, max = max_number_of_neurons_in_layer  
  parameter_matrix = [ parameter_matrix [ ii .* ones_all_xset ; all_xset ] ] ;
endfor

% now the actual bayes optimisation routine
% set the parameters
params.n_iterations = 190; % bayesopt library default is 190
params.n_init_samples = 10;
params.crit_name = 'cEIa'; % cEI is default. cEIa is an annealed version
params.surr_name = 'sStudentTProcessNIG';
params.noise = 1e-6;
params.kernel_name = 'kMaternARD5';
params.kernel_hp_mean = [1];
params.kernel_hp_std = [10];
params.verbose_level = 1; % 3 to path below
params.log_filename = '/home/dekalog/Documents/octave/cplusplus.oct_functions/nn_functions/optimise_narx_ind_lookback_nodes_log';
params.l_type = 'L_MCMC' ; % L_EMPIRICAL is default
params.epsilon = 0.5 ; % probability of performing a random (blind) evaluation of the target function. 
% Higher values implies forced exploration while lower values relies more on the exploration/exploitation policy of the criterion. 0 is default

% the function to optimise
fun = 'optimise_narx_ind_lookback_nodes_rolling' ;

% the call to the Bayesopt library function
bayesoptdisc( fun , parameter_matrix , params ) ; 
% result is the minimum as a vector (x_out) and the value of the function at the minimum (y_out)
What this script basically does is:
  1. load all the relevant data (in this case a forex pair)
  2. create a set of scaled features
  3. create the necessary parameter matrix for the discrete optimisation function
  4. set the parameters for the optimisation routine
  5. and finally call the "bayesoptdisc" function
Note that in step 2 all the features are declared as global variables, this being necessary because the "bayesoptdisc" function of the BayesOpt library does not appear to admit passing these variables as inputs to the function.

The actual function to be optimised is given in the following code box, and is basically a looped neural net training routine.
## Copyright (C) 2017 dekalog
## 
## This program is free software; you can redistribute it and/or modify it
## under the terms of the GNU General Public License as published by
## the Free Software Foundation; either version 3 of the License, or
## (at your option) any later version.
## 
## This program is distributed in the hope that it will be useful,
## but WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
## GNU General Public License for more details.
## 
## You should have received a copy of the GNU General Public License
## along with this program.  If not, see <http://www.gnu.org/licenses/>.

## -*- texinfo -*- 
## @deftypefn {} {@var{retval} =} optimise_narx_ind_lookback_nodes_rolling (@var{input1})
##
## @seealso{}
## @end deftypefn

## Author: dekalog 
## Created: 2017-03-21

function [ retval ] = optimise_narx_ind_lookback_nodes_rolling( input1 )
 
% declare all the global variables so the function can "see" them
global start_opt_line_no ;
global stop_opt_line_no ; 
global targets ;
global sma20r ; % 2
global sma20slope ;
global sma20accel ;
global sma20jerk ;
global sma20diffs ;
global sma20diffslope ;
global sma20diffaccel ;
global sma20diffjerk ;
global base_strength_f ;
global slope_base_strength_f ;
global accel_base_strength_f ;
global jerk_base_strength_f ;
global term_strength_f ;
global slope_term_strength_f ;
global accel_term_strength_f ;
global jerk_term_strength_f ;
global base_term_osc_f ;
global slope_bt_osc_f ;
global accel_bt_osc_f ;
global jerk_bt_osc_f ;

% build feature matrix from the above global variable according to parameters in input1
hidden_layer_size = input1(1) ;

% training targets
Y = targets( start_opt_line_no:stop_opt_line_no , 1 ) ;

% create empty feature matrix
X = [] ;

% which will always have at least one element of the main price series for the NARX
X = [ X sma20r( start_opt_line_no:stop_opt_line_no , 1:input1(2) ) ] ;

% go through input1 values in turn and add to X if necessary
if input1(3) > 0
X = [ X sma20slope( start_opt_line_no:stop_opt_line_no , 1:input1(3) ) ] ; 
endif

if input1(4) > 0
X = [ X sma20accel( start_opt_line_no:stop_opt_line_no , 1:input1(4) ) ] ; 
endif

if input1(5) > 0
X = [ X sma20jerk( start_opt_line_no:stop_opt_line_no , 1:input1(5) ) ] ; 
endif

if input1(6) > 0
X = [ X sma20diffs( start_opt_line_no:stop_opt_line_no , 1:input1(6) ) ] ; 
endif

if input1(7) > 0
X = [ X sma20diffslope( start_opt_line_no:stop_opt_line_no , 1:input1(7) ) ] ; 
endif

if input1(8) > 0
X = [ X sma20diffaccel( start_opt_line_no:stop_opt_line_no , 1:input1(8) ) ] ; 
endif

if input1(9) > 0
X = [ X sma20diffjerk( start_opt_line_no:stop_opt_line_no , 1:input1(9) ) ] ; 
endif

if input1(10) > 0 % input for base and term strengths together
X = [ X base_strength_f( start_opt_line_no:stop_opt_line_no , 1:input1(10) ) ] ; 
X = [ X term_strength_f( start_opt_line_no:stop_opt_line_no , 1:input1(10) ) ] ; 
endif

if input1(11) > 0
X = [ X slope_base_strength_f( start_opt_line_no:stop_opt_line_no , 1:input1(11) ) ] ; 
X = [ X slope_term_strength_f( start_opt_line_no:stop_opt_line_no , 1:input1(11) ) ] ; 
endif

if input1(12) > 0
X = [ X accel_base_strength_f( start_opt_line_no:stop_opt_line_no , 1:input1(12) ) ] ; 
X = [ X accel_term_strength_f( start_opt_line_no:stop_opt_line_no , 1:input1(12) ) ] ; 
endif

if input1(13) > 0
X = [ X jerk_base_strength_f( start_opt_line_no:stop_opt_line_no , 1:input1(13) ) ] ; 
X = [ X jerk_term_strength_f( start_opt_line_no:stop_opt_line_no , 1:input1(13) ) ] ; 
endif

if input1(14) > 0
X = [ X base_term_osc_f( start_opt_line_no:stop_opt_line_no , 1:input1(14) ) ] ; 
endif

if input1(15) > 0
X = [ X slope_bt_osc_f( start_opt_line_no:stop_opt_line_no , 1:input1(15) ) ] ; 
endif

if input1(16) > 0
X = [ X accel_bt_osc_f( start_opt_line_no:stop_opt_line_no , 1:input1(16) ) ] ; 
endif

if input1(17) > 0
X = [ X jerk_bt_osc_f( start_opt_line_no:stop_opt_line_no , 1:input1(17) ) ] ; 
endif

% now the X features matrix has been formed, get its size
X_rows = size( X , 1 ) ; X_cols = size( X , 2 ) ;

X = [ ones( X_rows , 1 ) X ] ; % add bias unit to X

fan_in = X_cols + 1 ; % no. of inputs to a node/unit, including bias
fan_out = 1 ; % no. of outputs from node/unit
r = sqrt( 6 / ( fan_in + fan_out ) ) ; 

rolling_window_length = 100 ;
n_iters = 100 ;
n_iter_errors = zeros( n_iters , 1 ) ;
all_errors = zeros( X_rows - ( rolling_window_length - 1 ) - 1 , 1 ) ;
rolling_window_loop_iter = 0 ;

for rolling_window_loop = rolling_window_length : X_rows - 1 
 
  rolling_window_loop_iter = rolling_window_loop_iter + 1 ;
  
    % train n_iters no. of nets and put the error stats in n_iter_errors
    for ii = 1 : n_iters
      
      % initialise weights
      % see https://stats.stackexchange.com/questions/47590/what-are-good-initial-weights-in-a-neural-network
      
      % One option is Orthogonal random matrix initialization for input_to_hidden weights
      % w_i = rand( X_cols + 1 , hidden_layer_size ) ;
      % [ u , s , v ] = svd( w_i ) ;
      % input_to_hidden = [ ones( X_rows , 1 ) X ] * u ; % adding bias unit to X
      
      % using fan_in and fan_out for tanh
      w_i = ( rand( X_cols + 1 , hidden_layer_size ) .* ( 2 * r ) ) .- r ;
      input_to_hidden = X( rolling_window_loop - ( rolling_window_length - 1 ) : rolling_window_loop , : ) * w_i ; 
      
      % push the input_to_hidden through the chosen sigmoid function
      hidden_layer_output = sigmoid_lecun_m( input_to_hidden ) ;
      
      % add bias unit for the output from hidden
      hidden_layer_output = [ ones( rolling_window_length , 1 ) hidden_layer_output ] ;
      
      % use hidden_layer_output as the input to a linear regression fit to targets Y
      % a la Extreme Learning Machine
      % w = ( inv( X' * X ) * X' ) * y ; the "classic" way for linear regression, where
      % X = hidden_layer_output, but
      w = ( ( hidden_layer_output' * hidden_layer_output ) \ hidden_layer_output' ) * Y( rolling_window_loop - ( rolling_window_length - 1 ) : rolling_window_loop , 1 ) ;
      % is quicker and recommended
      
      % use these current values of w_i and w for out of sample test
      os_input_to_hidden = X( rolling_window_loop + 1 , : ) * w_i ;
      os_hidden_layer_output = sigmoid_lecun_m( os_input_to_hidden ) ;
      os_hidden_layer_output = [ 1 os_hidden_layer_output ] ; % add bias
      os_output = os_hidden_layer_output * w ;
      n_iter_errors( ii ) = abs( Y( rolling_window_loop + 1 , 1 ) - os_output ) ;
      
    endfor
    
    all_errors( rolling_window_loop_iter ) = mean( n_iter_errors ) ;
    
endfor % rolling_window_loop

retval = mean( all_errors ) ;

clear X w_i ; 

endfunction
However, to speed things up for some rapid prototyping, rather than use backpropagation training this function uses the principles of an extreme learning machine and loops over 100 such trained ELMs per set of features contained in a rolling window of length 100 across the entire training data set. Walk-forward cross validation is performed for each of the 100 ELMs, an average of the out-of-sample error is obtained, and these averages across the whole data set are then averaged to provide the function return. The code was run on daily bars of the four major forex pairs: EURUSD, GBPUSD, USDCHF and USDJPY.

The results of running the above are quite interesting. The first surprise is that the currency strength indicator and features derived from it were not included in the optimal model for any of the four tested pairs. Secondly, for all pairs, a scaled version of a 20 bar price momentum function, and derived features, was included in the optimal model. Finally, again for all pairs, there was a symmetrically decreasing lookback period across the selected features, and when averaged across all pairs the following pattern results: 10 3 3 2 1 3 3 2 1, which is to be read as:
  • 10 nodes (plus a bias node) in the hidden layer
  • lookback length of 3 for the scaled values of the SMA20 and the 20 bar scaled momentum function
  • lookback length of 3 for the slopes/rates of change of the above
  • lookback length of 2 for the "accelerations" of the above
  • lookback length of 1 for the "jerks" of the above 
So it would seem that the 20 bar momentum function is a better exogenous input than the currency strength indicator. The symmetry across features is quite pleasing, and the selection of these "physical motion" features across all the tested pairs tends to confirm their validity. The fact that the currency strength indicator was not selected does not mean that this indicator is of no value, but perhaps it should not be used for regression purposes, but rather as a filter. More in due course.

Monday, 28 September 2015

Runge-Kutta Example and Code

Following on from my last post I thought I would, as a first step, code up a "straightforward" Runge-Kutta function and show how to deal with the fact that there is no "magic mathematical formula" to calculate the slopes that are an integral part of Runge-Kutta.

My approach is to fit a quadratic function to the last n_bar bars of price and take the slope of this via my Savitzky-Golay filter convolution code, and in doing so the Runge-Kutta value k1 can easily be obtained. The extrapolation beyond the k1 point to the points k2 and k3 is, in effect, trying to fit to points that have a half bar lead over the last available price. To accommodate this I use the last n_bar - 1 points of a 2 bar simple moving average plus the "position" of points k2 and k3 to calculate the slopes at k2 and k3. A 2 bar simple moving average is used because this has a half bar lag and is effectively an interpolation of the known prices at the "half bar intervals," and therefore points k2 and k3 are one h step ahead of the last half bar interval. The k4 point is again simply calculated directly from prices plus the half bar interval projection from point k3. If all this seems confusing, hopefully the Octave code below will clear things up.
clear all

%  create the raw price series
period = input( 'Period? ' ) ;
sideways = zeros( 1 , 2*period ) ;
uwr = ( 1 : 1 : 2*period ) .* 1.333 / period ; uwr_end = uwr(end) ;
unr = ( 1 : 1 : 2*period ) .* 4 / period ; unr_end = unr(end) ;
dwr = ( 1 : 1 : 2*period ) .* -1.333 / period ; dwr_end = dwr(end) ;
dnr = ( 1 : 1 : 2*period ) .* -4 / period ; dnr_end = dnr(end) ;

trends = [ sideways , uwr , unr.+uwr_end , sideways.+uwr_end.+unr_end , dnr.+uwr_end.+unr_end , dwr.+uwr_end.+unr_end.+dnr_end , sideways ] .+ 2 ;
noise = randn( 1 , length(trends) ) .* 0.0 ;
price = sinewave( length( trends ) , period ) .+ trends .+ noise ;
ma_2 = sma( price , 2 ) ;
 
%  regress over 'n_bar' bars
n_bar = 9 ;
%  and a 'p' order fit
p = 2 ;

%  get the relevant coefficients
slope_coeffs = generalised_sgolay_filter_coeffs( n_bar , p , 1 ) ; 

%  container for 1 bar ahead projection
projection_1_bar = zeros( 1 , length( price ) ) ;

for ii = n_bar : length( price )

%  calculate k1 value i.e. values at price(ii), the most recent price
k1 = price( ii-(n_bar-1) : ii ) * slope_coeffs( : , end ) ;
projection_of_point_k2 = price(ii) + k1 / 2 ;

%  calculate k2 value
k2 = [ ma_2( ii-(n_bar-2) : ii ) ; projection_of_point_k2 ]' * slope_coeffs( : , end ) ;
projection_of_point_k3 = price(ii) + k2 / 2 ;

%  calculate k3 value
k3 = [ ma_2( ii-(n_bar-2) : ii ) ; projection_of_point_k3 ]' * slope_coeffs( : , end ) ;
projection_of_point_k4 = price(ii) + k3 / 2 ;

%  calculate k4 value
k4 = [ price( ii-(n_bar-2) : ii ) , projection_of_point_k4 ] * slope_coeffs( : , end ) ;

%  the runge-kutta weighted moving average
projection_1_bar(ii) = price(ii) + ( k1 + 2 * ( k2 + k3 ) + k4 ) / 6 ;

end

%  shift for plotting
projection_1_bar = shift( projection_1_bar , 1 ) ;
projection_1_bar( : , 1:n_bar ) = price( : , 1:n_bar ) ;

plot( price , 'c' , projection_1_bar , 'r' ) ;
This code produces a plot like this, without noise,


and this with noise (obtained by setting the multiplier on the noise line of the code to a non-zero value).

The cyan line is the underlying price and the red line is the Runge-Kutta 1 bar ahead projection. As can be seen, when the price is moving in rather straight lines the projection is quite accurate, however, at turnings there is some overshoot, which is to be expected. I'm not unduly concerned about this overshoot as my intention is simply to get the various k values as features, but this overshoot does have some implications which I will discuss in a future post.

Saturday, 26 October 2013

Change of Direction in Neural Net Training.

Up till now when training my neural net classifier I have been using what I call my "idealised" data, which is essentially a model for prices comprised of a sine wave along with a linear trend. In my recent work on Savitzky-Golay filters I wanted to increase the amount of available training data but have run into a problem - the size of my data files has become so large that I'm exhausting the memory available to me in my Octave software install. To overcome this problem I have decided to convert my NN training to an online regime rather than loading csv data files for mini-batch gradient descent training - in fact what I envisage doing would perhaps be best described as online mini-batch training.

To do this will require a fairly major rewrite of my data generation code and I think I will take the opportunity to make this data generation more realistic than the sine wave and trend model I am currently using. Apart from the above mentioned memory problem, this decision has been prompted by an e-mail exchange I have recently had with a reader of this blog and some online reading I have recently been doing, such as mixed regime simulation, training a simple neural network to recognise different regimes in financial time series, neural networks in trading, random subspace optimisation and profitable hidden and Markovian coupling. At the moment it is my intention to revisit my earlier posts here and here and use/extend the ideas contained therein. This means that for the near future my work on Savitzky-Golay filters is temporarily halted.

Tuesday, 15 October 2013

Code refactoring didn't work.

In my previous post I said that I was going to refactor code to use my existing, trained classification neural net with Savitzky-Golay filter inputs in the hope that this might show some improvement. I have to report that this was an abysmal failure and I suppose it was a bit naive to have ever expected that it would work. It looks like I will have to explicitly train a new neural network to take advantage of Savitzky-Golay filter features.

Sunday, 6 October 2013

Update on Savitzky-Golay Filters

For the past couple of weeks I have been playing around with Savitzky-Golay filters in the hope of creating a moving endpoint regression line as a form of zero lag smoothing, but unfortunately I have been unable to come up with anything that is remotely satisfying and I think I'm going to abandon this for now. My view at the moment is that SG filters' utility, for my purposes at least, is limited to feature extraction via the polynomial coefficients to get the 2nd, 3rd and 4th derivatives as inputs for my Neural net classifier.

To do this will require a time-consuming retraining period, so before I embark on this I'm going to use what I think will be an immediately useful SG filter application. In a previous post readers can see three screenshots of moving windows of my idealised price series with SG filters overlaid. The SG filters follow these windowed "prices" so closely that I think SG filters more or less == windowed idealised prices and features derived therefrom. However, when it comes to applying my currently trained NN, the features are derived from the noisy real prices seen in the video of the above-linked post. What I intend to do wherever possible is to use a cubic SG filter applied to windowed prices and then derive input features from the SG filter values. Effectively what I will be doing is coercing real prices into smooth curves that far more closely resemble the idealised prices my NN was trained on. The advantage of this is that I will not have to retrain the NN, but only refactor my feature extraction code for use on real prices to this end. The hope is, of course, that this tweak will result in vastly improved performance to the extent that I will not need to consider future classifier NN retraining with polynomial derivatives and can proceed directly to training other NNs.
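As a minimal sketch of this intended preprocessing, with an arbitrary window length and a random walk standing in for real prices:
pkg load signal ;                           % for sgolayfilt
price = cumsum( randn( 200 , 1 ) ) ;        % stand-in for real prices
window = price( end - 49 : end ) ;          % a 50 bar rolling window
smoothed = sgolayfilt( window , 3 , 11 ) ;  % 11 point cubic SG fit, as described above
% input features would now be derived from smoothed rather than from raw window values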

Finally, on a related note, I have recently been directed to this paper, which has some approaches to time series classification that I might incorporate in future NN training. Fig. 5 on page 9 shows six examples of time series classification categories. My idealised prices categories currently encompass what is described in the paper as "Cyclic," "Increasing Trend" and "Decreasing Trend." I shall have to consider whether it might be useful to extend my categories to include "Normal," "Upward Shift" and "Downward Shift." 

Sunday, 22 September 2013

Savitzky-Golay Filters in Action

As a first step in investigating SG filters I have simply plotted them on samples of my idealised time series data, three samples shown below:
where the underlying "price" is cyan, and the 3rd, 4th, 5th, 6th and 7th order SG fits are blue, red, magenta, green and yellow respectively. As can be seen, 5th order and above perfectly match the underlying price to the extent that the cyan line is completely obscured.

Repeating the above on real prices (ES-mini close) is shown in the video below:
Firstly, the thing that is immediately apparent is that real prices are more volatile than my idealised price series, a statement of the obvious perhaps, but it has led me to reconsider how I model my idealised price series. However, there are times when the patterns of the SG filters on real prices converge to almost identically resemble the patterns of SG filters on my idealised prices. This observation encourages me to continue to look at applying SG filters for my purposes. How I plan to use SG filters and modify my idealised price series will be the subject of my next post.

Sunday, 15 September 2013

Savitzky-Golay Filter Convolution Coefficients

My investigation so far has had me reading General Least Squares Smoothing And Differentiation By The Convolution Method, a paper in the reference section of the Wikipedia page on Savitzky-Golay filters. In this paper there is Pascal code for calculating the convolution coefficients for a Savitzky-Golay filter, and in the code box below is my C++ translation of this Pascal code.
// Calculate Savitzky-Golay Convolution Coefficients
#include <iostream>
using namespace std;

double GenFact (int a, int b)
{ // calculates the generalised factorial (a)(a-1)...(a-b+1)

  double gf = 1.0 ;
  
  for ( int jj = (a-b+1) ; jj < a+1 ; jj ++ ) 
  {
    gf = gf * jj ; 
  }  
    return (gf) ;
    
} // end of GenFact function

double GramPoly( int i , int m , int k , int s )
{ // Calculates the Gram Polynomial ( s = 0 ), or its s'th
  // derivative evaluated at i, order k, over 2m + 1 points
  
double gp_val ;  

if ( k > 0 )
{   
gp_val = (4.0*k-2.0)/(k*(2.0*m-k+1.0)) * ( i*GramPoly(i,m,k-1,s) + s*GramPoly(i,m,k-1,s-1) )
         - ((k-1.0)*(2.0*m+k))/(k*(2.0*m-k+1.0)) * GramPoly(i,m,k-2,s) ;
}
else
{  
  if ( ( k == 0 ) && ( s == 0 ) )
  {
     gp_val = 1.0 ;
  }     
  else
  {
     gp_val = 0.0 ;
  } // end of if k = 0 & s = 0  
} // end of if k > 0

return ( gp_val ) ;

} // end of GramPoly function

double Weight( int i , int t , int m , int n , int s )
{ // calculates the weight of the i'th data point for the t'th Least-square
  // point of the s'th derivative, over 2m + 1 points, order n

double sum = 0.0 ;

for ( int k = 0 ; k < n + 1 ; k++ )
{
sum += (2.0*k+1.0) * ( GenFact(2*m,k) / GenFact(2*m+k+1,k+1) ) * GramPoly(i,m,k,0) * GramPoly(t,m,k,s) ;    
} // end of for loop

return ( sum ) ;
                                               
} // end of Weight function

int main ()
{ // calculates the weight of the i'th data point (1st variable) for the t'th
  // Least-square point (2nd variable) of the s'th derivative (5th variable),
  // over 2m + 1 points (3rd variable), order n (4th variable)
  
  double z ;
  
  z = Weight (4,4,4,2,0) ; // adjust input variables for required output
  
  cout << "The result is " << z ;
  
  return 0 ;
}
Whilst I don't claim to understand all the mathematics behind this, the use of the output of this code is conceptually not that much more difficult than a FIR filter, i.e. multiply each value in a lookback window by some coefficient, except that a polynomial of a given order is being least squares fit to all points in the window rather than, for example, an average of all the points being calculated. The code gives the user the ability to calculate the value of the polynomial at each point in the window, or its derivative(s) at that point.
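For readers who would like to sanity-check the C++ output without compiling anything, the same endpoint weights can be obtained in Octave by direct least squares. The sketch below is for a quadratic fit over 9 points, equivalent to Weight( i , 4 , 4 , 2 , s ) for s = 0 and s = 1:
m = 4 ; n = 2 ;                      % 2m + 1 = 9 points, order n = 2
A = ( -m : m )' .^ ( 0 : n ) ;       % Vandermonde design matrix
C = pinv( A ) ;                      % polynomial coefficients = C * window values
smooth_weights = A( end , : ) * C ;  % fitted value at the endpoint, i.e. s = 0
% derivative of sum( c_k * x^k ) evaluated at x = m gives the s = 1 weights
d = [ 0 , ( 1 : n ) .* ( m .^ ( 0 : n - 1 ) ) ] ;
slope_weights = d * C ;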

I would particularly draw readers' attention to the derivative(s) link above. On this page there is an animation of the slope (1st derivative) of a waveform. If this were to be applied to a price time series the possible applications become apparent: a slope of zero will pick out local highs and lows to identify local support and resistance levels; an oscillator zero crossing will give signals to go long or short; oscillator peaks will give signals to tighten stops or place profit taking orders. And why stop there? Why not use the second derivative as a measure of the acceleration of a price move? And how about the jerk and the jounce of a price series? I don't know about readers, but I have never seen a trading article, forum post or blog talk about the jerk and jounce of price series. Of course, if one were to plot all these as indicators I'm sure it would just be confusing and nigh impossible to come up with a cogent rule set to trade with. However, one doesn't have to do this oneself, and this is what I was alluding to in my earlier post when I said that the Savitzky-Golay framework might provide unique and informative inputs to my neural net trading system. Providing enough informative data is available for training, the rule set will be encapsulated within the weights of any trained neural net.

Tuesday, 10 September 2013

Savitzky-Golay Filters

Recently I was contacted by a reader who asked me if I had ever looked into Savitzky-Golay filters and my reply was that I had and that, in my ignorance, I hadn't found them useful. However, thinking about it, I think that I should look into them again. The first time I looked was many years ago when I was an Octave coding novice and also my general knowledge about filters was somewhat limited.

As prelude to this new investigation I have below plotted two charts of real ES Mini closes in cyan, along with a Savitzky-Golay filter in red and the "Super Smoother" in green for comparative purposes.

The Savitzky-Golay filter is easily available in Octave by a simple call such as

filter = sgolayfilt( close , 3 , 11 ) ;

which is the actual call used for the plots above, i.e. an 11 bar cubic polynomial fit. As can be seen, the Savitzky-Golay filter is generally as smooth as the super smoother but additionally seems to have less lag and doesn't overshoot at turning points like the super smoother does. Given a choice between the two, I'd prefer to use the Savitzky-Golay filter.

However, there is a serious caveat to the use of the S-G filter - it's a centred filter, which means that its calculation requires "future" values. This feature is well demonstrated by the animated figure at the top right of the above linked Wikipedia page. Although I can't accurately recall the reason(s) I abandoned the S-G filter all those years ago, I'm pretty sure this was an important factor. The Wikipedia page suggests how this might be overcome in its "Treatment of first and last points" section. Another way to approach this could be to adapt the methodology I employed in creating my perfect oscillator indicator, and this is in fact what I intend to do over the coming days and weeks. In addition to this I shall endeavour to make the length of the filter adaptive rather than having a fixed bar length polynomial calculation. 
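As a minimal sketch of the endpoint idea, fitting only to the trailing window so that no future values are required (the window length and polynomial order here are arbitrary choices):
n = 11 ; p = 3 ;                        % 11 bar window, cubic fit
x = ( -( n - 1 ) : 0 )' .^ ( 0 : p ) ;  % abscissae ending at the current bar
w = x( end , : ) * pinv( x ) ;          % weights for the fit value at the newest point
price = cumsum( randn( 300 , 1 ) ) ;    % stand-in for real prices
causal_sg = filter( fliplr( w ) , 1 , price ) ; % rolling endpoint SG smoother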

And the reason for setting myself this task? Apart from perhaps creating an improved smoother following my recent disappointment with frequency domain smoothing, there is the possibility of creating useful inputs for my Neural Net trading algorithm: the framework of S-G filters allows for first, second, third and fourth derivatives to be easily calculated and I strongly suspect that, if my investigations and coding turn out to be successful, these will be unique and informative trading inputs.