Dekalog Blog: Moving averages

Showing posts with label Moving averages. Show all posts

Monday, 30 December 2024

A "New" Way to Smooth Price

Below is code for an Octave compiled C++ .oct function to smooth price data.

#include "octave oct.h"
#include "octave dcolvector.h"
#include "octave dmatrix.h"
#include "GenFact.h"
#include "GramPoly.h"
#include "Weight.h"

DEFUN_DLD ( double_smooth_proj_2_5, args, nargout,
"-*- texinfo -*-\n\
@deftypefn {Function File} {} double_smooth_proj_2_5 (@var{input_vector})\n\
This function takes an input series and smooths it by projecting a 5 bar rolling linear fit\n\
3 bars into the future and using these 3 bars plus the last 3 bars of the rolling input\n\
to fit a FIR filter with a 2.5 bar lag from the last projected point, i.e. a 0.5 bar\n\
lead over the last actual rolling point in the series. This is averaged with the previously\n\
calculated such smoothed point for a theoretical zero-lagged smooth. This smooth is\n\
again smoothed as above for a smooth of the smooth, i.e. a double-smooth of the\n\
original input series. The double_smooth and smooth are the function outputs.\n\
@end deftypefn" )

{
octave_value_list retval_list ;
int nargin = args.length () ;

// check the input arguments
if ( nargin > 1 ) // there must be a single, input price vector
   {
   error ("Invalid arguments. Inputs are a single, input price vector.") ;
   return retval_list ;
   }

if ( args(0).length () < 5 )
   {
   error ("Invalid 1st argument length. Input is a price vector of length >= 5.") ;
   return retval_list ;
   }
// end of input checking

ColumnVector input = args(0).column_vector_value () ;
ColumnVector smooth = args(0).column_vector_value () ;
ColumnVector double_smooth = args(0).column_vector_value () ;

// create the fit coefficients matrix
int p = 5 ;             // the number of points in calculations
int m = ( p - 1 ) / 2 ; // value to be passed to call_GenPoly_routine_from_octfile
int n = 1 ;             // value to be passed to call_GenPoly_routine_from_octfile
int s = 0 ;             // value to be passed to call_GenPoly_routine_from_octfile

  // create matrix for fit coefficients
  Matrix fit_coefficients_matrix ( 2 * m + 1 , 2 * m + 1 ) ;
  // and assign values in loop using the Weight.h recursive Gram Polynomial C++ headers
  for ( int tt = -m ; tt < (m+1) ; tt ++ )
  {
        for ( int ii = -m ; ii < (m+1) ; ii ++ )
        {
        fit_coefficients_matrix ( ii + m , tt + m ) = Weight( ii , tt , m , n , s ) ;
        }
  }

  // create matrix for slope coefficients
  Matrix slope_coefficients_matrix ( 2 * m + 1 , 2 * m + 1 ) ;
  s = 1 ;
  // and assign values in loop using the Weight.h recursive Gram Polynomial C++ headers
  for ( int tt = -m ; tt < (m+1) ; tt ++ )
  {
        for ( int ii = -m ; ii < (m+1) ; ii ++ )
        {
        slope_coefficients_matrix ( ii + m , tt + m ) = Weight( ii , tt , m , n , s ) ;
        }
  }

  Matrix smooth_coefficients ( 1 , 5 ) ;
  // fill the smooth_coefficients matrix, smooth_coeffs = ( 9/24 ) .* fit_coeffs + ( 7/12 ) .* slope_coeffs + [ 0 ; 1/24 ; 1/8 ; 5/24 ; 1/4 ] ;
  smooth_coefficients( 0 , 0 ) = ( 9.0 / 24.0 ) * fit_coefficients_matrix( 0 , 4 ) + ( 7.0 / 12.0 ) * slope_coefficients_matrix( 0 , 4 ) ;
  smooth_coefficients( 0 , 1 ) = ( 9.0 / 24.0 ) * fit_coefficients_matrix( 1 , 4 ) + ( 7.0 / 12.0 ) * slope_coefficients_matrix( 1 , 4 ) + ( 1.0 / 24.0 ) ;
  smooth_coefficients( 0 , 2 ) = ( 9.0 / 24.0 ) * fit_coefficients_matrix( 2 , 4 ) + ( 7.0 / 12.0 ) * slope_coefficients_matrix( 2 , 4 ) + ( 1.0 / 8.0 ) ;
  smooth_coefficients( 0 , 3 ) = ( 9.0 / 24.0 ) * fit_coefficients_matrix( 3 , 4 ) + ( 7.0 / 12.0 ) * slope_coefficients_matrix( 3 , 4 ) + ( 5.0 / 24.0 ) ;
  smooth_coefficients( 0 , 4 ) = ( 9.0 / 24.0 ) * fit_coefficients_matrix( 4 , 4 ) + ( 7.0 / 12.0 ) * slope_coefficients_matrix( 4 , 4 ) + ( 1.0 / 4.0 ) ;

    for ( octave_idx_type ii (4) ; ii < args(0).length () ; ii++ )
        {

         smooth( ii ) = smooth_coefficients( 0 , 0 ) * input( ii - 4 ) + smooth_coefficients( 0 , 1 ) * input( ii - 3 ) + smooth_coefficients( 0 , 2 ) * input( ii - 2 ) +
                        smooth_coefficients( 0 , 3 ) * input( ii - 1 ) + smooth_coefficients( 0 , 4 ) * input( ii ) ;

         double_smooth( ii ) = smooth_coefficients( 0 , 0 ) * smooth( ii - 4 ) + smooth_coefficients( 0 , 1 ) * smooth( ii - 3 ) + smooth_coefficients( 0 , 2 ) * smooth( ii - 2 ) +
                               smooth_coefficients( 0 , 3 ) * smooth( ii - 1 ) + smooth_coefficients( 0 , 4 ) * smooth( ii ) ;

        }

retval_list( 0 ) = double_smooth ;
retval_list( 1 ) = smooth ;

return retval_list ;

} // end of function

Rather than describe it, I'll just paste the "help" description below:-

"This function takes an input series and smooths it by projecting a 5 bar rolling linear fit 3 bars into the future and using these 3 bars plus the last 3 bars of the rolling input to fit a FIR filter with a 2.5 bar lag from the last projected point, i.e. a 0.5 bar lead over the last actual rolling point in the series. This is averaged with the previously calculated such smoothed point for a theoretical zero-lagged smooth. This smooth is again smoothed as above for a smooth of the smooth, i.e. a double-smooth of the original input series. The double_smooth and smooth are the function outputs."

The above function calls previous code of mine to calculate savitzky golay filter convolution coefficients for internal calculations, but for the benefit of readers I will simply post the final coefficients for a 5 tap FIR filter.

-0.191667  -0.016667   0.200000   0.416667   0.591667

These coefficients are for a "single" smooth. Run the output from a function using these through the same function to get the "double" smooth.

Testing on a sinewave looks like this:-

where the cyan, green and blue filters are the original FIR filter with 2.5 bar lag, a 5 bar SMA and Ehler's super smooth (see code here) applied to sinewave "price" for comparative purposes. The red filter is my "double_smooth" and the magenta is a Jurik Moving Average using an Octave adaptation of code that is/was freely available on the Tradingview website. The parameters for this Jurik average (length, phase and power) were chosen to match, as closely as possible, those of the double_smooth for an apples to apples comparison.

I will not discuss this any further in this post other than to say I intend to combine this with the ideas contained in my new use for kalman filter post.

Tuesday, 9 April 2024

A "New" Use for Kalman Filter on Price Time Series?

During the course of writing this blog I have visited the idea of using Kalman filters several times, most recently in this February 2023 post. My motivation in these previous posts could best be described as trying to smooth price data with as little lag as possible, i.e. create a zero-lag indicator. In doing so, the model most often used for the Kalman filter was a physical motion model with position, velocity and acceleration components. Whilst these "worked" in the sense of smoothing the underlying data, it is not necessarily a good model to use on financial data because, obviously, financial data is not a physical system and so I thought I would apply the Kalman filter to something that is ubiquitously used on financial data - the Exponential moving average.

Below is an Octave function to calculate a "Kalman_ema" where the prediction part of the filter is just a linear extrapolation of an exponential moving average and then this extrapolated value is used to calculate the projected price, with the measurement being the real price and its ema value.

## Copyright (C) 2024 dekalog
##
## This program is free software: you can redistribute it and/or modify
## it under the terms of the GNU General Public License as published by
## the Free Software Foundation, either version 3 of the License, or
## (at your option) any later version.
##
## This program is distributed in the hope that it will be useful,
## but WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
## GNU General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with this program.  If not, see .

## -*- texinfo -*-
## @deftypefn {} {@var{retval} =} kalman_ema (@var{price}, @var{lookback})
##
## @seealso{}
## @end deftypefn

## Author: dekalog 
## Created: 2024-04-08

function [ filter_out , P_out ] = kalman_ema ( price , lookback )

## check price is row vector
if ( size( price , 1 ) > 1 && size( price , 2 ) == 1 )
 price = price' ;
endif

P_out = zeros( numel( price ) , 2 ) ;

ema_price = ema( price , lookback )' ;
alpha = 2 / ( lookback + 1 ) ;

## initial covariance matrix
P = eye( 3 ) ;

## transistion matrix
A = [ 0 , ( 1 + alpha ) / alpha , -( 1 / alpha ) ; ...
       0 , 2 , -1 ; ...
       0 , 1 , 0 ] ;

## initial Q
Q = eye( 3 ) ;

## measurement vector
Y = [ price ; ...
       ema_price ; ...
       shift( ema_price , 1 ) ] ;
Y( 3 , 1 ) = Y( 2 , 1 ) ;

## measurement matrix
H = eye( 3 ) ;

## measurement noise covariance
R = eye( 3 ) ;

## container for Kalman filter output
filter_out = zeros( size( Y ) ) ;
errors = ones( 3 , 1 ) ;

for ii = 2 : size( Y , 2 )

  X = A * filter_out( : , ii - 1 ) ;
  P = A * P * A' + Q ;

  errors = alpha .* abs( X - Y( : , ii ) ) + ( 1 - alpha ) .* errors ;

  IM = H * X ; ## Mean of predictive distribution
  IS = ( R + H * P * H' ) ; ## Covariance of predictive mean
  K = P * H' / IS ; ## Computed Kalman gain
  X = X + K * ( Y( : , ii ) - IM ) ; ## Updated state mean
  P = P - K * IS * K' ; ## Updated state covariance

  filter_out( : , ii ) = X ;
  P_out( ii , 1 ) = P( 1 , 1 ) ;
  P_out( ii , 2 ) = P( 2 , 2 ) ;

  ## update Q and R
  Q( 1 , 1 ) = errors( 1 ) ; Q( 2 , 2 ) = errors( 2 ) ; Q( 3 , 3 ) = errors( 3 ) ;
  R = Q ;

endfor

filter_out = filter_out' ;

endfunction

Using this for smoothing either the price or the ema has no utility, but a by-product of the filter is the Covariance matrix from which it is possible to plot bands around the price series. The following chart shows the bands for various ema alpha values corresponding to various Fibonacci sequence length look backs for the ema alpha value.

This next chart, for purposes of clarity, shows the "Golden Cross" lengths of 50 and 200

and this final chart shows an adaptive look back length which is a function of the instantaneous measured period (see here or here) of the underlying data.

Despite the wide range of the input look back lengths for the ema, it can be seen that the covariance bands around price are broadly similar. I can think of many uses for such a data driven but basically parameter insensitive measure of price variance, e.g. entry/exit levels, stop levels, position sizing etc.

Friday, 10 July 2020

Time Warp Edit Distance as a Loss Function

Some time ago I posted about the time warp edit distance (twed) and recently I've revisited this as a possible loss function. The basic idea is to use this during algorithm training to match a "perfect equity curve" rather than use the usual loss measurements such as mean squared error for regression or cross entropy for classification.

As a proof of concept I've been playing around with this Octave script

clear all ;
1 ;

function J = twed_loss( x )
 global price ;
 global returns ;
 global perfect_equity_curve ; 
 fast_ma = sma( price , round( x( 1 ) ) ) ;
 slow_ma = sma( price , round( x( 2 ) ) ) ;
 test_position_vector = sign( fast_ma .- slow_ma ) ;
 test_position_vector = shift( test_position_vector , 2 ) ;
 test_position_vector( 1 : 3 ) = 0 ;
 test_equity_curve = cumsum( returns .* test_position_vector ) ;
 x_axis = ( 1 : numel( perfect_equity_curve ) )' ;
 J = twed( perfect_equity_curve , x_axis , test_equity_curve , x_axis , 1 , 0.001 ) ;
endfunction

## create a perfectly predictable price
global price = sinewave( 200 , 20 )' .+ 5 ;
global returns = [ 0 ; diff( price ) ] ;
perfect_position_vector = sign( returns ) ;
perfect_position_vector = shift( perfect_position_vector , 2 ) ; 
perfect_position_vector( 1 : 3 ) = 0 ;
global perfect_equity_curve = cumsum( returns .* perfect_position_vector ) ;

## now do the Baysian training
## set up the parameters for bayesopt
params.n_iterations = 300 ; ## 190
params.n_init_samples = 10 ; ## 10
params.n_iter_relearn = 1 ; ## Number of iterations between re-learning kernel parameters. That is, kernel learning ocur 1 out of n_iter_relearn iterations. 
## Ideally, the best precision is obtained when the kernel parameters are learned every iteration (n_iter_relearn=1). 
## However, this learning part is computationally expensive and implies a higher cost per iteration. If n_iter_relearn=0, then there is no relearning. [Default 50]
params.crit_name = 'cEI' ;
params.surr_name = 'sStudentTProcessNIG' ;
params.noise = 1e-6 ;
params.kernel_name = 'kMaternARD5' ;
params.kernel_hp_mean = [ 1 ] ;
params.kernel_hp_std = [ 10 ] ;
params.verbose_level = 0 ; ## Negative -> Error -> stdout 0 -> Warning -> stdout 1 -> Info -> stdout 2 -> Debug -> stdout
                           ## 3 -> Warning -> log file 4 -> Info -> log file 5 -> Debug -> log file 5 -> Error -> log file
params.load_save_flag = 0 ; ## 1-Load data, 2-Save data, 3-Load and append data. Other values, no file saving or restore [Default 0]
params.log_filename = '/home/dekalog/Documents/octave/twed/bayeopt.log' ; % Name/path of the log file 
## (if applicable, verbose_level>=3) [Default "bayesopt.log"]
params.load_filename = '/home/dekalog/Documents/octave/twed/bayeopt.log' ;
params.save_filename = '/home/dekalog/Documents/octave/twed/bayeopt.log' ;

lb = [ 2 3 ] ;
## upper bounds
ub = [ 30 30 ] ;
nDimensions = length( lb ) ;
[ xmin , fmin ] = bayesoptcont( 'twed_loss' , nDimensions , params , lb , ub ) ;
round( xmin )
fmin

fast_ma = sma( price , round( xmin(1) ) ) ;
slow_ma = sma( price , round( xmin(2) ) ) ;
figure(1) ; plot(price,'k','linewidth',2,fast_ma,'r','linewidth',2,slow_ma,'b','linewidth',2 ) ;
title( 'PRICE AND MA CROSSOVER SIGNALS' ) ; legend( 'PRICE' , 'FAST MA' , 'SLOW MA' ) ;
test_position_vector = sign( fast_ma .- slow_ma ) ;
test_position_vector = shift( test_position_vector , 2 ) ;
test_position_vector( 1 : 3 ) = 0 ;
test_equity_curve = cumsum( returns .* test_position_vector ) ; 
figure(2) ; plot(perfect_equity_curve,'b','linewidth',2,test_equity_curve,'r','linewidth',2) ;
title( 'EQUITY CURVES' ) ; legend( 'PERFECT EQUITY CURVE' , 'TEST EQUITY CURVE' ) ;

which produces plots such as this

and this,

which both show a fast and slow moving average crossover system on sine wave "price" of period 20, optimised to match equity curves such as below via the twed loss.

What is interesting about this is that the moving average lengths usually converge to more or less the expected theoretical optimum, with the required change of sign of signal, where the crossovers indicate peaks and troughs in price and hence perfect entry and exit signals.

However, sometimes the solution looks like this,

which is an 11 period fast moving average and a 10 period slow one, quite a contrarian solution compared to the theoretical optimum, but actually giving a lower twed loss.

I quite like the idea of optimising for what we actually care about, i.e. the equity curve, whilst at the same time possibly uncovering unique solutions. It seems that the twed loss shows promise.

Thursday, 20 April 2017

Using the BayesOpt Library to Optimise my Planned Neural Net

Following on from my last post, I have recently been using the BayesOpt library to optimise my planned neural net, and this post is a brief outline, with code, of what I have been doing.

My intent was to design a Nonlinear autoregressive exogenous model using my currency strength indicator as the main exogenous input, along with other features derived from the use of Savitzky-Golay filter convolution to model velocity, acceleration etc. I decided that rather than model prices directly, I would model the 20 period simple moving average because it would seem reasonable to assume that modelling a smooth function would be easier, and from this average it is a trivial matter to reverse engineer to get to the underlying price.

Given that my projected feature space/lookback length/number of nodes combination is/was a triple digit, discrete dimensional problem, I used the "bayesoptdisc" function from the BayesOpt library to perform a discrete Bayesian optimisation over these parameters, the main Octave script for this being shown below.

clear all ;

% load the data
% load eurusd_daily_bin_bars ;
% load gbpusd_daily_bin_bars ;
% load usdchf_daily_bin_bars ;
load usdjpy_daily_bin_bars ;
load all_rel_strengths_non_smooth ;
% all_rel_strengths_non_smooth = [ usd_rel_strength_non_smooth eur_rel_strength_non_smooth gbp_rel_strength_non_smooth chf_rel_strength_non_smooth ...
% jpy_rel_strength_non_smooth aud_rel_strength_non_smooth cad_rel_strength_non_smooth ] ;

% extract relevant data
% price = ( eurusd_daily_bars( : , 3 ) .+ eurusd_daily_bars( : , 4 ) ) ./ 2 ; % midprice 
% price = ( gbpusd_daily_bars( : , 3 ) .+ gbpusd_daily_bars( : , 4 ) ) ./ 2 ; % midprice
% price = ( usdchf_daily_bars( : , 3 ) .+ usdchf_daily_bars( : , 4 ) ) ./ 2 ; % midprice  
price = ( usdjpy_daily_bars( : , 3 ) .+ usdjpy_daily_bars( : , 4 ) ) ./ 2 ; % midprice
base_strength = all_rel_strengths_non_smooth( : , 1 ) .- 0.5 ;
term_strength = all_rel_strengths_non_smooth( : , 5 ) .- 0.5 ; 

% clear unwanted data
% clear eurusd_daily_bars all_rel_strengths_non_smooth ;
% clear gbpusd_daily_bars all_rel_strengths_non_smooth ;
% clear usdchf_daily_bars all_rel_strengths_non_smooth ;
clear usdjpy_daily_bars all_rel_strengths_non_smooth ;

global start_opt_line_no = 200 ;
global stop_opt_line_no = 7545 ;

% get matrix coeffs
slope_coeffs = generalised_sgolay_filter_coeffs( 5 , 2 , 1 ) ;
accel_coeffs = generalised_sgolay_filter_coeffs( 5 , 2 , 2 ) ;
jerk_coeffs = generalised_sgolay_filter_coeffs( 5 , 3 , 3 ) ;

% create features
sma20 = sma( price , 20 ) ;
global targets = sma20 ;
[ sma_max , sma_min ] = adjustable_lookback_max_min( sma20 , 20 ) ;
global sma20r = zeros( size(sma20,1) , 5 ) ;
global sma20slope = zeros( size(sma20,1) , 5 ) ;
global sma20accel = zeros( size(sma20,1) , 5 ) ;
global sma20jerk = zeros( size(sma20,1) , 5 ) ;

global sma20diffs = zeros( size(sma20,1) , 5 ) ;
global sma20diffslope = zeros( size(sma20,1) , 5 ) ;
global sma20diffaccel = zeros( size(sma20,1) , 5 ) ;
global sma20diffjerk = zeros( size(sma20,1) , 5 ) ;

global base_strength_f = zeros( size(sma20,1) , 5 ) ;
global term_strength_f = zeros( size(sma20,1) , 5 ) ;

base_term_osc = base_strength .- term_strength ;
global base_term_osc_f = zeros( size(sma20,1) , 5 ) ;
slope_bt_osc = rolling_endpoint_gen_poly_output( base_term_osc , 5 , 2 , 1 ) ; % no_of_points(p),filter_order(n),derivative(s)
global slope_bt_osc_f = zeros( size(sma20,1) , 5 ) ;
accel_bt_osc = rolling_endpoint_gen_poly_output( base_term_osc , 5 , 2 , 2 ) ; % no_of_points(p),filter_order(n),derivative(s)
global accel_bt_osc_f = zeros( size(sma20,1) , 5 ) ;
jerk_bt_osc = rolling_endpoint_gen_poly_output( base_term_osc , 5 , 3 , 3 ) ; % no_of_points(p),filter_order(n),derivative(s)
global jerk_bt_osc_f = zeros( size(sma20,1) , 5 ) ;

slope_base_strength = rolling_endpoint_gen_poly_output( base_strength , 5 , 2 , 1 ) ; % no_of_points(p),filter_order(n),derivative(s)
global slope_base_strength_f = zeros( size(sma20,1) , 5 ) ;
accel_base_strength = rolling_endpoint_gen_poly_output( base_strength , 5 , 2 , 2 ) ; % no_of_points(p),filter_order(n),derivative(s)
global accel_base_strength_f = zeros( size(sma20,1) , 5 ) ;
jerk_base_strength = rolling_endpoint_gen_poly_output( base_strength , 5 , 3 , 3 ) ; % no_of_points(p),filter_order(n),derivative(s)
global jerk_base_strength_f = zeros( size(sma20,1) , 5 ) ;

slope_term_strength = rolling_endpoint_gen_poly_output( term_strength , 5 , 2 , 1 ) ; % no_of_points(p),filter_order(n),derivative(s)
global slope_term_strength_f = zeros( size(sma20,1) , 5 ) ;
accel_term_strength = rolling_endpoint_gen_poly_output( term_strength , 5 , 2 , 2 ) ; % no_of_points(p),filter_order(n),derivative(s)
global accel_term_strength_f = zeros( size(sma20,1) , 5 ) ;
jerk_term_strength = rolling_endpoint_gen_poly_output( term_strength , 5 , 3 , 3 ) ; % no_of_points(p),filter_order(n),derivative(s)
global jerk_term_strength_f = zeros( size(sma20,1) , 5 ) ;

min_max_range = sma_max .- sma_min ;

for ii = 51 : size( sma20 , 1 ) - 1 % one step ahead is target
  
  targets(ii) = 2 * ( ( sma20(ii+1) - sma_min(ii) ) / min_max_range(ii) - 0.5 ) ;
  
  % scaled sma20
  sma20r(ii,:) = 2 .* ( ( flipud( sma20(ii-4:ii,1) )' .- sma_min(ii) ) ./ min_max_range(ii) .- 0.5 ) ;
  sma20slope(ii,:) = fliplr( ( 2 .* ( ( sma20(ii-4:ii,1)' .- sma_min(ii) ) ./ min_max_range(ii) .- 0.5 ) ) * slope_coeffs ) ;
  sma20accel(ii,:) = fliplr( ( 2 .* ( ( sma20(ii-4:ii,1)' .- sma_min(ii) ) ./ min_max_range(ii) .- 0.5 ) ) * accel_coeffs ) ;
  sma20jerk(ii,:) = fliplr( ( 2 .* ( ( sma20(ii-4:ii,1)' .- sma_min(ii) ) ./ min_max_range(ii) .- 0.5 ) ) * jerk_coeffs ) ;
  
  % scaled diffs of sma20
  sma20diffs(ii,:) = fliplr( diff( 2.* ( ( sma20(ii-5:ii,1) .- sma_min(ii) ) ./ min_max_range(ii) .- 0.5 ) )' ) ;
  sma20diffslope(ii,:) = fliplr( diff( 2.* ( ( sma20(ii-5:ii,1) .- sma_min(ii) ) ./ min_max_range(ii) .- 0.5 ) )' * slope_coeffs ) ;
  sma20diffaccel(ii,:) = fliplr( diff( 2.* ( ( sma20(ii-5:ii,1) .- sma_min(ii) ) ./ min_max_range(ii) .- 0.5 ) )' * accel_coeffs ) ;
  sma20diffjerk(ii,:) = fliplr( diff( 2.* ( ( sma20(ii-5:ii,1) .- sma_min(ii) ) ./ min_max_range(ii) .- 0.5 ) )' * jerk_coeffs ) ;
  
  % base strength
  base_strength_f(ii,:) = fliplr( base_strength(ii-4:ii)' ) ;
  slope_base_strength_f(ii,:) = fliplr( slope_base_strength(ii-4:ii)' ) ;
  accel_base_strength_f(ii,:) = fliplr( accel_base_strength(ii-4:ii)' ) ;
  jerk_base_strength_f(ii,:) = fliplr( jerk_base_strength(ii-4:ii)' ) ;
  
  % term strength
  term_strength_f(ii,:) = fliplr( term_strength(ii-4:ii)' ) ;
  slope_term_strength_f(ii,:) = fliplr( slope_term_strength(ii-4:ii)' ) ;
  accel_term_strength_f(ii,:) = fliplr( accel_term_strength(ii-4:ii)' ) ;
  jerk_term_strength_f(ii,:) = fliplr( jerk_term_strength(ii-4:ii)' ) ;
  
  % base term oscillator
  base_term_osc_f(ii,:) = fliplr( base_term_osc(ii-4:ii)' ) ;
  slope_bt_osc_f(ii,:) = fliplr( slope_bt_osc(ii-4:ii)' ) ;
  accel_bt_osc_f(ii,:) = fliplr( accel_bt_osc(ii-4:ii)' ) ;
  jerk_bt_osc_f(ii,:) = fliplr( jerk_bt_osc(ii-4:ii)' ) ;
  
endfor

% create xset for bayes routine
% raw indicator
xset = zeros( 4 , 5 ) ; xset( 1 , : ) = 1 : 5 ;
% add the slopes
to_add = zeros( 4 , 15 ) ; 
to_add( 1 , : ) = [ 1 2 2 3 3 3 4 4 4 4 5 5 5 5 5 ] ; 
to_add( 2 , : ) = [ 1 1 2 1 2 3 1 2 3 4 1 2 3 4 5 ] ; 
xset = [ xset to_add ] ; 
% add accels
to_add = zeros( 4 , 21 ) ; 
to_add( 1 , : ) = [ 1 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 ] ; 
to_add( 2 , : ) = [ 1 1 2 2 1 2 2 3 3 3 1 2 3 4 2 3 3 4 4 4 4 ] ; 
to_add( 3 , : ) = [ 1 1 1 2 1 1 2 1 2 3 1 1 1 1 2 2 3 1 2 3 4 ] ;
xset = [ xset to_add ] ; 
% add jerks
to_add = zeros( 4 , 70 ) ; 
to_add( 1 , : ) = [ 1 2 2 2 2 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 ] ; 
to_add( 2 , : ) = [ 1 1 2 2 2 1 2 2 2 3 3 3 3 3 3 1 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 1 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 ] ; 
to_add( 3 , : ) = [ 1 1 1 2 2 1 1 2 2 1 2 2 3 3 3 1 1 2 2 1 2 2 3 3 3 1 2 2 3 3 3 4 4 4 4 1 1 2 2 1 2 2 3 3 3 1 2 2 3 3 3 4 4 4 4 1 2 2 3 3 3 4 4 4 4 5 5 5 5 5 ] ;
to_add( 4 , : ) = [ 1 1 1 1 2 1 1 1 2 1 1 2 1 2 3 1 1 1 2 1 1 2 1 2 3 1 1 2 1 2 3 1 2 3 4 1 1 1 2 1 1 2 1 2 3 1 1 2 1 2 3 1 2 3 4 1 1 2 1 2 3 1 2 3 4 1 2 3 4 5 ] ;
xset = [ xset to_add ] ;
% construct all_xset for combinations of indicators and look back lengths
all_zeros = zeros( size( xset ) ) ;
all_xset = [ xset ; repmat( all_zeros , 3 , 1 ) ] ;
all_xset = [ all_xset [ xset ; xset ; all_zeros ; all_zeros ] ] ;
all_xset = [ all_xset [ xset ; all_zeros ; xset ; all_zeros ] ] ;
all_xset = [ all_xset [ xset ; all_zeros ; all_zeros ; xset ] ] ;
all_xset = [ all_xset [ xset ; xset ; xset ; all_zeros ] ] ;
all_xset = [ all_xset [ xset ; xset ; all_zeros ; xset ] ] ;
all_xset = [ all_xset [ xset ; all_zeros ; xset ; xset ] ] ;
all_xset = [ all_xset repmat( xset , 4 , 1 ) ] ;

ones_all_xset = ones( 1 , size( all_xset , 2 ) ) ;

% now add layer for number of neurons and extend as necessary
max_number_of_neurons_in_layer = 20 ;

parameter_matrix = [] ;

for ii = 2 : max_number_of_neurons_in_layer % min no. of neurons is 2, max = max_number_of_neurons_in_layer  
  parameter_matrix = [ parameter_matrix [ ii .* ones_all_xset ; all_xset ] ] ;
endfor

% now the actual bayes optimisation routine
% set the parameters
params.n_iterations = 190; % bayesopt library default is 190
params.n_init_samples = 10;
params.crit_name = 'cEIa'; % cEI is default. cEIa is an annealed version
params.surr_name = 'sStudentTProcessNIG';
params.noise = 1e-6;
params.kernel_name = 'kMaternARD5';
params.kernel_hp_mean = [1];
params.kernel_hp_std = [10];
params.verbose_level = 1; % 3 to path below
params.log_filename = '/home/dekalog/Documents/octave/cplusplus.oct_functions/nn_functions/optimise_narx_ind_lookback_nodes_log';
params.l_type = 'L_MCMC' ; % L_EMPIRICAL is default
params.epsilon = 0.5 ; % probability of performing a random (blind) evaluation of the target function. 
% Higher values implies forced exploration while lower values relies more on the exploration/exploitation policy of the criterion. 0 is default

% the function to optimise
fun = 'optimise_narx_ind_lookback_nodes_rolling' ;

% the call to the Bayesopt library function
bayesoptdisc( fun , parameter_matrix , params ) ; 
% result is the minimum as a vector (x_out) and the value of the function at the minimum (y_out)

What this script basically does is:

load all the relevant data ( in this case a forex pair )
creates a set of scaled features
creates a necessary parameter matrix for the discrete optimisation function
sets the parameters for the optimisation routine
and finally calls the "bayesoptdisc" function

Note that in step 2 all the features are declared as global variables, this being necessary because the "bayesoptdisc" function of the BayesOpt library does not appear to admit passing these variables as inputs to the function.

The actual function to be optimised is given in the following code box, and is basically a looped neural net training routine.

## Copyright (C) 2017 dekalog
## 
## This program is free software; you can redistribute it and/or modify it
## under the terms of the GNU General Public License as published by
## the Free Software Foundation; either version 3 of the License, or
## (at your option) any later version.
## 
## This program is distributed in the hope that it will be useful,
## but WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
## GNU General Public License for more details.
## 
## You should have received a copy of the GNU General Public License
## along with this program.  If not, see .

## -*- texinfo -*- 
## @deftypefn {} {@var{retval} =} optimise_narx_ind_lookback_nodes_rolling (@var{input1})
##
## @seealso{}
## @end deftypefn

## Author: dekalog 
## Created: 2017-03-21

function [ retval ] = optimise_narx_ind_lookback_nodes_rolling( input1 )
 
% declare all the global variables so the function can "see" them
global start_opt_line_no ;
global stop_opt_line_no ; 
global targets ;
global sma20r ; % 2
global sma20slope ;
global sma20accel ;
global sma20jerk ;
global sma20diffs ;
global sma20diffslope ;
global sma20diffaccel ;
global sma20diffjerk ;
global base_strength_f ;
global slope_base_strength_f ;
global accel_base_strength_f ;
global jerk_base_strength_f ;
global term_strength_f ;
global slope_term_strength_f ;
global accel_term_strength_f ;
global jerk_term_strength_f ;
global base_term_osc_f ;
global slope_bt_osc_f ;
global accel_bt_osc_f ;
global jerk_bt_osc_f ;

% build feature matrix from the above global variable according to parameters in input1
hidden_layer_size = input1(1) ;

% training targets
Y = targets( start_opt_line_no:stop_opt_line_no , 1 ) ;

% create empty feature matrix
X = [] ;

% which will always have at least one element of the main price series for the NARX
X = [ X sma20r( start_opt_line_no:stop_opt_line_no , 1:input1(2) ) ] ;

% go through input1 values in turn and add to X if necessary
if input1(3) > 0
X = [ X sma20slope( start_opt_line_no:stop_opt_line_no , 1:input1(3) ) ] ; 
endif

if input1(4) > 0
X = [ X sma20accel( start_opt_line_no:stop_opt_line_no , 1:input1(4) ) ] ; 
endif

if input1(5) > 0
X = [ X sma20jerk( start_opt_line_no:stop_opt_line_no , 1:input1(5) ) ] ; 
endif

if input1(6) > 0
X = [ X sma20diffs( start_opt_line_no:stop_opt_line_no , 1:input1(6) ) ] ; 
endif

if input1(7) > 0
X = [ X sma20diffslope( start_opt_line_no:stop_opt_line_no , 1:input1(7) ) ] ; 
endif

if input1(8) > 0
X = [ X sma20diffaccel( start_opt_line_no:stop_opt_line_no , 1:input1(8) ) ] ; 
endif

if input1(9) > 0
X = [ X sma20diffjerk( start_opt_line_no:stop_opt_line_no , 1:input1(9) ) ] ; 
endif

if input1(10) > 0 % input for base and term strengths together
X = [ X base_strength_f( start_opt_line_no:stop_opt_line_no , 1:input1(10) ) ] ; 
X = [ X term_strength_f( start_opt_line_no:stop_opt_line_no , 1:input1(10) ) ] ; 
endif

if input1(11) > 0
X = [ X slope_base_strength_f( start_opt_line_no:stop_opt_line_no , 1:input1(11) ) ] ; 
X = [ X slope_term_strength_f( start_opt_line_no:stop_opt_line_no , 1:input1(11) ) ] ; 
endif

if input1(12) > 0
X = [ X accel_base_strength_f( start_opt_line_no:stop_opt_line_no , 1:input1(12) ) ] ; 
X = [ X accel_term_strength_f( start_opt_line_no:stop_opt_line_no , 1:input1(12) ) ] ; 
endif

if input1(13) > 0
X = [ X jerk_base_strength_f( start_opt_line_no:stop_opt_line_no , 1:input1(13) ) ] ; 
X = [ X jerk_term_strength_f( start_opt_line_no:stop_opt_line_no , 1:input1(13) ) ] ; 
endif

if input1(14) > 0
X = [ X base_term_osc_f( start_opt_line_no:stop_opt_line_no , 1:input1(14) ) ] ; 
endif

if input1(15) > 0
X = [ X slope_bt_osc_f( start_opt_line_no:stop_opt_line_no , 1:input1(15) ) ] ; 
endif

if input1(16) > 0
X = [ X accel_bt_osc_f( start_opt_line_no:stop_opt_line_no , 1:input1(16) ) ] ; 
endif

if input1(17) > 0
X = [ X jerk_bt_osc_f( start_opt_line_no:stop_opt_line_no , 1:input1(17) ) ] ; 
endif

% now the X features matrix has been formed, get its size
X_rows = size( X , 1 ) ; X_cols = size( X , 2 ) ;

X = [ ones( X_rows , 1 ) X ] ; % add bias unit to X

fan_in = X_cols + 1 ; % no. of inputs to a node/unit, including bias
fan_out = 1 ; % no. of outputs from node/unit
r = sqrt( 6 / ( fan_in + fan_out ) ) ; 

rolling_window_length = 100 ;
n_iters = 100 ;
n_iter_errors = zeros( n_iters , 1 ) ;
all_errors = zeros( X_rows - ( rolling_window_length - 1 ) - 1 , 1 ) ;
rolling_window_loop_iter = 0 ;

for rolling_window_loop = rolling_window_length : X_rows - 1 
 
  rolling_window_loop_iter = rolling_window_loop_iter + 1 ;
  
    % train n_iters no. of nets and put the error stats in n_iter_errors
    for ii = 1 : n_iters
      
      % initialise weights
      % see https://stats.stackexchange.com/questions/47590/what-are-good-initial-weights-in-a-neural-network
      
      % One option is Orthogonal random matrix initialization for input_to_hidden weights
      % w_i = rand( X_cols + 1 , hidden_layer_size ) ;
      % [ u , s , v ] = svd( w_i ) ;
      % input_to_hidden = [ ones( X_rows , 1 ) X ] * u ; % adding bias unit to X
      
      % using fan_in and fan_out for tanh
      w_i = ( rand( X_cols + 1 , hidden_layer_size ) .* ( 2 * r ) ) .- r ;
      input_to_hidden = X( rolling_window_loop - ( rolling_window_length - 1 ) : rolling_window_loop , : ) * w_i ; 
      
      % push the input_to_hidden through the chosen sigmoid function
      hidden_layer_output = sigmoid_lecun_m( input_to_hidden ) ;
      
      % add bias unit for the output from hidden
      hidden_layer_output = [ ones( rolling_window_length , 1 ) hidden_layer_output ] ;
      
      % use hidden_layer_output as the input to a linear regression fit to targets Y
      % a la Extreme Learning Machine
      % w = ( inv( X' * X ) * X' ) * y ; the "classic" way for linear regression, where
      % X = hidden_layer_output, but
      w = ( ( hidden_layer_output' * hidden_layer_output ) \ hidden_layer_output' ) * Y( rolling_window_loop - ( rolling_window_length - 1 ) : rolling_window_loop , 1 ) ;
      % is quicker and recommended
      
      % use these current values of w_i and w for out of sample test
      os_input_to_hidden = X( rolling_window_loop + 1 , : ) * w_i ;
      os_hidden_layer_output = sigmoid_lecun_m( os_input_to_hidden ) ;
      os_hidden_layer_output = [ 1 os_hidden_layer_output ] ; % add bias
      os_output = os_hidden_layer_output * w ;
      n_iter_errors( n_iters ) = abs( Y( rolling_window_loop + 1 , 1 ) - os_output ) ;
      
    endfor
    
    all_errors( rolling_window_loop_iter ) = mean( n_iter_errors ) ;
    
endfor % rolling_window_loop

retval = mean( all_errors ) ;

clear X w_i ; 

endfunction

However, to speed things up for some rapid prototyping, rather than use backpropagation training this function uses the principles of an extreme learning machine and loops over 100 such trained ELMs per set of features contained in a rolling window of length 100 across the entire training data set. Walk forward cross validation is performed for each of the 100 ELMs, an average of the out of sample error obtained, and these averages across the whole data set are then averaged to provide the function return. The code was run on daily bars of the four major forex pairs; EURUSD, GBPUSD, USDCHF and USDYPY.

The results of running the above are quite interesting. The first surprise is that the currency strength indicator and features derived from it were not included in the optimal model for any of the four tested pairs. Secondly, for all pairs, a scaled version of a 20 bar price momentum function, and derived features, was included in the optimal model. Finally, again for all pairs, there was a symmetrically decreasing lookback period across the selected features, and when averaged across all pairs the following pattern results: 10 3 3 2 1 3 3 2 1, which is to be read as:

10 nodes (plus a bias node) in the hidden layer
lookback length of 3 for the scaled values of the SMA20 and the 20 bar scaled momentum function
lookback length of 3 for the slopes/rates of change of the above
lookback length of 2 for the "accelerations" of the above
lookback length of 1 for the "jerks" of the above

So it would seem that the 20 bar momentum function is a better exogenous input than the currency strength indicator. The symmetry across features is quite pleasing, and the selection of these "physical motion" features across all the tested pairs tends to confirm their validity. The fact that the currency strength indicator was not selected does not mean that this indicator is of no value, but perhaps it should not be used for regression purposes, but rather as a filter. More in due course.

Wednesday, 16 November 2011

Update on the Perfect Moving Average and Oscillator

It has been some time since my last post and the reason for this was, in large part, due to my changing over from an Ubuntu to a Kubuntu OS and the resultant teething problems etc. However, all that is over so now I return to the Perfect Moving Average (PMA) and the Perfect Oscillator (PO).

In my previous post I stated that I would show these on real world data, but before doing this there was one thing I wanted to investigate - the accuracy of the price projection algorithm upon which the average and oscillator calculations are based. Below are four sample screenshots of these investigations.

Sideways Market

Up Trending Market

Down Trending Market

The charts show a simulated "ideal" time series (cyan) with two projected price series of 10 points each (red and green) plotted such that the red and green projections are plotted concurrently with the actual series points they are projections, or predictions, of. Both projections/predictions start at the point at which the red "emerges" from the actual series. As can be seen the green projection/prediction is much more accurate than the red, and the green is the improved projection algorithm I have recently been working on.

As a result of this improvement and due to the nature of the internal calculations I expect that the PMA will follow prices much closer and that the PO will lead prices by a quarter cycle compared with my initial price projection algorithm. More on this in the next post.

Friday, 7 October 2011

The Theoretically Perfect Moving Average and Oscillator

Readers of this blog will probably have noticed that I am partial to using concepts from digital signal processing in the development of my trading system. Recently I "rediscovered" this PDF, written by Tim Tillson, on Google Docs, which has a great opening introduction:

" 'Digital filtering includes the process of smoothing, predicting, differentiating,

integrating, separation of signals, and removal of noise from a signal. Thus many people
who do such things are actually using digital filters without realizing that they are; being
unacquainted with the theory, they neither understand what they have done nor the
possibilities of what they might have done.'

This quote from R. W. Hamming applies to the vast majority of indicators in technical
analysis."

The purpose of this blog post is to outline my recent work in applying some of the principles discussed in the linked PDF.

Long time readers of this blog may remember that back in April this year I abandoned work I was doing on the AFIRMA trend line because I was dissatisfied with the results. The code for this required projecting prices forward using my leading signal code, and I now find that I can reuse the bulk of this code to create a theoretically perfect moving average and oscillator. I have projected prices forward by 10 bars and then taken the average of the forwards and backwards exponential moving averages, as per the linked PDF, to create a series of averages that theoretically are in phase with the underlying price, and then taken the mean of all these averages to produce my "perfect" moving average. Similarly, I have done the same to create a "perfect" oscillator and the results are shown in the images below.
Sideways Market

Trending up Market

Trending down Market

The upper panes show "price" and its perfect moving average, the middle panes show the perfect oscillator, its one bar leading signal, and exponential standard deviation bands set at 1 standard deviation above and below an exponential moving average of the oscillator. The lower panes show the %B indicator of the oscillator within the bands, offset so that the exponential standard deviation bands are at 1 and -1 levels.

Even cursory examination of many charts such as these shows the efficacy of the signals generated by the crossovers of price with its moving average, the oscillator with its leading signal and the %B indicator crossing the 1 and -1 levels, when applied to idealised data. My next post will discuss the application to real price series.

Pages