
Monday, 30 December 2024

A "New" Way to Smooth Price

Below is code for an Octave compiled C++ .oct function to smooth price data.
#include "octave oct.h"
#include "octave dcolvector.h"
#include "octave dmatrix.h"
#include "GenFact.h"
#include "GramPoly.h"
#include "Weight.h"

DEFUN_DLD ( double_smooth_proj_2_5, args, nargout,
"-*- texinfo -*-\n\
@deftypefn {Function File} {} double_smooth_proj_2_5 (@var{input_vector})\n\
This function takes an input series and smooths it by projecting a 5 bar rolling linear fit\n\
3 bars into the future and using these 3 bars plus the last 3 bars of the rolling input\n\
to fit a FIR filter with a 2.5 bar lag from the last projected point, i.e. a 0.5 bar\n\
lead over the last actual rolling point in the series. This is averaged with the previously\n\
calculated such smoothed point for a theoretical zero-lagged smooth. This smooth is\n\
again smoothed as above for a smooth of the smooth, i.e. a double-smooth of the\n\
original input series. The double_smooth and smooth are the function outputs.\n\
@end deftypefn" )

{
octave_value_list retval_list ;
int nargin = args.length () ;

// check the input arguments
if ( nargin != 1 ) // there must be a single input price vector
   {
   error ("Invalid arguments. Input is a single price vector.") ;
   return retval_list ;
   }

if ( args(0).length () < 5 )
   {
   error ("Invalid 1st argument length. Input is a price vector of length >= 5.") ;
   return retval_list ;
   }
// end of input checking

ColumnVector input = args(0).column_vector_value () ;
ColumnVector smooth = args(0).column_vector_value () ;
ColumnVector double_smooth = args(0).column_vector_value () ;

// create the fit coefficients matrix
int p = 5 ;             // the number of points in calculations
int m = ( p - 1 ) / 2 ; // value to be passed to call_GenPoly_routine_from_octfile
int n = 1 ;             // value to be passed to call_GenPoly_routine_from_octfile
int s = 0 ;             // value to be passed to call_GenPoly_routine_from_octfile

  // create matrix for fit coefficients
  Matrix fit_coefficients_matrix ( 2 * m + 1 , 2 * m + 1 ) ;
  // and assign values in loop using the Weight.h recursive Gram Polynomial C++ headers
  for ( int tt = -m ; tt < (m+1) ; tt ++ )
  {
        for ( int ii = -m ; ii < (m+1) ; ii ++ )
        {
        fit_coefficients_matrix ( ii + m , tt + m ) = Weight( ii , tt , m , n , s ) ;
        }
  }

  // create matrix for slope coefficients
  Matrix slope_coefficients_matrix ( 2 * m + 1 , 2 * m + 1 ) ;
  s = 1 ;
  // and assign values in loop using the Weight.h recursive Gram Polynomial C++ headers
  for ( int tt = -m ; tt < (m+1) ; tt ++ )
  {
        for ( int ii = -m ; ii < (m+1) ; ii ++ )
        {
        slope_coefficients_matrix ( ii + m , tt + m ) = Weight( ii , tt , m , n , s ) ;
        }
  }

  Matrix smooth_coefficients ( 1 , 5 ) ;
  // fill the smooth_coefficients matrix, smooth_coeffs = ( 9/24 ) .* fit_coeffs + ( 7/12 ) .* slope_coeffs + [ 0 ; 1/24 ; 1/8 ; 5/24 ; 1/4 ] ;
  smooth_coefficients( 0 , 0 ) = ( 9.0 / 24.0 ) * fit_coefficients_matrix( 0 , 4 ) + ( 7.0 / 12.0 ) * slope_coefficients_matrix( 0 , 4 ) ;
  smooth_coefficients( 0 , 1 ) = ( 9.0 / 24.0 ) * fit_coefficients_matrix( 1 , 4 ) + ( 7.0 / 12.0 ) * slope_coefficients_matrix( 1 , 4 ) + ( 1.0 / 24.0 ) ;
  smooth_coefficients( 0 , 2 ) = ( 9.0 / 24.0 ) * fit_coefficients_matrix( 2 , 4 ) + ( 7.0 / 12.0 ) * slope_coefficients_matrix( 2 , 4 ) + ( 1.0 / 8.0 ) ;
  smooth_coefficients( 0 , 3 ) = ( 9.0 / 24.0 ) * fit_coefficients_matrix( 3 , 4 ) + ( 7.0 / 12.0 ) * slope_coefficients_matrix( 3 , 4 ) + ( 5.0 / 24.0 ) ;
  smooth_coefficients( 0 , 4 ) = ( 9.0 / 24.0 ) * fit_coefficients_matrix( 4 , 4 ) + ( 7.0 / 12.0 ) * slope_coefficients_matrix( 4 , 4 ) + ( 1.0 / 4.0 ) ;

    for ( octave_idx_type ii (4) ; ii < args(0).length () ; ii++ )
        {

         smooth( ii ) = smooth_coefficients( 0 , 0 ) * input( ii - 4 ) + smooth_coefficients( 0 , 1 ) * input( ii - 3 ) + smooth_coefficients( 0 , 2 ) * input( ii - 2 ) +
                        smooth_coefficients( 0 , 3 ) * input( ii - 1 ) + smooth_coefficients( 0 , 4 ) * input( ii ) ;

         double_smooth( ii ) = smooth_coefficients( 0 , 0 ) * smooth( ii - 4 ) + smooth_coefficients( 0 , 1 ) * smooth( ii - 3 ) + smooth_coefficients( 0 , 2 ) * smooth( ii - 2 ) +
                               smooth_coefficients( 0 , 3 ) * smooth( ii - 1 ) + smooth_coefficients( 0 , 4 ) * smooth( ii ) ;

        }

retval_list( 0 ) = double_smooth ;
retval_list( 1 ) = smooth ;

return retval_list ;

} // end of function
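For completeness, and assuming the GenFact.h, GramPoly.h and Weight.h headers from my earlier Savitzky-Golay code are on the include path, compiling and calling the function would look something like this (the file name is illustrative only):
mkoctfile double_smooth_proj_2_5.cc                              # compile from the shell
[ double_smooth , smooth ] = double_smooth_proj_2_5( price ) ;   # then call from within Octave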

Rather than describe it, I'll just paste the "help" description below:-

"This function takes an input series and smooths it by projecting a 5 bar rolling linear fit 3 bars into the future and using these 3 bars plus the last 3 bars of the rolling input to fit a FIR filter with a 2.5 bar lag from the last projected point, i.e.  a 0.5 bar lead over the last actual rolling point in the series.  This is averaged with the previously calculated such smoothed point for a theoretical zero-lagged smooth.  This smooth is again smoothed as above for a smooth of the smooth, i.e.  a double-smooth of the original input series.  The double_smooth and smooth are the function outputs."

The above function calls previous code of mine to calculate Savitzky-Golay filter convolution coefficients for internal calculations, but for the benefit of readers I will simply post the final coefficients for a 5 tap FIR filter.

-0.191667  -0.016667   0.200000   0.416667   0.591667

These coefficients are for a "single" smooth. Run the output from a function using these through the same function to get the "double" smooth.
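As a minimal Octave sketch of such a function (the function name and loop indexing here are purely illustrative, and this is not the .oct code above):
function smooth = fir_smooth_2_5( price )
  ## the 5 tap FIR coefficients given above
  coeffs = [ -0.191667 -0.016667 0.200000 0.416667 0.591667 ] ;
  smooth = price ;
  for ii = 5 : numel( price )
    window = price( ii - 4 : ii ) ;
    smooth( ii ) = sum( coeffs( : ) .* window( : ) ) ;
  endfor
endfunction
## single smooth, then run the output through again for the double smooth
## smooth = fir_smooth_2_5( price ) ;
## double_smooth = fir_smooth_2_5( smooth ) ;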

Testing on a sinewave looks like this:-

where the cyan, green and blue filters are the original FIR filter with 2.5 bar lag, a 5 bar SMA and Ehlers' super smoother (see code here) applied to sinewave "price" for comparative purposes. The red filter is my "double_smooth" and the magenta is a Jurik Moving Average, using an Octave adaptation of code that is/was freely available on the TradingView website. The parameters for this Jurik average (length, phase and power) were chosen to match those of the double_smooth as closely as possible, for an apples to apples comparison.

I will not discuss this any further in this post other than to say I intend to combine it with the ideas contained in my new use for Kalman filter post.

More in due course.

Saturday, 11 May 2024

End of Initial Tests of Trading Forex News Announcements

Following on from my previous post, I have completed the same tests as outlined in that post on other currencies, and the summary results are:

  • USD - an average of 0.38% return per trade
  • EUR - an average of 0.22% return per trade
  • GBP - an average of 0.77% return per trade
  • CHF - an average of 2.05% return per trade
  • JPY - an average of 0.18% return per trade
  • AUD - an average of 1.01% return per trade

Since these seem to be profitable across the board, I deem it worthwhile to continue investigating some sort of news trading system. That said, there is a huge caveat regarding fills which has recently been pointed out by Terberh Strategy in a comment to the previous post, namely the withdrawal of liquidity immediately prior to these news releases. I am not yet sure how I can account for this in future testing, and it may be that this lack of liquidity makes it almost impossible to accurately test any such system without making some consequential assumptions.
 

Tuesday, 9 April 2024

A "New" Use for Kalman Filter on Price Time Series?

During the course of writing this blog I have visited the idea of using Kalman filters several times, most recently in this February 2023 post. My motivation in those previous posts could best be described as trying to smooth price data with as little lag as possible, i.e. create a zero-lag indicator. In doing so, the model most often used for the Kalman filter was a physical motion model with position, velocity and acceleration components. Whilst these "worked" in the sense of smoothing the underlying data, a physical motion model is not necessarily a good fit for financial data, which is obviously not a physical system, so I thought I would instead apply the Kalman filter to something that is ubiquitously used on financial data: the exponential moving average.

Below is an Octave function to calculate a "kalman_ema", where the prediction part of the filter is just a linear extrapolation of an exponential moving average; this extrapolated value is then used to calculate the projected price, with the measurements being the real price and its ema values.

## Copyright (C) 2024 dekalog
##
## This program is free software: you can redistribute it and/or modify
## it under the terms of the GNU General Public License as published by
## the Free Software Foundation, either version 3 of the License, or
## (at your option) any later version.
##
## This program is distributed in the hope that it will be useful,
## but WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
## GNU General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with this program.  If not, see https://www.gnu.org/licenses/.

## -*- texinfo -*-
## @deftypefn {} {@var{retval} =} kalman_ema (@var{price}, @var{lookback})
##
## @seealso{}
## @end deftypefn

## Author: dekalog 
## Created: 2024-04-08

function [ filter_out , P_out ] = kalman_ema ( price , lookback )

## check price is row vector
if ( size( price , 1 ) > 1 && size( price , 2 ) == 1 )
 price = price' ;
endif

P_out = zeros( numel( price ) , 2 ) ;

ema_price = ema( price , lookback )' ;
alpha = 2 / ( lookback + 1 ) ;

## initial covariance matrix
P = eye( 3 ) ;

## transition matrix
A = [ 0 , ( 1 + alpha ) / alpha , -( 1 / alpha ) ; ...
       0 , 2 , -1 ; ...
       0 , 1 , 0 ] ;

## initial Q
Q = eye( 3 ) ;

## measurement vector
Y = [ price ; ...
       ema_price ; ...
       shift( ema_price , 1 ) ] ;
Y( 3 , 1 ) = Y( 2 , 1 ) ;

## measurement matrix
H = eye( 3 ) ;

## measurement noise covariance
R = eye( 3 ) ;

## container for Kalman filter output
filter_out = zeros( size( Y ) ) ;
errors = ones( 3 , 1 ) ;

for ii = 2 : size( Y , 2 )

  X = A * filter_out( : , ii - 1 ) ;
  P = A * P * A' + Q ;

  errors = alpha .* abs( X - Y( : , ii ) ) + ( 1 - alpha ) .* errors ;

  IM = H * X ; ## Mean of predictive distribution
  IS = ( R + H * P * H' ) ; ## Covariance of predictive mean
  K = P * H' / IS ; ## Computed Kalman gain
  X = X + K * ( Y( : , ii ) - IM ) ; ## Updated state mean
  P = P - K * IS * K' ; ## Updated state covariance

  filter_out( : , ii ) = X ;
  P_out( ii , 1 ) = P( 1 , 1 ) ;
  P_out( ii , 2 ) = P( 2 , 2 ) ;

  ## update Q and R
  Q( 1 , 1 ) = errors( 1 ) ; Q( 2 , 2 ) = errors( 2 ) ; Q( 3 , 3 ) = errors( 3 ) ;
  R = Q ;

endfor

filter_out = filter_out' ;

endfunction
Using this for smoothing either the price or the ema has no real utility, but a by-product of the filter is the covariance matrix, from which it is possible to plot bands around the price series. The following chart shows the bands for ema alpha values corresponding to various Fibonacci sequence lookback lengths.
This next chart, for purposes of clarity, shows the "Golden Cross" lookback lengths of 50 and 200,
and this final chart shows an adaptive lookback length which is a function of the instantaneous measured period (see here or here) of the underlying data.

Despite the wide range of the input lookback lengths for the ema, it can be seen that the covariance bands around price are broadly similar. I can think of many uses for such a data driven but essentially parameter insensitive measure of price variance, e.g. entry/exit levels, stop levels, position sizing etc.
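Purely by way of illustration, bands such as those in the above charts can be constructed along the following lines (the two sigma width here is my own choice, not a recommendation):
[ filter_out , P_out ] = kalman_ema( price , 50 ) ;
## bands around price from the filter's covariance output
upper_band = price( : ) + 2 .* sqrt( P_out( : , 1 ) ) ;
lower_band = price( : ) - 2 .* sqrt( P_out( : , 1 ) ) ;
plot( price( : ) , 'k' , upper_band , 'r' , lower_band , 'r' ) ;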

More in due course.


Tuesday, 30 May 2023

Quick Update on Kalman Filter and Sensor Fusion

Managed to code it up and get it working, but at the end of the day I couldn't see any value added over just averaging the output of the indicators I was trying to fuse together via Kalman filtering. As a result, I'm giving up on this for now and looking at other things.

More in due course. 

Tuesday, 28 February 2023

Kalman Filter and Sensor Fusion

In the Spring of 2012 and again in the Spring of 2019 I posted a series of posts about the Kalman Filter, which readers can access via the blog archive on the right. In both cases I eventually gave up those particular lines of investigation because of disappointing results. This post is the first in a new series about using the Kalman Filter for sensor fusion, an application I had known of before but, due to the paucity of clear information about it online, had never really investigated. However, my recent discovery of this Github and its associated online tutorial has inspired me to make a third attempt at using Kalman Filters. What I am going to attempt to do is use the idea of sensor fusion to fuse the outputs of several functions I have coded in the past, each of which extracts the dominant cycle from a time series, to hopefully obtain a better representation of the "true underlying cycle."

The first step in this process is to determine the measurement noise covariance or, in Kalman Filter terms, the "R" covariance matrix. To do this, I have used the average of two of the outputs from the above mentioned functions to create a new cycle and similarly used two extracted trends (price minus these cycles) averaged to get a new trend. The new cycle and new trend are simply added to each other to create a new price series which is almost identical to the original price series. The screenshot below shows a typical cycle extract,

where the red cycle is the average of the other two extracted cycles, and this following screenshot shows the new trend in red plus the new price alongside the old price (blue and black respectively).
Having thus created a time series with known trend and cycle, it is a simple matter to run my cycle extractor functions on this new price, compare the outputs with the known cyclic component of price and calculate the variance of the errors to get the R covariance matrices for 14 different currency crosses.
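A rough sketch of this procedure is shown below, where cycle_extract_1 and cycle_extract_2 are stand-in names for two of my actual cycle extractor functions, not real function names:
## average two extracted cycles and the corresponding trends to create a synthetic price
cycle_1 = cycle_extract_1( price ) ; cycle_2 = cycle_extract_2( price ) ;
new_cycle = ( cycle_1 .+ cycle_2 ) ./ 2 ;
new_trend = ( ( price .- cycle_1 ) .+ ( price .- cycle_2 ) ) ./ 2 ;
new_price = new_trend .+ new_cycle ; ## essentially the original price, but with a known cycle/trend split
## run the extractors on the synthetic price and take the error variances for the R matrix
R = zeros( 2 , 2 ) ;
R( 1 , 1 ) = var( cycle_extract_1( new_price ) .- new_cycle ) ;
R( 2 , 2 ) = var( cycle_extract_2( new_price ) .- new_cycle ) ;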

More in due course.

 

Friday, 21 June 2019

Process Noise Covariance Matrix Q for a Kalman Filter

Since my last post I have been working on the process noise covariance matrix Q, with a view to optimising both the Q and R matrices for an Extended Kalman filter to model the cyclic component of price action as a sine wave. However, my work to date has produced unsatisfactory results and I have decided to give up trying to make it work.

The reasons for this failure are unclear to me, and I don't intend to spend any more time investigating, but some educated guesses are that the underlying sinusoid model is mismatched and/or that my estimation of the process and measurement noise variances is lacking. Either way, the end result is that the EKF diverges. My earlier leading signal 1, leading signal 2 and leading signal 3 posts outline what I think will be a more promising line of investigation in the future.

Nevertheless, below I provide the rough working code that I have been using in my EKF work, and maybe readers will find something of value in it. There is a lot of commenting and some blocked out code and I'm afraid readers will have to wade through this as best they can.
clear all ;
1 ;

pkg load signal ;

% function declarations
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function Y = ekf_sine_h( x )
% Measurement model function for the sine signal.
% Takes the state input vector x of sine, phase, angular frequency and amplitude and 
% calculates the current value of the sine given the state vector values.

f = x( 2 , : ) ;    % phase value in radians
a = x( 4 , : ) ;    % amplitude
Y = a .* sin( f ) ; % the sine value

endfunction
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [ cost ] = sine_ekf_Q_optim( Q_mult , data , targets , Q , R )
  
% initial state from noisy data input measurements
s = data( : , 1 ) ;

n = size( data , 1 ) ;

% initial state covariance 
P = eye( n ) ;

% allocate memory
xV = zeros( size( data ) ) ;  % record estimates of states

% basic Jacobian of state transition matrix 
A = [ 0 0 0 0 ;   % sine value
      0 1 1 0 ;   % phase
      0 0 1 0 ;   % angular frequency
      0 0 0 1 ] ; % amplitude
      
H = eye( n ) ; % measurement matrix

Q = Q_mult .* Q ;

  for k = 2 : size( data , 2 )

  % do ekf
  % nonlinear update x and linearisation at current state s
  x = s ;
  x( 2 ) = x( 2 ) + x( 3 ) ;        % advance phase value by angular frequency
  x( 2 ) = mod( x( 2 ) , 2 * pi ) ; % limit phase values to range 0 --- 2 * pi
  x( 1 ) = x( 4 ) * sin( x( 2 ) ) ; % sine value calculation

  % update the 1st row of the jacobian matrix at state vector s values
  A( 1 , : ) = [ 0 s( 4 ) * cos( s( 2 ) ) 0 sin( s( 2 ) ) ] ;

  P = A * P * A' + Q ; % state transition model update of covariance matrix P

  measurement_residual = [ data( 1 , k ) - x( 1 ) ;   % sine residual
                           data( 2 , k ) - x( 2 ) ;   % phase residual
                           data( 3 , k ) - x( 3 ) ;   % angular frequency residual
                           data( 4 , k ) - x( 4 ) ] ; % amplitude residual
                           
  innovation_residual_covariance = H * P * H' + R ;
  kalman_gain = P * H' / innovation_residual_covariance ;

  % update the state vector s with kalman_gain 
  s = x + kalman_gain * measurement_residual ;

  % some reality based post hoc adjustments
  s( 2 ) = abs( s( 2 ) ) ; % no negative phase values
  s( 3 ) = abs( s( 3 ) ) ; % no negative angular frequencies
  s( 4 ) = abs( s( 4 ) ) ; % no negative amplitudes

  % update the state covariance matrix P
  % NOTE
  % The Joseph formula is given by P+ = ( I − KH ) P− ( I − KH )' + KRK', where I is the identity matrix,
  % K is the gain, H is the measurement mapping matrix, R is the measurement noise covariance matrix, 
  % and P−, P+ are the pre and post measurement update estimation error covariance matrices, respectively.  
  % The optimal linear unbiased estimator (equivalently the optimal linear minimum mean square error estimator)  
  % or Kalman filter often utilizes simplified covariance update equations such as P+ = (I−KH)P− and P+ = P− −K(HP−H'+R)K'.  
  % While these alternative formulations require fewer computations than the Joseph formula, they are only valid 
  % when K is chosen as the optimal Kalman gain. In engineering applications, situations arise where the optimal 
  % Kalman gain is not utilized and the Joseph formula must be employed to update the estimation error covariance.  
  % Two examples of such a scenario are underweighting measurements and considering states. 
  % Even when the optimal gain is used, the Joseph formulation is still preferable because it possesses 
  % greater numerical accuracy than the simplified equations.
  P = ( eye( n ) - kalman_gain * H ) * P * ( eye( n ) - kalman_gain * H )' + kalman_gain * R * kalman_gain' ;
   
  xV( : , k ) = s ; % save estimated updated states
   
  endfor
  
Y = ekf_sine_h( xV ) ;
cost = mean( ( Y( 4 : end - 3 ) .- targets( 4 : end - 3 ) ).^2 ) ;
%cost = mean( ( Y .- targets ).^2 ) ;
%output = xV( 1 , : ) ;

endfunction
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function output = run_sine_ekf_Q_optim( data , Q , R )
  
% initial state from noisy data input measurements
s = data( : , 1 ) ;

n = size( data , 1 ) ;

% initial state covariance 
P = eye( n ) ;

% allocate memory
xV = zeros( size( data ) ) ;  % record estimates of states

% basic Jacobian of state transition matrix 
A = [ 0 0 0 0 ;   % sine value
      0 1 1 0 ;   % phase
      0 0 1 0 ;   % angular frequency
      0 0 0 1 ] ; % amplitude
      
H = eye( n ) ; % measurement matrix

  for k = 2 : size( data , 2 )

  % do ekf
  % nonlinear update x and linearisation at current state s
  x = s ;
  x( 2 ) = x( 2 ) + x( 3 ) ;        % advance phase value by angular frequency
  x( 2 ) = mod( x( 2 ) , 2 * pi ) ; % limit phase values to range 0 --- 2 * pi
  x( 1 ) = x( 4 ) * sin( x( 2 ) ) ; % sine value calculation

  % update the 1st row of the jacobian matrix at state vector s values
  A( 1 , : ) = [ 0 s( 4 ) * cos( s( 2 ) ) 0 sin( s( 2 ) ) ] ;

  P = A * P * A' + Q ; % state transition model update of covariance matrix P

  measurement_residual = [ data( 1 , k ) - x( 1 ) ;   % sine residual
                           data( 2 , k ) - x( 2 ) ;   % phase residual
                           data( 3 , k ) - x( 3 ) ;   % angular frequency residual
                           data( 4 , k ) - x( 4 ) ] ; % amplitude residual
                           
  innovation_residual_covariance = H * P * H' + R ;
  kalman_gain = P * H' / innovation_residual_covariance ;

  % update the state vector s with kalman_gain 
  s = x + kalman_gain * measurement_residual ;

  % some reality based post hoc adjustments
  s( 2 ) = abs( s( 2 ) ) ; % no negative phase values
  s( 3 ) = abs( s( 3 ) ) ; % no negative angular frequencies
  s( 4 ) = abs( s( 4 ) ) ; % no negative amplitudes

  % update the state covariance matrix P
  % NOTE
  % The Joseph formula is given by P+ = ( I − KH ) P− ( I − KH )' + KRK', where I is the identity matrix,
  % K is the gain, H is the measurement mapping matrix, R is the measurement noise covariance matrix, 
  % and P−, P+ are the pre and post measurement update estimation error covariance matrices, respectively.  
  % The optimal linear unbiased estimator (equivalently the optimal linear minimum mean square error estimator)  
  % or Kalman filter often utilizes simplified covariance update equations such as P+ = (I−KH)P− and P+ = P− −K(HP−H'+R)K'.  
  % While these alternative formulations require fewer computations than the Joseph formula, they are only valid 
  % when K is chosen as the optimal Kalman gain. In engineering applications, situations arise where the optimal 
  % Kalman gain is not utilized and the Joseph formula must be employed to update the estimation error covariance.  
  % Two examples of such a scenario are underweighting measurements and considering states. 
  % Even when the optimal gain is used, the Joseph formulation is still preferable because it possesses 
  % greater numerical accuracy than the simplified equations.
  P = ( eye( n ) - kalman_gain * H ) * P * ( eye( n ) - kalman_gain * H )' + kalman_gain * R * kalman_gain' ;
   
  xV( : , k ) = s ; % save estimated updated states
   
  endfor 

output = xV ;

endfunction
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% load data
cd /home/dekalog/Documents/octave/indices ;

raw_data = dlmread( 'raw_data_for_indices_and_strengths' ) ;
delete_hkd_crosses = [ 3 9 12 18 26 31 34 39 43 51 61 ] .+ 1 ; % +1 to account for date column
raw_data( : , delete_hkd_crosses ) = [] ;
raw_data( : , 1 ) = [] ; % delete date vector
% so now
%    1       2       3       4       5       6       7       8      9      10      11      12      13     14
% aud_cad aud_chf aud_jpy aud_nzd aud_sgd aud_usd cad_chf cad_jpy cad_sgd chf_jpy eur_aud eur_cad eur_chf eur_gbp
%
%   15      16      17      18      19      20      21      22      23      24      25      26      27      28
% eur_jpy eur_nzd eur_sgd eur_usd gbp_aud gbp_cad gbp_chf gbp_jpy gbp_nzd gbp_sgd gbp_usd nzd_cad nzd_chf nzd_jpy
%
%    29      30      31      32      33      34      35      36      37      38     39       40      41      42       
% nzd_sgd nzd_usd sgd_chf sgd_jpy usd_cad usd_chf usd_jpy usd_sgd xag_aud xag_cad xag_chf xag_eur xag_gbp xag_jpy
%
%    43      44     45      46      47      48      49      50      51      52     53       54      55
% xag_nzd xag_sgd xag_usd xau_aud xau_cad xau_chf xau_eur xau_gbp xau_jpy xau_nzd xau_sgd xau_usd xau_xag 

% aud_x = x(1) ; cad_x = x(2) ; chf_x = x(3) ; eur_x = x(4) ; gbp_x = x(5) ; hkd_x = x(6) ;
% jpy_x = x(7) ; nzd_x = x(8) ; sgd_x = x(9) ; usd_x = x(10) ; gold_x = x(11) ; silver_x = x(12) ;
all_g_c = dlmread( "all_g_mults_c" ) ; % the currency g mults
all_g_c( : , 7 ) = [] ; % delete hkd index
all_g_s = dlmread( "all_g_sv" ) ;      % the gold and silver g mults   
all_g_c = [ all_g_c all_g_s(:,2:3) ] ; % a combination of the above 2
all_g_c( : , 2 : end ) = cumprod( all_g_c( : , 2 : end) , 1 ) ;
all_g_c( : , 1 ) = [] ; % delete date vector
% so now index ix are
% aud_x = 56 ; cad_x = 57 ; chf_x = 58 ; eur_x = 59 ; gbp_x = 60 ; jpy_x = 61 ; 
% nzd_x = 62 ; sgd_x = 63 ; usd_x = 64 ; gold_x = 65 ; silver_x = 66 ;

all_raw_data = [ raw_data all_g_c ] ; clear -x all_raw_data ;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

cd /home/dekalog/Documents/octave/kalman/ekf ;

##  >> bs_phase_error = mean( all_bootstrap_means(:,1)) + 2 * std( all_bootstrap_means(:,1))
##  bs_phase_error =  0.50319
##  >> bs_period_ang_frequency_error = mean( all_bootstrap_means(:,2)) + 2 * std( all_bootstrap_means(:,2))
##  bs_period_ang_frequency_error =  0.17097
##  >> bs_amp_error = mean( all_bootstrap_means(:,3)) + 2 * std( all_bootstrap_means(:,3))
##  bs_amp_error =  0.18179

price_ix = 65 ;
price = all_raw_data( : , price_ix ) ; % plot(price) ;

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% get measurements of price action
period = autocorrelation_periodogram( price ) ;
measured_angular_frequency = ( 2 * pi ) ./ period ; % plot( measured_angular_frequency ) ;
trend = sma( price , period ) ;
cycle = price .- trend ;
%smooth_cycle = sgolayfilt( cycle , 2 , 7 ) ;
smooth_cycle = smooth_2_5( cycle ) ;
% plot( price,'k',trend,'r') ;
[ ~ , ~ , ~ , ~ , ~ , ~ , measured_phase ] = sinewave_indicator( cycle ) ; % figure(1) ; plot( deg2rad( measured_phase ) ) ;
% figure(2) ; plot( sin( deg2rad(measured_phase)) ) ;
measured_phase = mod( unwrap( deg2rad( measured_phase ) ) , 2  * pi ) ; % figure(1) ; hold on ; plot( measured_phase , 'r' ) ; hold off ;
% figure(2) ; hold on ; plot( sin( measured_phase)) ; hold off ;

measured_amplitude = cycle ;
for ii = 50 : length( cycle ) ;
measured_amplitude( ii ) = sqrt( 2 ) * sqrt( mean( cycle( ii - period( ii ) : ii ).^2 ) ) ; 
endfor % end ii loop
% plot(measured_amplitude) ;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% run the ekf optimisation
ekf_cycle = cycle ;

% complete values for R matrix for this data set
data = [ smooth_cycle( 101 : end ) measured_phase( 101 : end ) measured_angular_frequency( 101 : end ) measured_amplitude( 101 : end ) ]' ; 
%measured_sine = sgolayfilt( data( 1 , : ) , 2 , 7 ) ; % figure(1) ; plot( data( 1 , : ) , 'k' , measured_sine , 'r' ) ;
measured_sine = smooth_cycle( 101 : end )' ;
R( 1 , 1 ) = mean( ( data( 1 , : ) .- measured_sine ).^2 ) ; % variance of sine wave value measurements
R( 3 , 3 ) = mean( ( data( 3 , : ) .* 0.17 ).^2 ) ;          % variance of angular frequency measurements
R( 4 , 4 ) = mean( ( data( 4 , : ) .* 0.18 ).^2 ) ;          % variance of amplitude measurements

% get initial process covariance matrix Q
Q = analytical_shrinkage( data' ) ;
Q = 0.1 .* Q ;

lookback = 200 ;

%for ii = 301 : length( price )
  
% complete values for R matrix for this data set
data = [ smooth_cycle(ii-lookback:ii) measured_phase(ii-lookback:ii) measured_angular_frequency(ii-lookback:ii) measured_amplitude(ii-lookback:ii) ]' ; 
%measured_sine = sgolayfilt( data( 1 , : ) , 2 , 7 ) ; % figure(1) ; plot( data( 1 , : ) , 'k' , measured_sine , 'r' ) ;
measured_sine = cycle( ii - lookback : ii )' ;
R( 1 , 1 ) = mean( ( data( 1 , : ) .- measured_sine ).^2 ) ; % variance of sine wave value measurements
R( 3 , 3 ) = mean( ( data( 3 , : ) .* 0.17 ).^2 ) ;          % variance of angular frequency measurements
R( 4 , 4 ) = mean( ( data( 4 , : ) .* 0.18 ).^2 ) ;          % variance of amplitude measurements
  
%  Q = analytical_shrinkage( data' ) ;
   
% optimise the Q matrix for this data set
% declare optimisation function
f = @( Q_mult ) sine_ekf_Q_optim( Q_mult , data , smooth_cycle( ii - lookback : ii )' , Q , R ) ;
% Set options for fminunc
options = optimset( "MaxIter" , 50 ) ;
% initial value
Q_mult = 1 ;
Q_mult = fminunc( f , Q_mult , options ) ;
% adjust Q by optimised Q_mult
Q = Q_mult .* Q ;

  output = run_sine_ekf_Q_optim( data , Q , R ) ;
  %ekf_cycle( ii ) = ekf_sine_h( output( : , end ) ) ;
  ekf_cycle = ekf_sine_h( output ) ;
  
figure(1) ; subplot( 4 , 1 , 1 ) ; plot( data( 1 , : ) , 'k' , 'linewidth' , 2 , ekf_sine_h( output ) , 'r' , 'linewidth' , 2 ) ;
title( 'High Pass' ) ; legend( 'Measured' , 'EKF Estimated' ) ;
figure(1) ; subplot( 4 , 1 , 2 ) ; plot( data( 2 , : ) , 'k' , 'linewidth' , 2 , output( 2 , : ) , 'r' , 'linewidth' , 2 ) ;
title( 'High Pass Phase' ) ; legend( 'Measured' , 'EKF Estimated' ) ;
figure(1) ; subplot( 4 , 1 , 3 ) ; plot( data( 3 , : ) , 'k' , 'linewidth' , 2 , output( 3 , : ) , 'r' , 'linewidth' , 2 ) ;
title( 'High Pass Angular Frequency' ) ; legend( 'Measured' , 'EKF Estimated' ) ;
figure(1) ; subplot( 4 , 1 , 4 ) ; plot( data( 4 , : ) , 'k' , 'linewidth' , 2 , output( 4 , : ) , 'r' , 'linewidth' , 2 ) ;
title( 'High Pass Amplitude' ) ; legend( 'Measured' , 'EKF Estimated' ) ;
  
%endfor

figure(1) ; plot( cycle(101:end)' , 'k' , 'linewidth' , 2 , ekf_cycle , 'r' , 'linewidth' , 2 , output(1,:) , 'b' ) ;
figure(2) ; plot( price(101:end) , 'k' , 'linewidth' , 2 , trend(101:end).+ekf_cycle' , 'r' , 'linewidth' , 2 ) ;
One thing I will point out is the use of a function called analytical_shrinkage, which I have taken directly from a recent paper, Analytical Nonlinear Shrinkage of Large-Dimensional Covariance Matrices, the MATLAB code being provided as an appendix in the paper. Readers may find this to be particularly useful outside the use I have tried putting it to.

Next week I shall go on my customary summer working holiday and be away from home until August; therefore there will be a hiatus in blog posts until my return.

Thursday, 6 June 2019

Determining the Noise Covariance Matrix R for a Kalman Filter

An important part of getting a Kalman filter to work well is tuning the process noise covariance matrix Q and the measurement noise covariance matrix R. This post is about obtaining the R matrix, with a post about the Q matrix to come in due course.

In my last post, about the alternative version of the extended Kalman filter to model a sine wave, I explained that the 4 measurements are the sine wave value itself and the phase, angular frequency and amplitude of the sine wave. The phase and angular frequencies are "measured" using outputs of the sinewave indicator and the amplitude is calculated as an RMS value over a measured period. The "problem" with this is that the accuracies of such measurements on real data are unknown, and therefore so is the R matrix, because the underlying cycle properties are unknown. My solution to this is offline Monte Carlo simulation.

As a first step the Octave code in the box immediately below
clear all ;
pkg load statistics ;

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% load data

raw_data = dlmread( 'raw_data_for_indices_and_strengths' ) ;
delete_hkd_crosses = [ 3 9 12 18 26 31 34 39 43 51 61 ] .+ 1 ; % +1 to account for date column
raw_data( : , delete_hkd_crosses ) = [] ;
raw_data( : , 1 ) = [] ; % delete date vector 

all_g_c = dlmread( "all_g_mults_c" ) ; % the currency g mults
all_g_c( : , 7 ) = [] ; % delete hkd index

all_g_s = dlmread( "all_g_sv" ) ;      % the gold and silver g mults   
all_g_c = [ all_g_c all_g_s(:,2:3) ] ; % a combination of the above 2
all_g_c( : , 2 : end ) = cumprod( all_g_c( : , 2 : end) , 1 ) ;
all_g_c( : , 1 ) = [] ; % delete date vector

all_raw_data = [ raw_data all_g_c ] ;

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% set up recording vectors
amplitude = zeros( size( all_raw_data , 1 ) , 1 ) ;
all_amplitudes = zeros( size( all_raw_data ) ) ;
all_periods = all_amplitudes ;

for ii = 1 : size( all_raw_data , 2 )
  
  price = all_raw_data( : , ii ) ;
  period = autocorrelation_periodogram( price ) ;
  all_periods( : , ii ) = period ;
  [ hp , ~ ] = highpass_filter_basic( price ) ; hp( 1 : 99 ) = 0 ;

  for jj = 100 : size( all_raw_data , 1 )
    amplitude( jj ) = sqrt( 2 ) * sqrt( mean( hp( round( jj - period( jj ) / 2 ) : jj , 1 ).^2 ) ) ;
  endfor
  
  % store information about amplitude changes as a multiplicative constant
  all_amplitudes( : , ii ) = amplitude ./ shift( amplitude , 1 ) ;

endfor

% truncate
all_amplitudes( 1 : 101 , : ) = [] ;
all_periods( 1 : 101 , : ) = [] ;

max_period = max( max( all_periods ) ) ;
min_period = min( min( all_periods ) ) ;

% unroll
all_amplitudes = all_amplitudes( : ) ; all_periods = all_periods( : ) ;

amplitude_changes = cell( max_period , 1 ) ;

for ii = min_period : max_period
  ix = find( all_periods == ii ) ;
  amplitude_changes{ ii , 1 } = all_amplitudes( ix , 1 ) ;
endfor

save all_amplitude_changes amplitude_changes ;
is used to get information about cycle amplitudes from real data and store it in a cell array.

This next box shows code that uses this cell array, plus earlier hidden Markov modelling code for sine wave modelling, to produce synthetic price data
clear all ;
pkg load signal ;
pkg load statistics ;

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% load data

raw_data = dlmread( 'raw_data_for_indices_and_strengths' ) ;
delete_hkd_crosses = [ 3 9 12 18 26 31 34 39 43 51 61 ] .+ 1 ; % +1 to account for date column
raw_data( : , delete_hkd_crosses ) = [] ;
raw_data( : , 1 ) = [] ; % delete date vector
all_g_c = dlmread( "all_g_mults_c" ) ; % the currency g mults
all_g_c( : , 7 ) = [] ; % delete hkd index
all_g_s = dlmread( "all_g_sv" ) ;      % the gold and silver g mults   
all_g_c = [ all_g_c all_g_s(:,2:3) ] ; % a combination of the above 2
all_g_c( : , 2 : end ) = cumprod( all_g_c( : , 2 : end) , 1 ) ;
all_g_c( : , 1 ) = [] ; % delete date vector
all_raw_data = [ raw_data all_g_c ] ; clear -x all_raw_data ;

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

load all_hmm_periods_daily ;
load all_amplitude_changes ;

% choose an ix for all_raw_data
ix = input( "ix? " ) ;
price = all_raw_data( : , ix ) ;
period = autocorrelation_periodogram( price ) ;
[ hp , trend ] = highpass_filter_basic( price ) ;

% estimate variance of cycle noise
hp_filt = sgolayfilt( hp , 2 , 7 ) ;
noise = hp( 101 : end ) .- hp_filt( 101 : end ) ; noise_var = var( noise ) ;
% zero pad beginning of noise vector
noise = [ zeros( 100 , 1 ) ; noise ] ;

% set for plotting purposes
hp( 1 : 50 ) = 0 ; trend( 1 : 50 ) = price( 1 : 50 ) ;

% create sine wave with known phase and period/angular frequency 
[ gen_period , ~ ] = hmmgenerate( length( price ) , transprobest , outprobest ) ;
gen_period = gen_period .+ hmm_min_period_add ;
phase = cumsum( ( 2 * pi ) ./ gen_period ) ; phase = mod( phase , 2 * pi ) ;
gen_sine = sin( phase ) ;

% amplitude vector
amp = ones( size( hp ) ) ;

% RMS initial amplitude 
amp( 100 ) = sqrt( 2 ) * sqrt( mean( hp( 100 - period( 100 ) : 100 ).^2 ) ) ;

% alter sine wave with known amplitude
for ii = 101 : length( hp )
  % get multiple for this period
  random_amp_mults = amplitude_changes{ gen_period( ii ) } ;
  rand_mult = random_amp_mults( randi( size( random_amp_mults , 1 ) , 1 ) ) ;
  amp( ii ) = rand_mult * sqrt( 2 ) * sqrt( mean( hp( ii - 1 - period( ii - 1 ) : ii - 1 ).^2 ) ) ;
  gen_sine( ii ) = amp( ii ) * gen_sine( ii ) ; 
endfor

% estimate variance of generated cycle noise
gen_filt = sgolayfilt( gen_sine , 2 , 7 ) ;
gen_noise = gen_sine( 101 : end ) .- gen_filt( 101 : end ) ; gen_noise_var = var( gen_noise ) ;

% a crude adjustment of generated cycle noise to match "real" noise by adding
% noise
gen_sine = gen_sine' .+ noise ;

% set beginning of gen_sine to real cycle for plotting purposes
gen_sine( 1 : 100 ) = hp( 1 : 100 ) ;
new_price = trend .+ gen_sine ;

% a simple correlation check
correlation_cycle_gen_sine = corr( hp , gen_sine ) ;
correlation_price_gen_price = corr( price , new_price ) ;

figure(1) ; plot( gen_sine , 'k' , 'linewidth' , 2 ) ; grid minor on ; title( 'The Generated Cycle' ) ; 
figure(2) ; plot( price , 'b' , 'linewidth' , 2 , trend , 'k' , 'linewidth' , 1 , new_price , 'r' , 'linewidth', 2 ) ; 
title( 'Comparison of Real and Generated Prices' ) ; legend( 'Real Price' , 'Real Trend' , 'Generated Price' ) ; grid minor on ;
figure(3) ; plot( hp , 'k' , 'linewidth' , 2 , gen_sine , 'r' , 'linewidth' , 2 ) ; title( 'Generated Cycle compared with Real Cycle' ) ; 
legend( 'Real Cycle' , 'Generated Cycle' ) ; grid minor on ;
Some typical chart output of this code is the synthetic price series in red compared with the real blue price series
and the underlying cycles.
For this particular run the correlation between the synthetic price series and the real price series is 0.99331, whilst the correlation between the relevant cycles is only 0.19988, over 1500 data points (not all of which are shown in the above charts).

Because the synthetic cycle is definitely a sine wave (by construction) with a known phase, angular frequency and amplitude, I can run the above sinewave_indicator and RMS measurements on the synthetic prices, compare them with the known synthetic cycle, and estimate the measurement noise covariance matrix R.
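For example, the phase entry of R might be estimated along the following lines (a sketch only; the exact indexing and vector orientations will depend on the outputs of the functions used above):
## estimate the phase measurement error variance on the synthetic data
[ synthetic_hp , ~ ] = highpass_filter_basic( new_price ) ;
[ ~ , ~ , ~ , ~ , ~ , ~ , est_phase ] = sinewave_indicator( synthetic_hp ) ;
est_phase = mod( unwrap( deg2rad( est_phase( : ) ) ) , 2 * pi ) ;
known_phase = phase( : ) ; ## the known phase used to construct gen_sine above
R( 2 , 2 ) = var( known_phase( 101 : end ) .- est_phase( 101 : end ) ) ; ## phase entry of R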

More in due course. 

Friday, 31 May 2019

Extended Kalman Filter, Alternative Version

Below is alternative code for an Extended Kalman filter for a sine wave, which has 4 states: the sine wave value, the phase, the angular frequency and the amplitude, together with measurements of each. I have found it necessary to implement this version because I couldn't adjust my earlier version of the code to accept and measure the additional states without the Cholesky decomposition function chol() exiting and giving errors about the input not being a positive definite matrix.
clear all ;
1 ;

function Y = ekf_sine_h( x )
% Measurement model function for the sine signal.
% Takes the state input vector x of sine, phase, angular frequency and amplitude and 
% calculates the current value of the sine given the state vector values.

f = x( 2 , : ) ;    % phase value in radians
a = x( 4 , : ) ;    % amplitude
Y = a .* sin( f ) ; % the sine value

endfunction

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
pkg load statistics ;
load all_snr ;
load all_hmm_periods_daily ;

N = 250 ; % total dynamic steps, i.e. length of data to generate
[ gen_period , gen_states ] = hmmgenerate( N , transprobest , outprobest ) ;
gen_period = gen_period .+ hmm_min_period_add ;
angular_freq_rads = ( 2 * pi ) ./ gen_period ;
real_phase = cumsum( angular_freq_rads ) ; real_phase = mod( real_phase , 2 * pi ) ;
gen_sine = sin( real_phase ) ;

noise_val = mean( [ all_snr(:,1) ; all_snr(:,2) ] ) ;
my_sine_noisy = awgn( gen_sine , noise_val ) ;

[ ~ , ~ , ~ , ~ , ~ , ~ , phase ] = sinewave_indicator( my_sine_noisy ) ;
phase = deg2rad( phase' ) ;
period = autocorrelation_periodogram( my_sine_noisy ) ;
measured_angular_frequency = ( 2 * pi ) ./ period' ;
measured_angular_frequency( 1 : 50 ) = 2 * pi / 20 ;

% RMS measurement for amplitude
measured_amplitude = ones( 1 , N ) ;
for ii = 50 : N
  measured_amplitude( ii ) = sqrt( 2 ) * sqrt( mean( my_sine_noisy( ii - period(ii) : ii ).^2 ) ) ;  
endfor

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
h = @ekf_sine_h ; 

n = 4 ; % number of states - sine value, phase, angular frequency and amplitude

% matrix for covariance of process
Q = eye( n ) ; 
Q( 1 , 1 ) = var( my_sine_noisy ) ;
Q( 2 , 2 ) = var( phase( 50 : end ) ) ;
Q( 3 , 3 ) = var( measured_angular_frequency( 50 : end ) ) ;
Q( 4 , 4 ) = var( measured_amplitude( 50 : end ) ) ;
% scale the covariance matrix, a tunable factor?
Q = 0.001 .* Q ; 

% measurement noise variance matrix
R = zeros( 4 , 4 ) ; 
R( 1 , 1 ) = var( gen_sine .- my_sine_noisy ) ;                                               % the sine values
R( 2 , 2 ) = var( real_phase( 50 : end ) .- phase( 50 : end ) ) ;                             % the phase values
R( 3 , 3 ) = var( angular_freq_rads( 50 : end ) .- measured_angular_frequency( 50 : end ) ) ; % angular frequency values
R( 4 , 4 ) = var( measured_amplitude( 50 : end ) .- 1 ) ;                                     % amplitude values

% initial state from noisy measurements
s = [ my_sine_noisy( 50 ) ;              % sine
      phase( 50 ) ;                      % phase ( Rads )
      measured_angular_frequency( 50 ) ; % angular_freq_rads
      measured_amplitude( 50 ) ] ;       % amplitude               
 
% initial state covariance 
P = eye( n ) ; 
 
% allocate memory
xV = zeros( n , N ) ;  % record estimates of states        
pV1 = zeros( n , N ) ; % projection measurements
pV2 = zeros( n , N ) ; % projection measurements
pV3 = zeros( n , N ) ; % projection measurements

% basic Jacobian of state transition matrix 
A = [ 0 0 0 0 ;   % sine value
      0 1 1 0 ;   % phase
      0 0 1 0 ;   % angular frequency
      0 0 0 1 ] ; % amplitude
      
H = eye( n ) ; % measurement matrix
      
for k = 51 : N

% do ekf
% nonlinear update x and linearisation at current state s
x = s ;
x( 2 ) = x( 2 ) + x( 3 ) ;        % advance phase value by angular frequency
x( 2 ) = mod( x( 2 ) , 2 * pi ) ; % limit phase values to range 0 --- 2 * pi
x( 1 ) = x( 4 ) * sin( x( 2 ) ) ; % sine value calculation

% update the 1st row of the jacobian matrix at state vector s values
A( 1 , : ) = [ 0 s( 4 ) * cos( s( 2 ) ) 0 sin( s( 2 ) ) ] ;

P = A * P * A' + Q ; % state transition model update of covariance matrix P

measurement_residual = [ my_sine_noisy( k ) - x( 1 ) ;              % sine residual
                         phase( k ) - x( 2 ) ;                      % phase residual
                         measured_angular_frequency( k ) - x( 3 ) ; % angular frequency residual
                         measured_amplitude( k ) - x( 4 ) ] ;       % amplitude residual
                         
innovation_residual_covariance = H * P * H' + R ;
kalman_gain = P * H' / innovation_residual_covariance ;

% update the state vector s with kalman_gain 
s = x + kalman_gain * measurement_residual ;

% some reality based post hoc adjustments
s( 2 ) = abs( s( 2 ) ) ; % no negative phase values
s( 3 ) = abs( s( 3 ) ) ; % no negative angular frequencies
s( 4 ) = abs( s( 4 ) ) ; % no negative amplitudes

% update the state covariance matrix P
% NOTE
% The Joseph formula is given by P+ = ( I − KH ) P− ( I − KH )' + KRK', where I is the identity matrix,
% K is the gain, H is the measurement mapping matrix, R is the measurement noise covariance matrix, 
% and P−, P+ are the pre and post measurement update estimation error covariance matrices, respectively.  
% The optimal linear unbiased estimator (equivalently the optimal linear minimum mean square error estimator)  
% or Kalman filter often utilizes simplified covariance update equations such as P+ = (I−KH)P− and P+ = P− −K(HP−H'+R)K'.  
% While these alternative formulations require fewer computations than the Joseph formula, they are only valid 
% when K is chosen as the optimal Kalman gain. In engineering applications, situations arise where the optimal 
% Kalman gain is not utilized and the Joseph formula must be employed to update the estimation error covariance.  
% Two examples of such a scenario are underweighting measurements and considering states. 
% Even when the optimal gain is used, the Joseph formulation is still preferable because it possesses 
% greater numerical accuracy than the simplified equations.
P = ( eye( n ) - kalman_gain * H ) * P * ( eye( n ) - kalman_gain * H )' + kalman_gain * R * kalman_gain' ;
 
xV( : , k ) = s ;                                                 % save estimated updated states
pV1( : , k ) = s ; pV1( 2 , k ) = pV1( 2 , k ) + pV1( 3 , k ) ;   % for plotting projections
pV2( : , k ) = s ; pV2( 2 , k ) = pV2( 2 , k ) + 2*pV2( 3 , k ) ; % for plotting projections
pV3( : , k ) = s ; pV3( 2 , k ) = pV3( 2 , k ) + 3*pV3( 3 , k ) ; % for plotting projections
 
endfor

% Plotting
figure(1) ; subplot(3,1,1) ; plot( real_phase , 'k', 'linewidth' , 2 , phase , 'b' , 'linewidth' , 2 , xV( 2 , : ) , 'r' , 'linewidth' , 2 ) ; grid minor on ;
title( 'Phase State' ) ; legend( 'Actual Phase State' , 'Measured Phase State' , 'Estimated Phase State' ) ; ylabel( 'Radians' ) ;

figure(1) ; subplot(3,1,2) ; plot(angular_freq_rads, 'k', 'linewidth' , 2 , measured_angular_frequency , 'b' , 'linewidth' , 2 , xV( 3 , : ) , 'r' , 'linewidth' , 2 ) ; 
grid minor on ; 
title( 'Angular Frequency State in Radians' ) ; legend( 'Actual Angular Frequency' , 'Measured Angular Frequency' , 'Estimated Angular Frequency' ) ;

figure(1) ; subplot(3,1,3) ; plot( ones(1,N) , 'k', 'linewidth' , 1 , measured_amplitude , 'b' , 'linewidth' , 2 , xV( 4 , : ) , 'r' , 'linewidth' , 2 ) ; grid minor on ;
title( 'Amplitude State' ) ; legend( 'Actual Amplitude' , 'Measured Amplitude' , 'Estimated Amplitude' ) ;

sine_est = h( xV ) ;
sine_1 = h( pV1 ) ; sine_1 = shift( sine_1 , 1 ) ;
sine_2 = h( pV2 ) ; sine_2 = shift( sine_2 , 2 ) ;
sine_3 = h( pV3 ) ; sine_3 = shift( sine_3 , 3 ) ;

figure(2) ; plot( gen_sine , 'k' , 'linewidth' , 2 , my_sine_noisy , '*b' , 'linewidth' , 2 , sine_est , 'r' , 'linewidth' , 2 ) ; grid minor on ;
title( 'Underlying and its Estimate' ) ; legend( 'Real Underlying Sine' , 'Noisy Measured Sine' , 'EKF Estimate of Real Underlying Sine' ) ;

figure(3) ; plot( gen_sine , 'k' , 'linewidth' , 2 , my_sine_noisy , '*b' , 'linewidth' , 2 , sine_1 , 'r' , 'linewidth' , 2 , sine_2 , 'g' , 'linewidth' , 2 , ...
sine_3 , 'b' , 'linewidth' , 2 ) ; grid minor on ;
title( 'Plot of the actual sine wave vs projected estimated sines' ) ; 
legend( 'Actual' , 'Noisy Measured Sine' , 'EKF Projected 1' , 'EKF Projected 2' , 'EKF Projected 3' ) ;
The measurement for the phase is an output of the sinewave indicator function, the period is an output of the autocorrelation periodogram function, and the amplitude is the square root of 2 multiplied by the root mean square of the sine wave over the autocorrelation periodogram measured period.

As usual the code is liberally commented. More soon. 

Monday, 27 May 2019

Extended Kalman Filter

In the code box below I provide code for an Extended Kalman filter to model a sine wave. This is a mashup of code from a couple of toolboxes I have found online, namely learning-the-extended-kalman-filter and the EKF/UKF Toolbox for Matlab/Octave. The modelled states are the phase, angular frequency and amplitude of the sine wave, and the measurement is the (noisy) sine wave value itself.
clear all ;
1;

function x_n = ekf_sine_f( x )
% Dynamical model function for the sine signal.
% Takes the input vector of phase, angular frequency in radians and 
% amplitude and calculates the next values of the states given the 
% current vector.

% transition matrix
A = [ 1 1 0 ;   % phase in Rads
      0 1 0 ;   % angular_freq_rads
      0 0 1 ] ; % amplitude
      
x_n = A * x ; 
x_n( 1 ) = mod( x_n( 1 ) , 2 * pi ) ;

endfunction

function Y = ekf_sine_h( x )
% Measurement model function for the sine signal.
% Takes the input vector of phase, angular frequency and amplitude and 
% calculates the current value of the sine given the input vector.

f = x( 1 , : ) ;    % phase in radians
a = x( 3 , : ) ;    % amplitude
Y = a .* sin( f ) ; % the sine

endfunction

function [ z , A ] = jacobian_transition( fun , x )
z = fun( x ) ;
A = [ 1 1 0 ;
      0 1 0 ;
      0 0 1 ] ;
endfunction

function [ z , A ] = jacobian_measurement( fun , x )
z = fun( x ) ;
A = [ x( 3 ) * cos( x( 1 ) ) 0 sin( x( 1 ) ) ] ;
endfunction

%%%%%%%%%%%%%%%%%%% Generate a sine wave %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
pkg load statistics ;
load all_snr ;
load all_hmm_periods_daily ;

N = 250 ; % total dynamic steps, i.e. length of data to generate
[ gen_period , gen_states ] = hmmgenerate( N , transprobest , outprobest ) ;
gen_period = gen_period .+ hmm_min_period_add ;
angular_freq_rads = ( 2 * pi ) ./ gen_period ;
real_phase = cumsum( angular_freq_rads ) ; real_phase = mod( real_phase , 2 * pi ) ;
gen_sine = sin( real_phase ) ;
noise_val = mean( [ all_snr(:,1) ; all_snr(:,2) ] ) ;
my_sine_noisy = awgn( gen_sine , noise_val ) ;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
f = @ekf_sine_f ;
h = @ekf_sine_h ; 

n = 3 ;        % number of states - phase, angular frequency and amplitude
Q = eye( n ) ; % matrix for covariance of process

Q( 1 , 1 ) = var( real_phase ) ;
Q( 2 , 2 ) = var( angular_freq_rads ) ;
Q( 3 , 3 ) = var( sqrt(2) * sqrt( mean( my_sine_noisy.^2 ) ) ) ;
Q = 1 .* Q ; % scale the covariance matrix, a tunable factor ? 

R = var( gen_sine .- my_sine_noisy ) ; % measurement noise variance

% initial state
s = [ real_phase( 6 ) ;                                        % phase ( Rads )
      mean( angular_freq_rads( 1 : 20 ) ) ;                    % angular_freq_rads
      sqrt(2) * sqrt( mean( my_sine_noisy( 1 : 20 ).^2 ) ) ] ; % amplitude               
    
P = eye( n ) ;  % initial state covariance

% allocate memory
xV = zeros( n , N ) ;  % record estimates of states        
sV = ones( 1 , N ) ;   % real amplitudes 
pV1 = zeros( n , N ) ; % projection measurements
pV2 = zeros( n , N ) ; % projection measurements
pV3 = zeros( n , N ) ; % projection measurements

for k = 7 : N
 
z = my_sine_noisy( k ) ; % noisy measurement of sine

% do ekf
[ x1 , A ] = jacobian_transition( f , s ) ;   % nonlinear update and linearisation at current state
P = A * P * A' + Q ;                          % partial update
[ z1 , H ] = jacobian_measurement( h , x1 ) ; % nonlinear measurement and linearisation
P12 = P * H' ;                                % cross covariance
R = chol( H * P12 + R ) ;                     % Cholesky factorisation
U = P12 / R ;                                 % K = U / R' ; Faster because of back substitution
s = x1 + U * ( R' \ ( z - z1 ) ) ;            % Back substitution to get state update
P = P - U * U' ;                              % Covariance update, U * U' = P12 / R / R' * P12' = K * P12  
 
xV( : , k ) = s ;                             % save estimated updated states
pV1( : , k ) = s ; pV1( 1 , k ) = pV1( 1 , k ) + pV1( 2 , k ) ;   % for plotting
pV2( : , k ) = s ; pV2( 1 , k ) = pV2( 1 , k ) + 2*pV2( 2 , k ) ; % for plotting
pV3( : , k ) = s ; pV3( 1 , k ) = pV3( 1 , k ) + 3*pV3( 2 , k ) ; % for plotting
 
endfor

% Plotting
figure(1) ; subplot(3,1,1) ; plot( real_phase , 'k', 'linewidth' , 2 , xV( 1 , : ) , 'r' , 'linewidth' , 2 ) ; grid minor on ;
title( 'Phase State' ) ; legend( 'Actual Phase State' , 'Estimated Phase State' ) ; ylabel( 'Radians' ) ;

figure(1) ; subplot(3,1,2) ; plot(angular_freq_rads, 'k', 'linewidth' , 2 , xV(2,:), 'r' , 'linewidth' , 2 ) ; grid minor on ; 
title( 'Angular Frequency State in Radians' ) ; legend( 'Actual Angular Frequency' , 'Estimated Angular Frequency' ) ;

figure(1) ; subplot(3,1,3) ; plot( sV , 'k', 'linewidth' , 1 , xV(3,:), 'r' , 'linewidth' , 2 ) ; grid minor on ;
title( 'Amplitude State' ) ; legend( 'Actual Amplitude' , 'Estimated Amplitude' ) ;

sine_est = h( xV ) ;
sine_1 = h( pV1 ) ; sine_1 = shift( sine_1 , 1 ) ;
sine_2 = h( pV2 ) ; sine_2 = shift( sine_2 , 2 ) ;
sine_3 = h( pV3 ) ; sine_3 = shift( sine_3 , 3 ) ;

figure(2) ; plot( gen_sine , 'k' , 'linewidth' , 2 , my_sine_noisy , '*b' , 'linewidth' , 2 , sine_est , 'r' , 'linewidth' , 2 ) ; 
title( 'Underlying and its Estimate' ) ; legend( 'Real Underlying Sine' , 'Noisy Measured Sine' , 'EKF Estimate of Real Underlying Sine' ) ;

figure(3) ; plot( gen_sine , 'k' , 'linewidth' , 2 , sine_1 , 'r' , 'linewidth' , 2 , sine_2 , 'g' , 'linewidth' , 2 , sine_3 , 'b' , 'linewidth' , 2 ) ;
title( 'Plot of the actual sine wave vs projected estimated sines' ) ; 
legend( 'Actual' , 'EKF Projected 1' , 'EKF Projected 2' , 'EKF Projected 3' ) ;
Typical output plots are the states
the filtered estimate
and shifted, future projections using the current estimated states
The code is well commented so I won't talk further about it in this post as this is part of ongoing work. A fuller discussion in due course.

Friday, 5 April 2019

Tests of Constant and Variable Acceleration Model Kalman Filters

In my last post I said that this post would report the results of tests on a Constant Acceleration model Kalman filter, and the result is: fail, just like the Constant Velocity model, so I won't bore readers with the details of the failed tests. However, tests of a Variable Acceleration model have been more successful, and so this post is about the results of tests on that model.

The Kalman filter states in this variable acceleration model are position, velocity, acceleration and three constants that are used to calculate the acceleration: b, c and d. The state transition matrix for this model is
F = [ 1 1 0 1/2 1/6 1/12 ; % position
      0 1 0 1 1/2 1/3 ;    % velocity
      0 0 0 1 1 1 ;        % acceleration
      0 0 0 1 0 0 ;        % b
      0 0 0 0 1 0 ;        % c
      0 0 0 0 0 1 ] ;      % d
with a measurement/observation matrix of
H = [ 1 0 0 0 0 0 ; 
      0 1 0 0 0 0 ; 
      0 0 1 0 0 0 ;
      0 0 0 1 0 0 ;
      0 0 0 0 1 0 ;
      0 0 0 0 0 1 ] ; % measurement matrix
Of course there are no "measurements" of b, c and d that can be directly taken from a price series, so my solution to this is to use Octave's fminunc function to minimise this
function [ J ] = kalman_bcd_constant_minimisation( bcd , position_measurements , velocity_measurements , accel_measurements )
pos_proj = position_measurements( 1 ) + velocity_measurements( 1 ) + [ 1/2 1/6 1/12 ] * bcd ;
vel_proj = velocity_measurements( 1 ) + [ 1 1/2 1/3 ] * bcd ;
accel_proj = sum( bcd ) ;  
pos_proj_cost = ( pos_proj - position_measurements( 2 ) )^2 ;
vel_proj_cost = ( vel_proj - velocity_measurements( 2 ) )^2 ;
accel_proj_cost = ( accel_proj - accel_measurements( 2 ) )^2 ;
J = pos_proj_cost + vel_proj_cost + accel_proj_cost ;  
endfunction
where the position, velocity and acceleration measurements are calculated as described in my previous post. These cost-minimised b, c and d constants are used as the noisy measurements for filter input purposes.
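As an illustrative sketch of this per bar minimisation (measured_accel is a stand-in name for the acceleration measurements, and the use of just the two most recent measurements is my assumption):
% initial guess for the b, c and d constants
bcd = zeros( 3 , 1 ) ;
% declare the optimisation function over the two most recent measurements at bar k
f = @( bcd ) kalman_bcd_constant_minimisation( bcd , measured_pos( k - 1 : k ) , measured_vel( k - 1 : k ) , measured_accel( k - 1 : k ) ) ;
% minimise to get the noisy b, c and d "measurements" for this bar
bcd = fminunc( f , bcd , optimset( "MaxIter" , 50 ) ) ;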

One implementation issue I experienced is that I was unable to use the ALS package to model the process noise, as the dlqe function it calls kept crashing my session, so I went with the crude, hand-constructed covariance matrix as per the code in my previous post.

I wrote above that the tests on this model were the most successful to date. What I mean by this is that the Innovation Magnitude Bound test, the Innovation Zero-centred tests and the zero-centred Innovation autocorrelation plot tests on the position, velocity and acceleration states were passed 100%, and about half of the Innovation autocorrelation plot bound tests passed. The Innovation Whiteness (Autocorrelation) test pass rates varied from low single figures to over 60%. However, I'm not 100% certain that I have correctly implemented this last set of tests, so these results are suspect. Despite this reservation, I think I can say that the Variable Acceleration model expressed as a polynomial in time is the simplest kinematic model that is suitable for use on financial time series.

More in due course. 

Monday, 1 April 2019

Test of Constant Velocity Model Kalman Filter

Following on from my previous post, this post is a more detailed description of the methodology used to test kinematic motion models on financial time series. The rationale behind the tests described below is different from the usual backtesting, in that they are designed to determine whether the Kalman filter model is mismatched or not, i.e. whether the model innovations match the assumption that the residuals are Gaussian.

If readers have read the estimation lecture notes linked in my earlier post, you'll know that one problem with estimating the above is that it is often difficult or impossible to know the covariance matrices for the process noise and the measurement noise, and getting these wrong affects the performance of the Kalman filter even if the underlying model is accurately specified. My "solution" to this is to use the Savitzky-Golay smoothing filter to smooth the time series and at the same time get estimates of the velocity, acceleration etc. In real time of course this cannot be done, because the Savitzky-Golay filter is a "centred" smoother and thus requires knowledge of data in the future. Using this information a crude estimate of the covariance matrices is derived, as shown in Octave code in the code box below:
% Estimate the measurement noise covariance matrix R and the process noise
% covariance matrix Q using the best fit quadratic from the above sgolay filter coeffs
R = zeros( 2 , 2 ) ; Q = R ; % position and velocity model

% variance of position measurement
R( 1 , 1 ) = var( measured_pos( 1 , 7 : end - 6 ) .- real_pos( 1 , 7 : end - 6 ) ) ;

% variance of velocity measurement
R( 2 , 2 ) = var( measured_vel( 1 , 7 : end - 6 ) .- real_vel( 1 , 7 : end - 6 ) ) ;

% covariance of position process noise
Q( 1 , 1 ) = cov( ( real_pos( 1 , 7 : end - 7 ) .+ real_vel( 1 , 7 : end - 7 ) ) .- real_pos( 1 , 8 : end - 6 ) ) ;

% covariance of velocity process noise
Q( 2 , 2 ) = cov( diff( real_vel( 1 , 7 : end - 6 ) ) ) ;

% covariance of position process noise and velocity process noise
Q( 1 , 2 ) = cov( ( real_pos( 1 , 7 : end - 7 ) .+ real_vel( 1 , 7 : end - 7 ) ) .- real_pos( 1 , 8 : end - 6 ) , ...
diff( real_vel( 1 , 7 : end - 6 ) ) ) ;
Q( 2 , 1 ) = Q( 1 , 2 ) ;
where real_pos and real_vel are calculated using the Savitzky-Golay filter and represent the "true" position and velocity of a Constant Velocity model compared to the measured_pos and measured_vel measurements that are taken in real time. These crude estimates of the covariance matrices are used as inputs to the Autocovariance Least Squares Method implemented by means of the Octave ALS package to get a hopefully much more accurate estimate of the noise covariances. Armed with these estimates, the test code as outlined in the above mentioned estimation lecture notes paper and my previous post can then be run.
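Purely for illustration, real_pos and real_vel could be obtained along the following lines with the signal package's sgolayfilt function; the quadratic order, the 13 bar window (consistent with the 6 bar edge trimming above) and the hypothetical price vector are assumptions, and real_vel is approximated here by a centred difference of the smooth rather than a dedicated derivative filter.
pkg load signal ;                         % for sgolayfilt
p = 2 ; n = 13 ;                          % assumed polynomial order and window length
real_pos = sgolayfilt( price , p , n ) ;  % centred smooth - requires future data, so not usable in real time
real_vel = zeros( size( real_pos ) ) ;
real_vel( 2 : end - 1 ) = ( real_pos( 3 : end ) - real_pos( 1 : end - 2 ) ) ./ 2 ; % centred slope estimate per bar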

To avoid having to manually judge success or failure by looking at numerous plots, I run a few simple statistical tests
% Performance Tests for an Unmatched Kalman Filter

% Test 1 ---> Innovation Magnitude Bound Test
% check the innovation is bounded by +/- 2 * sqrt( S )
test_1 = sum( ( nu( 1 , 7 : end - 3 ) < 2 * s( 1 , 7 : end - 3 ) ) .* ...
( nu( 1 , 7 : end - 3 ) > -2 * s( 1 , 7 : end - 3 ) ) ) / length( s( 1 , 7 : end - 3 ) ) ;
% greater than 0.95 is a pass, i.e. less than 5 % of values out of bounds
% record result 
all_test_results( jj , 1 ) = test_1 ; 

%  Perform a T-test of the null hypothesis 'mean (X) == M' for a
%  sample X from a normal distribution with unknown mean and unknown
%  std deviation.  Under the null, the test statistic T has a
%  Student's t distribution.  The default value of M is 0.
%  If H is 0 the null hypothesis is accepted, if it is 1 the null
%  hypothesis is rejected.
zero_mean_innovation_pos = ttest( nu( 1 , : ) , 0 ) ; % innovations are centred around zero?
% H = 0 is a pass, i.e. centred around zero
% record result 
all_test_results( jj , 2 ) = zero_mean_innovation_pos ; 

% Test 2 ---> Normalised Innovation Squared Test
%  > qchisq( c( 0.05 , 0.95 ) , df = 1 , lower.tail = FALSE )
%  [1] 3.84145882 0.00393214
%  > qchisq( c( 0.05 , 0.95 ) , df = 2 , lower.tail = FALSE )
%  [1] 5.9914645 0.1025866

d_of_freedom = sum( sum( H ) ) ;
switch ( d_of_freedom )
  case 1
    lower_CI = 0.00393214 ;
    upper_CI = 3.84145882 ;
  case 2
    lower_CI = 0.1025866 ;
    upper_CI = 5.9914645 ;
endswitch

normalised_innovation_squared_chi2_test_pos = ( sum_Q_pos > lower_CI ) * ( sum_Q_pos < upper_CI ) ;
% value of 1 is a pass
% record result 
all_test_results( jj , 3 ) = normalised_innovation_squared_chi2_test_pos ;  

% Test 3 ---> Innovation Whiteness (Autocorrelation) Test
% check the innovation autocorrelation is within the approximate 95% confidence
% bound for a white sequence, i.e. +/- 1.96 / sqrt( number of samples )
%level = 2 * std( r_pos( N : 2 * N - 1 ) / r_pos( N ) ) ;
level = 1.96 / sqrt( N - burn_in - 2 ) ;
##test_3 = sum( ( r_pos( N : 2 * N - 1 ) / r_pos( N ) < level ) .* ( r_pos( N : 2 * N - 1 ) / r_pos( N ) > -level ) ) / ...
##length( r_pos( N : 2 * N - 1 ) / r_pos( N ) ) ;

test_3 = sum( ( r_pos( no_lags + 1 : end ) < level ) .* ( r_pos( no_lags + 1 : end ) > -level ) ) ...
/ length( r_pos( no_lags + 1 : end) ) ;

% greater than 0.95 is a pass, i.e. less than 5 % of values out of bounds
% record result 
all_test_results( jj , 4 ) = test_3 ; 

zero_mean_autocorrelation_plot_pos = ttest( r_pos( no_lags + 1 : end ) , 0 ) ; % centred around zero
% H = 0 is a pass, i.e. centred around zero
% record result 
all_test_results( jj , 5 ) = zero_mean_autocorrelation_plot_pos ; 
across a range of forex crosses and my currency indices and see if, say, 95% of them pass these tests.
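Purely as a sketch of how these recorded results might be aggregated (the pass criteria simply restate the comments in the code above, and the row-per-cross layout of all_test_results is an assumption for illustration):
% one row of all_test_results per forex cross / currency index
pass_rates = [ mean( all_test_results( : , 1 ) >= 0.95 ) , ... % innovation magnitude bound test
               mean( all_test_results( : , 2 ) == 0 )    , ... % zero-mean innovation t-test
               mean( all_test_results( : , 3 ) == 1 )    , ... % normalised innovation squared test
               mean( all_test_results( : , 4 ) >= 0.95 ) , ... % innovation autocorrelation bound test
               mean( all_test_results( : , 5 ) == 0 ) ] ;      % zero-mean autocorrelation t-test
model_not_mismatched = all( pass_rates >= 0.95 ) ;             % e.g. 95% of crosses must pass each test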

As a result of my first tests thus outlined on the Constant Velocity Kinematic Model I think I can confidently say that this model is "mismatched," despite it sometimes being described as the simplest "useful" model, and therefore readers should not use this model in a Kalman filter on financial time series. It is worth noting that this model is named Model One in the Trend Without Hiccups paper, which in turn is very closely related to a local linear trend model, which the authors name Model Two. Given that the underlying model(s) for both are mismatched, it is not surprising that the authors report disappointing results from these models.

In my next post I shall report results on the Constant Acceleration Model.

Wednesday, 20 March 2019

Revisiting the Kalman Filter

Some time ago ( here, here and here ) I posted about the Kalman filter, and recently I have been looking at Kalman filters again because of this Trend Without Hiccups paper hosted at SSRN. I also came across this Estimation Lecture paper, which provides MATLAB code for testing Kalman filters; my Octave-compatible version of this code is shown in the code box below.
clear all ;
1 ; % some function declarations
function [ x_pred , p_pred ] = predict( x , P , F , Q )
x_pred = F * x ;
p_pred = F * P *F' + Q ;
endfunction

function [ nu , S ] = innovation( x_pred , p_pred , z , H , R )
nu = z - H * x_pred ;     % innovation
S = H * p_pred * H' + R ; % innovation covariance
endfunction

function [ x_new , p_new ] = innovation_update( x_pred , p_pred , nu , S , H )
K = p_pred * H' / S ;         % Kalman gain
x_new = x_pred + K * nu ;     % new state
p_new = p_pred - K * S * K' ; % new covariance
endfunction

pkg load signal ;     % for xcorr function

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% Script to generate some "true" constant velocity data for later assessment
%%% Generates:
%%%  x: the state history which evolves according to 
%%%  x(k+1) = Fx(k) + w(k)
%%%  w: the process noise history (randomly generated)
%%%  z: a set of observations on the state corrupted by noise 
%%%  v: the noise on each observation (randomly generated)
 
N = 100 ;
delT = 1 ;
F = [ 1 delT ;
      0 1 ] ;
      
H = [ 1 0 ] ;

% process and measurement noise variances
sigma2Q = 0.01 ;
sigma2R = 0.1 ;

% process covariance matrix
Q = sigma2Q * [ delT/3 delT/2 ;
                delT/2 delT ] ;
                
P = Q ;
R = sigma2R * [ 1 ] ;

x = zeros( 2 , N ) ;
w = zeros( 2 , N ) ;
z = zeros( 1 , N ) ;
v = zeros( 1 , N ) ;

for ii = 2 : N
w( : , ii ) = randn( 2 , 1 ) .* sigma2Q ;         % generate process noise
x( : , ii ) = F * x( : , ii - 1 ) + w( : , ii ) ; % update state
v( : , ii ) = randn( 1 ) * R ;                    % generate measurement noise
z( : , ii ) = H * x( : , ii ) + v( : , ii ) ;     % get noise corrupted measurement
endfor

% visualise data
%figure( 1 ) ; plot( x( 1 , : ) , 'k' , 'linewidth' , 2  ) ;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% Script to generate some "true" constant acceleration data for later assessment
%%% Generates:
%%%  x: the state history which evolves according to 
%%%  x(k+1) = Fx(k) + w(k)
%%%  w: the process noise history (randomly generated)
%%%  z: a set of observations on the state corrupted by noise 
%%%  v: the noise on each observation (randomly generated)
 
N = 100 ;
delT = 1 ;
F = [ 1 delT delT/2 ;
      0 1 delT ;
      0 0 1 ] ;

H = [ 1 0 0 ] ;

% process and measurement noise variances
sigma2Q = 0.01 ;
sigma2R = 0.1 ;

% process covariance matrix
Q = sigma2Q * [ delT/20 delT/8 delT/6 ;
                delT/8 delT/3 delT/2 ;
                delT/6 delT/2 delT ] ;
                
P = Q ;
R = sigma2R * [ 1 ] ;

x = zeros( 3 , N ) ;
w = zeros( 3 , N ) ;
z = zeros( 1 , N ) ;
v = zeros( 1 , N ) ;

for ii = 2 : N
w( : , ii ) = randn( 3 , 1 ) .* sigma2Q ;         % generate process noise
x( : , ii ) = F * x( : , ii - 1 ) + w( : , ii ) ; % update state
v( : , ii ) = randn( 1 ) * R ;                    % generate measurement noise
z( : , ii ) = H * x( : , ii ) + v( : , ii ) ;     % get noise corrupted measurement
endfor

% visualise data
%figure( 1 ) ; plot( x( 1 , : ) , 'k' , 'linewidth' , 2  ) ;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%% Octave script to assess Kalman filter performance
%%% The script assumes the existence of a vector z of
%%% noise corrupted observations
N = length( z ) ; % number of Kalman filter iterations
Qfactor = 1 ;     % process noise mult factor
Rfactor = 1 ;     % measurement noise mult factor
delT = 1 ;        % time step

F = [ 1 delT ;
      0 1 ] ; % update matrix
      
H = [ 1 0 ] ; % measurement matrix

sigmaQ = Qfactor * sqrt( 0.01 ) ;
sigmaR = Rfactor * sqrt( 0.1 ) ;

Q = sigmaQ^2 * [ 1/3 1/2 ; 
                 1/2 1 ] ; % process noise covariance matrix
                 
P = 10 * Q ;
R = sigmaR^2 * [ 1 ] ;     % measurement noise covariance

% allocate space prior to loop
xhat = zeros( 2 , N ) ; % state estimate
nu = zeros( 1 , N ) ;   % innovation
S = zeros( 1 , N ) ;    % innovation (co)variance
q = zeros( 1 , N ) ;    % normalised innovation squared

for ii = 2 : N
  
  % predict using update matrix F and updated values of P and Q from previous ii loop
  % x_pred is the prediction of state values
  % p_pred is the prediction of the covariance matrix P given previous P and Q
  [ x_pred , p_pred ] = predict( xhat( : , ii - 1 ) , P , F , Q ) ;

  % measurement
  % "nu" is difference between predicted values and measured values of the states,
  % given the measurement matrix H and measurement noise R
  % "S" is a measurement of how the covariance matrix P is predicted to have changed 
  % during the above prediction step, given that this cannot actually be known or
  % directly measured as not all the underlying state changes can be directly measured. 
  [ nu( : , ii ) , S( : , ii ) ] = innovation( x_pred , p_pred , z( ii ) , H , R ) ; % orig
  
  % update step updates the state estimates and covariance matrix P using the Kalman gain,
  % which is internally calculated in the "innovation_update" function
  [ xhat( : , ii ) , P ] = innovation_update( x_pred , p_pred , nu( : , ii ) , S( : , ii ) , H ) ;

  % q is just a record keeping vector for later analysis of normalised innovation squared
  q( : , ii ) = nu( : , ii )' * inv( S( : , ii ) ) * nu( : , ii ) ;

endfor

sumQ = sum( q ) ; % determine Sum q which is Chiˆ2 on N d.o.f.
r = xcorr( nu ) ; % get autocorrelation of innovation

% plot state and state estimate 
subplot( 2 , 2 , 1 ) ; plot( x( 1 , : ) , 'k' , 'linewidth' , 2 , xhat( 1 , : ) , 'r' , 'linewidth' , 2 ) ;
title( 'State and State Estimate' ) ; legend( 'State' , 'State Estimate' ) ; 

% plot innovation and 2sigma confidence interval
subplot( 2 , 2 , 2 ) ; plot( nu , 'b' , 'linewidth' , 2 , 2 * sqrt( S ) , 'r' , -2 * sqrt( S ) , 'r' , ...
zeros(1,N) , 'r.' , 'linewidth' , 1 ) ;
title( 'Innovation and 2sigma confidence intervals' ) ; legend( 'Innovation' , '2Sigma Levels' ) ; 

% plot normalised innovation squared
subplot( 2 , 2 , 3 ) ; plot( q , 'k' , 'linewidth' , 2 , mean( q ) .* ones( 1 , N ) , 'r' , 'linewidth' , 2 ) ;
title( 'Normalised innovation squared' ) ; 

% plot autocorrelation of innovation (normalised)
subplot( 2 , 2 , 4 ) ; plot( r( N : 2 * N - 1 ) / r( N ) , 'k' , 'linewidth' , 2 , zeros(1,N) , 'k.' , 'linewidth' , 1 ) ; 
title( 'Autocorrelation of innovation (normalised)' ) ;
Soon I shall be using this code, with some additions perhaps, to test various Kinematic model implementations of Kalman filters on financial time series with a view to identifying which models are suitable or not.

More in the near future. 

Thursday, 12 July 2012

Brute Force Classifier in Action

As an update to my recent post, here is a short video of the brute force similarity search classifier in action.

Non-embedded view here.
The colour-coded candlestick bars are coloured thus: purple = a cyclic market classification; green = up with retracement; blue = up with no retracement; yellow = down with retracement; red = down with no retracement. The upper price series is the classification as per the brute force algorithm and the lower is the classification as per my Naive Bayesian classifier, shown for comparative purposes. The cyan trend line is my implementation of a Kalman filter, and where this trend line extends out at the hard right-hand edge of the chart it becomes the Kalman filter's prediction for the next 10 bars, based on extending the pattern that was selected during the run of the brute force algorithm.

I will leave it up to readers to judge for themselves the efficacy of this new indicator, but I think it shows some promise, and I have some ideas about how it can be improved. This, however, is work for the future. For now I intend to crack on with working on my neural net classification algorithm. 

Wednesday, 6 June 2012

Creation of a Simple Benchmark Suite

For some time now I have been toying with the idea of creating a simple benchmark suite to compare my own back test system performance with that of some public domain trading systems. I decided to select a few examples from the Trading Blox Users' Guide, specifically:
  • exponential moving average crossovers of periods 10-20 and 20-50
  • triple moving average crossover system with periods 10-20-50
  • Bollinger band breakouts of periods 20 and 50 with 1 & 2 standard deviations for exits and entries
  • donchian channel breakouts with periods 20-10 and 50-25
This is a total of 7 systems, and in the Rcpp code below these form a sort of "committee" whose net vote determines whether to be long 1 contract, short 1 contract, or neutral.
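For readers who just want the gist of the vote before wading through the Rcpp code, here is a hypothetical Octave-style sketch of the aggregation logic (the seven position values are each assumed to be +1 for long, -1 for short or 0 for flat):
pv = [ sma_10_20_pv , sma_20_50_pv , tma_pv , bbo_20_pv , bbo_50_pv , donc_20_pv , donc_50_pv ] ;
committee_position = sign( sum( pv ) ) ; % net vote: long 1 contract, short 1 contract, or neutral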

# This function takes as inputs vectors of opening and closing prices
# and creates a basic benchmark output suite of system equity curves 
# for the following basic trend following systems
# exponential moving average crossovers of 10-20 and 20-50
# triple moving average crossovers of 10-20-50
# bollinger band breakouts of 20 and 50 with 1 & 2 standard deviations
# donchian channel breakouts of 20-10 and 50-25
# The libraries required to compile this function are
# "Rcpp," "inline" and "compiler." The function is compiled by the command
# > source("basic_benchmark_equity.r") in the console/terminal.

library(Rcpp) # load the required library
library(inline) # load the required library
library(compiler) # load the required library

src <- '
#include <Rcpp.h>
#include <algorithm>

Rcpp::NumericVector open(a) ;
Rcpp::NumericVector close(b) ;
Rcpp::NumericVector market_mode(c) ;
Rcpp::NumericVector kalman(d) ;
Rcpp::NumericVector tick_size(e) ;
Rcpp::NumericVector tick_value(f) ;
int n = open.size() ;
Rcpp::NumericVector sma_10(n) ;
Rcpp::NumericVector sma_20(n) ;
Rcpp::NumericVector sma_50(n) ;
Rcpp::NumericVector std_20(n) ;
Rcpp::NumericVector std_50(n) ;

// create equity output vectors
Rcpp::NumericVector market_mode_long_eq(n) ;
Rcpp::NumericVector market_mode_short_eq(n) ;
Rcpp::NumericVector market_mode_composite_eq(n) ;
Rcpp::NumericVector sma_10_20_eq(n) ; 
Rcpp::NumericVector sma_20_50_eq(n) ; 
Rcpp::NumericVector tma_eq(n) ; 
Rcpp::NumericVector bbo_20_eq(n) ;
Rcpp::NumericVector bbo_50_eq(n) ;
Rcpp::NumericVector donc_20_eq(n) ;
Rcpp::NumericVector donc_50_eq(n) ;
Rcpp::NumericVector composite_eq(n) ; 

// position vectors for benchmark systems
Rcpp::NumericVector sma_10_20_pv(1) ;
sma_10_20_pv[0] = 0.0 ; // initialise to zero, no position

Rcpp::NumericVector sma_20_50_pv(1) ;
sma_20_50_pv[0] = 0.0 ; // initialise to zero, no position

Rcpp::NumericVector tma_pv(1) ;
tma_pv[0] = 0.0 ; // initialise to zero, no position

Rcpp::NumericVector bbo_20_pv(1) ;
bbo_20_pv[0] = 0.0 ; // initialise to zero, no position

Rcpp::NumericVector bbo_50_pv(1) ;
bbo_50_pv[0] = 0.0 ; // initialise to zero, no position

Rcpp::NumericVector donc_20_pv(1) ;
donc_20_pv[0] = 0.0 ; // initialise to zero, no position

Rcpp::NumericVector donc_50_pv(1) ;
donc_50_pv[0] = 0.0 ; // initialise to zero, no position

Rcpp::NumericVector comp_pv(1) ;
comp_pv[0] = 0.0 ; // initialise to zero, no position

// fill the equity curve vectors with zeros for "burn in" period
// and create the initial values for all indicators

for ( int ii = 0 ; ii < 50 ; ii++ ) {
    
    if ( ii >= 40 ) {
    sma_10[49] += close[ii] ; }

    if ( ii >= 30 ) {
    sma_20[49] += close[ii] ; }

    sma_50[49] += close[ii] ;

    std_20[ii] = 0.0 ;
    std_50[ii] = 0.0 ;

    market_mode_long_eq[ii] = 0.0 ;
    market_mode_short_eq[ii] = 0.0 ;
    market_mode_composite_eq[ii] = 0.0 ;
    sma_10_20_eq[ii] = 0.0 ;
    sma_20_50_eq[ii] = 0.0 ; 
    bbo_20_eq[ii] = 0.0 ;
    bbo_50_eq[ii] = 0.0 ;
    tma_eq[ii] = 0.0 ;
    donc_20_eq[ii] = 0.0 ;
    donc_50_eq[ii] = 0.0 ; 
    composite_eq[ii] = 0.0 ; } // end of initialising loop

    sma_10[49] = sma_10[49] / 10.0 ;
    sma_20[49] = sma_20[49] / 20.0 ;
    sma_50[49] = sma_50[49] / 50.0 ;

// the main calculation loop
for ( int ii = 50 ; ii < n-2 ; ii++ ) {

    // calculate the smas ( rolling update: drop the oldest close, add the newest )
    sma_10[ii] = sma_10[ii-1] - ( close[ii-10] / 10.0 ) + ( close[ii] / 10.0 ) ;
    sma_20[ii] = sma_20[ii-1] - ( close[ii-20] / 20.0 ) + ( close[ii] / 20.0 ) ;
    sma_50[ii] = sma_50[ii-1] - ( close[ii-50] / 50.0 ) + ( close[ii] / 50.0 ) ;

      // calculate the standard deviations
      for ( int jj = 0 ; jj < 50 ; jj++ ) {
    
      if ( jj < 20 ) {
      std_20[ii] += ( close[ii-jj] - sma_20[ii] ) * ( close[ii-jj] - sma_20[ii] )  ; } // end of jj if

      std_50[ii] += ( close[ii-jj] - sma_50[ii] ) * ( close[ii-jj] - sma_50[ii] ) ; } // end of standard deviation loop

    std_20[ii] = sqrt( std_20[ii] / 20.0 ) ;
    std_50[ii] = sqrt( std_50[ii] / 50.0 ) ;

    //-------------------------------------------------------------------------------------------------------------------

    // calculate the equity values of the market modes
    // market_mode uwr and unr long signals
    if ( market_mode[ii] == 1 || market_mode[ii] == 2 ) {
    market_mode_long_eq[ii] = market_mode_long_eq[ii-1] + tick_value[0] * ( (open[ii+2]-open[ii+1])/tick_size[0] ) ;
    market_mode_short_eq[ii] = market_mode_short_eq[ii-1] ;
    market_mode_composite_eq[ii] = market_mode_composite_eq[ii-1] + tick_value[0] * ( (open[ii+2]-open[ii+1])/tick_size[0] ) ; }

    // market_mode dwr and dnr short signals
    if ( market_mode[ii] == 3 || market_mode[ii] == 4 ) {
    market_mode_long_eq[ii] = market_mode_long_eq[ii-1] ;
    market_mode_short_eq[ii] = market_mode_short_eq[ii-1] + tick_value[0] * ( (open[ii+1]-open[ii+2])/tick_size[0] ) ;
    market_mode_composite_eq[ii] = market_mode_composite_eq[ii-1] + tick_value[0] * ( (open[ii+1]-open[ii+2])/tick_size[0] ) ; }

    // calculate the equity values of the market modes
    // market_mode cyc long signals
    if ( market_mode[ii] == 0 && kalman[ii] > kalman[ii-1] ) {
    market_mode_long_eq[ii] = market_mode_long_eq[ii-1] + tick_value[0] * ( (open[ii+2]-open[ii+1])/tick_size[0] ) ;
    market_mode_short_eq[ii] = market_mode_short_eq[ii-1] ;
    market_mode_composite_eq[ii] = market_mode_composite_eq[ii-1] + tick_value[0] * ( (open[ii+2]-open[ii+1])/tick_size[0] ) ; }

    // market_mode cyc short signals
    if ( market_mode[ii] == 0 && kalman[ii] < kalman[ii-1] ) {
    market_mode_long_eq[ii] = market_mode_long_eq[ii-1] ;
    market_mode_short_eq[ii] = market_mode_short_eq[ii-1] + tick_value[0] * ( (open[ii+1]-open[ii+2])/tick_size[0] ) ;
    market_mode_composite_eq[ii] = market_mode_composite_eq[ii-1] + tick_value[0] * ( (open[ii+1]-open[ii+2])/tick_size[0] ) ; }

    //----------------------------------------------------------------------------------------------------------------------------

    // calculate the equity values and positions of each benchmark system
    // sma_10_20_eq
    if ( sma_10[ii] > sma_20[ii] ) { 
    sma_10_20_eq[ii] = sma_10_20_eq[ii-1] + tick_value[0] * ( (open[ii+2]-open[ii+1])/tick_size[0] ) ; 
    sma_10_20_pv[0] = 1.0 ; } // long

    // sma_10_20_eq
    if ( sma_10[ii] < sma_20[ii] ) { 
    sma_10_20_eq[ii] = sma_10_20_eq[ii-1] + tick_value[0] * ( (open[ii+1]-open[ii+2])/tick_size[0] ) ; 
    sma_10_20_pv[0] = -1.0 ; } // short

    // sma_10_20_eq
    if ( sma_10[ii] == sma_20[ii] && sma_10[ii-1] > sma_20[ii-1] ) { 
    sma_10_20_eq[ii] = sma_10_20_eq[ii-1] + tick_value[0] * ( (open[ii+2]-open[ii+1])/tick_size[0] ) ; 
    sma_10_20_pv[0] = 1.0 ; } // long

    // sma_10_20_eq
    if ( sma_10[ii] == sma_20[ii] && sma_10[ii-1] < sma_20[ii-1] ) { 
    sma_10_20_eq[ii] = sma_10_20_eq[ii-1] + tick_value[0] * ( (open[ii+1]-open[ii+2])/tick_size[0] ) ; 
    sma_10_20_pv[0] = -1.0 ; } // short

    //-----------------------------------------------------------------------------------------------------------

    // sma_20_50_eq
    if ( sma_20[ii] > sma_50[ii] ) { 
    sma_20_50_eq[ii] = sma_20_50_eq[ii-1] + tick_value[0] * ( (open[ii+2]-open[ii+1])/tick_size[0] ) ; 
    sma_20_50_pv[0] = 1.0 ; } // long

    // sma_20_50_eq
    if ( sma_20[ii] < sma_50[ii] ) { 
    sma_20_50_eq[ii] = sma_20_50_eq[ii-1] + tick_value[0] * ( (open[ii+1]-open[ii+2])/tick_size[0] ) ; 
    sma_20_50_pv[0] = -1.0 ; } // short

    // sma_20_50_eq
    if ( sma_20[ii] == sma_50[ii] && sma_20[ii-1] > sma_50[ii-1] ) { 
    sma_20_50_eq[ii] = sma_20_50_eq[ii-1] + tick_value[0] * ( (open[ii+2]-open[ii+1])/tick_size[0] ) ; 
    sma_20_50_pv[0] = 1.0 ; } // long

    // sma_20_50_eq
    if ( sma_20[ii] == sma_50[ii] && sma_20[ii-1] < sma_50[ii-1] ) { 
    sma_20_50_eq[ii] = sma_20_50_eq[ii-1] + tick_value[0] * ( (open[ii+1]-open[ii+2])/tick_size[0] ) ; 
    sma_20_50_pv[0] = -1.0 ; } // short

    //-----------------------------------------------------------------------------------------------------------

    // tma_eq
    if ( tma_pv[0] == 0.0 ) {

      // default position
      tma_eq[ii] = tma_eq[ii-1] ;

      // unless one of the two following conditions is true

      if ( sma_10[ii] > sma_20[ii] && sma_20[ii] > sma_50[ii] ) { 
      tma_eq[ii] = tma_eq[ii-1] + tick_value[0] * ( (open[ii+2]-open[ii+1])/tick_size[0] ) ; 
      tma_pv[0] = 1.0 ; } // long

      if ( sma_10[ii] < sma_20[ii] && sma_20[ii] < sma_50[ii] ) { 
      tma_eq[ii] = tma_eq[ii-1] + tick_value[0] * ( (open[ii+1]-open[ii+2])/tick_size[0] ) ; 
      tma_pv[0] = -1.0 ; } // short

    } // end of tma_pv == 0.0 loop

    if ( tma_pv[0] == 1.0 ) {

      // default long position
      tma_eq[ii] = tma_eq[ii-1] + tick_value[0] * ( (open[ii+2]-open[ii+1])/tick_size[0] ) ; // long

      // unless one of the two following conditions is true

      if ( sma_10[ii] < sma_20[ii] && sma_10[ii] > sma_50[ii] ) { 
      tma_eq[ii] = tma_eq[ii-1] ; 
      tma_pv[0] = 0.0 ; } // exit long, go neutral

      if ( sma_10[ii] < sma_20[ii] && sma_20[ii] < sma_50[ii] ) { 
      tma_eq[ii] = tma_eq[ii-1] + tick_value[0] * ( (open[ii+1]-open[ii+2])/tick_size[0] ) ; 
      tma_pv[0] = -1.0 ; } // short
    
    } // end of tma_pv == 1.0 loop

    if ( tma_pv[0] == -1.0 ) {

      // default short position
      tma_eq[ii] = tma_eq[ii-1] + tick_value[0] * ( (open[ii+1]-open[ii+2])/tick_size[0] ) ; // short

      // unless one of the two following conditions is true

      if ( sma_10[ii] > sma_20[ii] && sma_10[ii] < sma_50[ii] ) { 
      tma_eq[ii] = tma_eq[ii-1] ; 
      tma_pv[0] = 0.0 ; } // exit short, go neutral

      if ( sma_10[ii] > sma_20[ii] && sma_20[ii] > sma_50[ii] ) { 
      tma_eq[ii] = tma_eq[ii-1] + tick_value[0] * ( (open[ii+2]-open[ii+1])/tick_size[0] ) ; 
      tma_pv[0] = 1.0 ; } // long

    } // end of tma_pv == -1.0 loop

    //------------------------------------------------------------------------------------------------------------

    // bbo_20_eq
    if ( bbo_20_pv[0] == 0.0 ) {

      // default position
      bbo_20_eq[ii] = bbo_20_eq[ii-1] ;

      // unless one of the two following conditions is true

      if ( close[ii] > sma_20[ii] + 2.0 * std_20[ii] ) { 
      bbo_20_eq[ii] = bbo_20_eq[ii-1] + tick_value[0] * ( (open[ii+2]-open[ii+1])/tick_size[0] ) ; 
      bbo_20_pv[0] = 1.0 ; } // long

      if ( close[ii] < sma_20[ii] - 2.0 * std_20[ii] ) { 
      bbo_20_eq[ii] = bbo_20_eq[ii-1] + tick_value[0] * ( (open[ii+1]-open[ii+2])/tick_size[0] ) ; 
      bbo_20_pv[0] = -1.0 ; } // short

    } // end of bbo_20_pv == 0.0 loop

    if ( bbo_20_pv[0] == 1.0 ) {

      // default long position
      bbo_20_eq[ii] = bbo_20_eq[ii-1] + tick_value[0] * ( (open[ii+2]-open[ii+1])/tick_size[0] ) ; // long

      // unless one of the two following conditions is true

      if ( close[ii] < sma_20[ii] + std_20[ii] && close[ii] > sma_20[ii] - 2.0 * std_20[ii] ) { 
      bbo_20_eq[ii] = bbo_20_eq[ii-1] ; 
      bbo_20_pv[0] = 0.0 ; } // exit long, go neutral

      if ( close[ii] < sma_20[ii] - 2.0 * std_20[ii] ) { 
      bbo_20_eq[ii] = bbo_20_eq[ii-1] + tick_value[0] * ( (open[ii+1]-open[ii+2])/tick_size[0] ) ; 
      bbo_20_pv[0] = -1.0 ; } // short
    
    } // end of bbo_20_pv == 1.0 loop

    if ( bbo_20_pv[0] == -1.0 ) {

      // default short position
      bbo_20_eq[ii] = bbo_20_eq[ii-1] + tick_value[0] * ( (open[ii+1]-open[ii+2])/tick_size[0] ) ; // short

      // unless one of the two following conditions is true

      if ( close[ii] > sma_20[ii] - std_20[ii] && close[ii] < sma_20[ii] + 2.0 * std_20[ii] ) { 
      bbo_20_eq[ii] = bbo_20_eq[ii-1] ; 
      bbo_20_pv[0] = 0.0 ; } // exit short, go neutral

      if ( close[ii] > sma_20[ii] + 2.0 * std_20[ii] ) { 
      bbo_20_eq[ii] = bbo_20_eq[ii-1] + tick_value[0] * ( (open[ii+2]-open[ii+1])/tick_size[0] ) ; 
      bbo_20_pv[0] = 1.0 ; } // long

    } // end of bbo_20_pv == -1.0 loop

    //-------------------------------------------------------------------------------------------------

    // bbo_50_eq
    if ( bbo_50_pv[0] == 0.0 ) {

      // default position
      bbo_50_eq[ii] = bbo_50_eq[ii-1] ;

      // unless one of the two following conditions is true

      if ( close[ii] > sma_50[ii] + 2.0 * std_50[ii] ) { 
      bbo_50_eq[ii] = bbo_50_eq[ii-1] + tick_value[0] * ( (open[ii+2]-open[ii+1])/tick_size[0] ) ; 
      bbo_50_pv[0] = 1.0 ; } // long

      if ( close[ii] < sma_50[ii] - 2.0 * std_50[ii] ) { 
      bbo_50_eq[ii] = bbo_50_eq[ii-1] + tick_value[0] * ( (open[ii+1]-open[ii+2])/tick_size[0] ) ; 
      bbo_50_pv[0] = -1.0 ; } // short

    } // end of bbo_50_pv == 0.0 loop

    if ( bbo_50_pv[0] == 1.0 ) {

      // default long position
      bbo_50_eq[ii] = bbo_50_eq[ii-1] + tick_value[0] * ( (open[ii+2]-open[ii+1])/tick_size[0] ) ; // long

      // unless one of the two following conditions is true

      if ( close[ii] < sma_50[ii] + std_50[ii] && close[ii] > sma_50[ii] - 2.0 * std_50[ii] ) { 
      bbo_50_eq[ii] = bbo_50_eq[ii-1] ; 
      bbo_50_pv[0] = 0.0 ; } // exit long, go neutral

      if ( close[ii] < sma_50[ii] - 2.0 * std_50[ii] ) { 
      bbo_50_eq[ii] = bbo_50_eq[ii-1] + tick_value[0] * ( (open[ii+1]-open[ii+2])/tick_size[0] ) ; 
      bbo_50_pv[0] = -1.0 ; } // short
    
    } // end of bbo_50_pv == 1.0 loop

    if ( bbo_50_pv[0] == -1.0 ) {

      // default short position
      bbo_50_eq[ii] = bbo_50_eq[ii-1] + tick_value[0] * ( (open[ii+1]-open[ii+2])/tick_size[0] ) ; // short

      // unless one of the two following conditions is true

      if ( close[ii] > sma_50[ii] - std_50[ii] && close[ii] < sma_50[ii] + 2.0 * std_50[ii] ) { 
      bbo_50_eq[ii] = bbo_50_eq[ii-1] ; 
      bbo_50_pv[0] = 0.0 ; } // exit short, go neutral

      if ( close[ii] > sma_50[ii] + 2.0 * std_50[ii] ) { 
      bbo_50_eq[ii] = bbo_50_eq[ii-1] + tick_value[0] * ( (open[ii+2]-open[ii+1])/tick_size[0] ) ; 
      bbo_50_pv[0] = 1.0 ; } // long

    } // end of bbo_50_pv == -1.0 loop

    //-----------------------------------------------------------------------------------------------------

    // donc_20_eq
    if ( donc_20_pv[0] == 0.0 ) {

      // default position
      donc_20_eq[ii] = donc_20_eq[ii-1] ;

      // unless one of the two following conditions is true

      if ( close[ii] > *std::max_element( &close[ii-20], &close[ii] ) ) { 
      donc_20_eq[ii] = donc_20_eq[ii-1] + tick_value[0] * ( (open[ii+2]-open[ii+1])/tick_size[0] ) ; 
      donc_20_pv[0] = 1.0 ; } // long

      if ( close[ii] < *std::min_element( &close[ii-20], &close[ii] ) ) { 
      donc_20_eq[ii] = donc_20_eq[ii-1] + tick_value[0] * ( (open[ii+1]-open[ii+2])/tick_size[0] ) ; 
      donc_20_pv[0] = -1.0 ; } // short

    } // end of donc_20_pv == 0.0 loop

    if ( donc_20_pv[0] == 1.0 ) {

      // default long position
      donc_20_eq[ii] = donc_20_eq[ii-1] + tick_value[0] * ( (open[ii+2]-open[ii+1])/tick_size[0] ) ; // long

      // unless one of the two following conditions is true

      if ( close[ii] < *std::min_element( &close[ii-10], &close[ii] ) && close[ii] > *std::min_element( &close[ii-20], &close[ii] ) ) { 
      donc_20_eq[ii] = donc_20_eq[ii-1] ; 
      donc_20_pv[0] = 0.0 ; } // exit long, go neutral

      if ( close[ii] < *std::min_element( &close[ii-20], &close[ii] ) ) { 
      donc_20_eq[ii] = donc_20_eq[ii-1] + tick_value[0] * ( (open[ii+1]-open[ii+2])/tick_size[0] ) ; 
      donc_20_pv[0] = -1.0 ; } // short
    
    } // end of donc_20_pv == 1.0 loop

    if ( donc_20_pv[0] == -1.0 ) {

      // default short position
      donc_20_eq[ii] = donc_20_eq[ii-1] + tick_value[0] * ( (open[ii+1]-open[ii+2])/tick_size[0] ) ; // short

      // unless one of the two following conditions is true

      if ( close[ii] > *std::max_element( &close[ii-10], &close[ii] ) && close[ii] < *std::max_element( &close[ii-20], &close[ii] ) ) { 
      donc_20_eq[ii] = donc_20_eq[ii-1] ; 
      donc_20_pv[0] = 0.0 ; } // exit short, go neutral

      if ( close[ii] > *std::max_element( &close[ii-20], &close[ii] ) ) { 
      donc_20_eq[ii] = donc_20_eq[ii-1] + tick_value[0] * ( (open[ii+2]-open[ii+1])/tick_size[0] ) ; 
      donc_20_pv[0] = 1.0 ; } // long

    } // end of donc_20_pv == -1.0 loop

    //-------------------------------------------------------------------------------------------------

    // donc_50_eq
    if ( donc_50_pv[0] == 0.0 ) {

      // default position
      donc_50_eq[ii] = donc_50_eq[ii-1] ;

      // unless one of the two following conditions is true

      if ( close[ii] > *std::max_element( &close[ii-50], &close[ii] ) ) { 
      donc_50_eq[ii] = donc_50_eq[ii-1] + tick_value[0] * ( (open[ii+2]-open[ii+1])/tick_size[0] ) ; 
      donc_50_pv[0] = 1.0 ; } // long

      if ( close[ii] < *std::min_element( &close[ii-50], &close[ii] ) ) { 
      donc_50_eq[ii] = donc_50_eq[ii-1] + tick_value[0] * ( (open[ii+1]-open[ii+2])/tick_size[0] ) ; 
      donc_50_pv[0] = -1.0 ; } // short

    } // end of donc_50_pv == 0.0 loop

    if ( donc_50_pv[0] == 1.0 ) {

      // default long position
      donc_50_eq[ii] = donc_50_eq[ii-1] + tick_value[0] * ( (open[ii+2]-open[ii+1])/tick_size[0] ) ; // long

      // unless one of the two following conditions is true

      if ( close[ii] < *std::min_element( &close[ii-25], &close[ii] ) && close[ii] > *std::min_element( &close[ii-50], &close[ii] ) ) { 
      donc_50_eq[ii] = donc_50_eq[ii-1] ; 
      donc_50_pv[0] = 0.0 ; } // exit long, go neutral

      if ( close[ii] < *std::min_element( &close[ii-50], &close[ii] ) ) { 
      donc_50_eq[ii] = donc_50_eq[ii-1] + tick_value[0] * ( (open[ii+1]-open[ii+2])/tick_size[0] ) ; 
      donc_50_pv[0] = -1.0 ; } // short
    
    } // end of donc_50_pv == 1.0 loop

    if ( donc_50_pv[0] == -1.0 ) {

      // default short position
      donc_50_eq[ii] = donc_50_eq[ii-1] + tick_value[0] * ( (open[ii+1]-open[ii+2])/tick_size[0] ) ; // short

      // unless one of the two following conditions is true

      if ( close[ii] > *std::max_element( &close[ii-25], &close[ii] ) && close[ii] < *std::max_element( &close[ii-50], &close[ii] ) ) { 
      donc_50_eq[ii] = donc_50_eq[ii-1] ; 
      donc_50_pv[0] = 0.0 ; } // exit short, go neutral

      if ( close[ii] > *std::max_element( &close[ii-50], &close[ii] ) ) { 
      donc_50_eq[ii] = donc_50_eq[ii-1] + tick_value[0] * ( (open[ii+2]-open[ii+1])/tick_size[0] ) ; 
      donc_50_pv[0] = 1.0 ; } // long

    } // end of donc_50_pv == -1.0 loop

    //-------------------------------------------------------------------------------------------------

    // composite_eq
    comp_pv[0] = sma_10_20_pv[0] + sma_20_50_pv[0] + tma_pv[0] + bbo_20_pv[0] + bbo_50_pv[0] + donc_20_pv[0] + donc_50_pv[0] ;
    
    if ( comp_pv[0] > 0 ) {
    composite_eq[ii] = composite_eq[ii-1] + tick_value[0] * ( (open[ii+2]-open[ii+1])/tick_size[0] ) ; } // long

    if ( comp_pv[0] < 0 ) {
    composite_eq[ii] = composite_eq[ii-1] + tick_value[0] * ( (open[ii+1]-open[ii+2])/tick_size[0] ) ; } // short 

    if ( comp_pv[0] == 0 ) {
    composite_eq[ii] = composite_eq[ii-1] ; } // neutral 

} // end of main for loop

// Now fill in the last two spaces in the equity vectors
market_mode_long_eq[n-1] = market_mode_long_eq[n-3] ;
market_mode_long_eq[n-2] = market_mode_long_eq[n-3] ;

market_mode_short_eq[n-1] = market_mode_short_eq[n-3] ;
market_mode_short_eq[n-2] = market_mode_short_eq[n-3] ;

market_mode_composite_eq[n-1] = market_mode_composite_eq[n-3] ;
market_mode_composite_eq[n-2] = market_mode_composite_eq[n-3] ;

sma_10_20_eq[n-1] = sma_10_20_eq[n-3] ;
sma_10_20_eq[n-2] = sma_10_20_eq[n-3] ;

sma_20_50_eq[n-1] = sma_20_50_eq[n-3] ;
sma_20_50_eq[n-2] = sma_20_50_eq[n-3] ;

tma_eq[n-1] = tma_eq[n-3] ;
tma_eq[n-2] = tma_eq[n-3] ;

bbo_20_eq[n-1] = bbo_20_eq[n-3] ;
bbo_20_eq[n-2] = bbo_20_eq[n-3] ;

bbo_50_eq[n-1] = bbo_50_eq[n-3] ;
bbo_50_eq[n-2] = bbo_50_eq[n-3] ;

donc_20_eq[n-1] = donc_20_eq[n-3] ;
donc_20_eq[n-2] = donc_20_eq[n-3] ;

donc_50_eq[n-1] = donc_50_eq[n-3] ;
donc_50_eq[n-2] = donc_50_eq[n-3] ;

composite_eq[n-1] = composite_eq[n-3] ;
composite_eq[n-2] = composite_eq[n-3] ;

return List::create(
  _["market_mode_long_eq"] = market_mode_long_eq ,
  _["market_mode_short_eq"] = market_mode_short_eq ,
  _["market_mode_composite_eq"] = market_mode_composite_eq ,
  _["sma_10_20_eq"] = sma_10_20_eq ,
  _["sma_20_50_eq"] = sma_20_50_eq , 
  _["tma_eq"] = tma_eq ,
  _["bbo_20_eq"] = bbo_20_eq ,
  _["bbo_50_eq"] = bbo_50_eq ,
  _["donc_20_eq"] = donc_20_eq ,
  _["donc_50_eq"] = donc_50_eq ,
  _["composite_eq"] = composite_eq ) ; '

basic_benchmark_equity <- cxxfunction(signature(a = "numeric", b = "numeric", c = "numeric",
                                d = "numeric", e = "numeric", f = "numeric"), body=src, 
                                plugin = "Rcpp")

This Rcpp function is then called thus, using RStudio:

library(xts)  # load the required library

# load the "indicator" file 
data <- read.csv(file="usdyenind",header=FALSE,sep=",")
tick_size <- 0.01
tick_value <- 12.50

# extract other vectors of interest
open <- data[,2]
close <- data[,5]
market_mode <- data[,228]
kalman <- data[,283]

results <- basic_benchmark_equity(open,close,market_mode,kalman,tick_size,tick_value)

# coerce the above results list object to a data frame object
results_df <- data.frame( results )
df_max <- max(results_df) # for scaling of results plot
df_min <- min(results_df) # for scaling of results plot

# and now create an xts object for plotting
results_xts <- xts(results_df,as.Date(data[,'V1']))

# a nice plot of the results_xts object
par(col="#0000FF")
plot(results_xts[,'market_mode_long_eq'],main="USDYEN Pair",ylab="$ Equity Value",ylim=c(df_min,df_max),type="l")
par(new=TRUE,col="#B0171F")
plot(results_xts[,'market_mode_short_eq'],main="",ylim=c(df_min,df_max),type="l")
par(new=TRUE,col="#00FF00")
plot(results_xts[,'market_mode_composite_eq'],main="",ylim=c(df_min,df_max),type="l")
par(new=TRUE,col="#808080")
plot(results_xts[,'sma_10_20_eq'],main="",ylim=c(df_min,df_max),type="l")
par(new=TRUE)
plot(results_xts[,'sma_20_50_eq'],main="",ylim=c(df_min,df_max),type="l")
par(new=TRUE)
plot(results_xts[,'tma_eq'],main="",ylim=c(df_min,df_max),type="l")
par(new=TRUE)
plot(results_xts[,'bbo_20_eq'],main="",ylim=c(df_min,df_max),type="l")
par(new=TRUE)
plot(results_xts[,'bbo_50_eq'],main="",ylim=c(df_min,df_max),type="l")
par(new=TRUE)
plot(results_xts[,'donc_20_eq'],main="",ylim=c(df_min,df_max),type="l")
par(new=TRUE)
plot(results_xts[,'donc_50_eq'],main="",ylim=c(df_min,df_max),type="l")
par(new=TRUE,col="black")
plot(results_xts[,'composite_eq'],main="",ylim=c(df_min,df_max),type="l")


to output .png files, which I have strung together in this video
Non-embedded view here.
The light grey equity curves are the individual curves for the benchmark systems and the black is the "committee" 1 contract equity curve. Also shown are the long and short 1 contract equity curves ( blue and red respectively ), along with a green combined equity curve for these, for my Naive Bayesian Classifier following these simple rules:
  • be long 1 contract if the market type is uwr, unr or cyclic with my Kalman filter pointing upwards
  • be short 1 contract if the market type is dwr, dnr or cyclic with my Kalman filter pointing downwards
Again this is just a toy example of the use of my Bayesian Classifier.  After I have completed Andrew Ng's machine learning course, as mentioned in my previous post, I will have a go at coding a neural net which may end up replacing the Bayesian Classifier.