Monday, 11 December 2017

Time Warp Edit Distance

Part of my normal routine is to indulge in online research for use useful ideas, and I recently came across An Empirical Evaluation of Similarity Measures for Time Series Classification, and one standout from this paper is the Time Warp Edit Distance where, from the conclusion, "...the TWED measure originally proposed by Marteau (2009) seems to consistently outperform all the considered distances..."

Below is my Octave .oct function version of the above linked MATLAB code.
#include octave oct.h
#include octave dmatrix.h
#include limits> // for infinity
#include math.h  // for sqrt

DEFUN_DLD ( twed, args, nargout,
"-*- texinfo -*-\n\
@deftypefn {Function File} {} twed (@var{A , timeSA , B , timeSB , lambda, nu})\n\
Calculates the Time Warp Edit Distance between two univariate time series, A and B.\n\
timeSA and timeSB are the time stamps of the respective series, lambda is a penalty\n\
for a deletion operation and nu is an Elasticity parameter - nu >=0 needed for distance measure.\n\
@end deftypefn" )

{
octave_value_list retval_list ;
int nargin = args.length () ;

// check the input arguments
if ( nargin != 6 )
   {
   error ("Invalid number of arguments. See help twed.") ;
   return retval_list ;
   }

if ( args(0).length () < 2 )
   {
   error ("Invalid 1st argument length. Must be >= 2.") ;
   return retval_list ;
   }
   
if ( args(1).length () != args(0).length () )
   {
   error ("Arguments 1 and 2 must be vectors of the same length.") ;
   return retval_list ;
   }
   
if ( args(2).length () < 2 )
   {
   error ("Invalid 3rd argument length. Must be >= 2.") ;
   return retval_list ;
   }
   
if ( args(3).length () != args(2).length () )
   {
   error ("Arguments 3 and 4 must be vectors of the same length.") ;
   return retval_list ;
   }   
   
if ( args(4).length () > 1 )
   {
   error ("Argument 5 must a single value for lambda.") ;
   return retval_list ;
   }  
  
if ( args(5).length () > 1 )
   {
   error ("Argument 6 must a single value for nu >= 0.") ;
   return retval_list ;
   }   

if ( error_state )
   {
   error ("Invalid arguments. See help twed.") ;
   return retval_list ;
   }
// end of input checking  
  
Matrix A_input = args(0).matrix_value () ;
   if( A_input.rows() == 1 && A_input.cols() >= 2 ) // is a row matrix, so transpose 
   {
    A_input = A_input.transpose () ; 
   }
   
Matrix timeSA_input = args(1).matrix_value () ;
   if( timeSA_input.rows() == 1 && timeSA_input.cols() >= 2 ) // is a row matrix, so transpose 
   {
    timeSA_input = timeSA_input.transpose () ; 
   }
   
Matrix B_input = args(2).matrix_value () ;
   if( B_input.rows() == 1 && B_input.cols() >= 2 ) // is a row matrix, so transpose 
   {
    B_input = B_input.transpose () ; 
   }
   
Matrix timeSB_input = args(3).matrix_value () ;
   if( timeSB_input.rows() == 1 && timeSB_input.cols() >= 2 ) // is a row matrix, so transpose 
   {
    timeSB_input = timeSB_input.transpose () ; 
   }
   
double lambda = args(4).double_value () ;
double nu = args(5).double_value () ;
double inf = std::numeric_limits::infinity() ;
Matrix distance ( 1 , 1 ) ; distance.fill ( 0.0 ) ;
double cost ;
  
// Add padding of zero by using zero-filled distance matrix
Matrix A = distance.stack( A_input ) ;
Matrix timeSA = distance.stack( timeSA_input ) ;
Matrix B = distance.stack( B_input ) ;
Matrix timeSB = distance.stack( timeSB_input ) ;

Matrix DP ( A.rows() , B.rows() ) ; DP.fill ( inf ) ; DP( 0 , 0 ) = 0.0 ; 
int n = timeSA.rows () ;
int m = timeSB.rows () ;

    // Compute minimal cost
    for ( octave_idx_type ii (1) ; ii < n ; ii++ )
    {
      
        for ( octave_idx_type jj (1) ; jj < m ; jj++ )
        {
          
        // Deletion in A
        DP( ii , jj ) = DP(ii-1,jj) +  sqrt( ( A(ii-1,0) - A(ii,0) ) * ( A(ii-1,0) - A(ii,0) ) ) + nu * ( timeSA(ii,0) - timeSA(ii-1,0) ) + lambda ;
    
        // Deletion in B
        cost = DP(ii,jj-1) + sqrt( ( B(jj-1,0) - B(jj,0) ) * ( B(jj-1,0) - B(jj,0) ) ) + nu * ( timeSB(jj,0) - timeSB(jj-1,0) ) + lambda ;
        DP( ii , jj ) = cost < DP( ii , jj ) ? cost : DP( ii , jj ) ;
    
        // Keep data points in both time series
        cost = DP(ii-1,jj-1) + sqrt( ( A(ii,0) - B(jj,0) ) * ( A(ii,0) - B(jj,0) ) ) + sqrt( ( A(ii-1,0) - B(jj-1,0) ) * ( A(ii-1,0) - B(jj-1,0) ) ) + nu * ( abs( timeSA(ii,0) - timeSB(jj,0) ) + abs( timeSA(ii-1,0) - timeSB(jj-1,0) ) ) ;
        DP( ii , jj ) = cost < DP( ii , jj ) ? cost : DP( ii , jj ) ;

        } // end of jj loop
        
      } // end of ii loop

distance( 0 , 0 ) = DP( n - 1 , m - 1 ) ;
      
retval_list(1) = DP ; 
retval_list(0) = distance ;

return retval_list ;

} // end of function
As a quick test I took the example problem from this Cross Validated thread, the applicability I hope being quite obvious to readers:

A = [1, 2, 3, 4, 5, 6, 7, 8, 9] ;
B1 = [1, 2, 3, 4, 5, 6, 7, 8, 12] ;
distance1 = twed( A , 1:9 , B1 , 1:9 , 1 , 0.001 )
distance1 =  3
B2 = [0, 3, 2, 5, 4, 7, 6, 9, 8] ;
distance2 = twed( A , 1:9 , B2 , 1:9 , 1 , 0.001 )
distance2 =  17
graphics_toolkit('fltk') ; plot(A,'k','linewidth',2,B1,'b','linewidth',2,B2,'r','linewidth',2);
legend( "A" , "B1" , "B2" ) ;
It can be seen that the twed algorithm correctly picks out B1 as being more like A than B2 (a lower twed distance, with default values for lambda and nu of 1 and 0.001 respectively, taken from the above survey paper) when compared with the simple squared error metric, which gives identical results for both B1 and B2.

More on this in due course.

Friday, 8 December 2017

candle.m Function Accepted

I have received an e-mail from the maintainer of the Octave Financial Package that my candlestick function will be rolled out with the next release of the financial package. For those readers who can't wait the final code, with revisions, is now available at the Octave-Forge here.

This represents a new milestone for me as this is my first, officially accepted contribution to any free and open source software (FOSS) project.

Tuesday, 5 December 2017

Candlestick Plotting Function Submitted for Inclusion in Octave Financial Package

I have today submitted an improved version of my basic candlestick plotting function, candle.m ( see previous post ) for inclusion in the Octave Financial package. As I am not sure when, or even if, it will be accepted, I provide a copy of it below.
## Copyright (C) 2017 dekalog
## 
## This program is free software; you can redistribute it and/or modify it
## under the terms of the GNU General Public License as published by
## the Free Software Foundation; either version 3 of the License, or
## (at your option) any later version.
## 
## This program is distributed in the hope that it will be useful,
## but WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
## GNU General Public License for more details.
## 
## You should have received a copy of the GNU General Public License
## along with this program.  If not, see .

## -*- texinfo -*- 
## @deftypefn {Function File} {@var{retval} =} candle (@var{highprices}, @var{lowprices}, @var{closeprices}, @var{openprices})
## @deftypefnx {Function File} {@var{retval} =} candle (@var{highprices}, @var{lowprices}, @var{closeprices}, @var{openprices}, @var{color})
## @deftypefnx {Function File} {@var{retval} =} candle (@var{highprices}, @var{lowprices}, @var{closeprices}, @var{openprices}, @var{color}, @var{dates})
## @deftypefnx {Function File} {@var{retval} =} candle (@var{highprices}, @var{lowprices}, @var{closeprices}, @var{openprices}, @var{color}, @var{dates}, @var{dateform})
##
## Plot the @var{highprices}, @var{lowprices}, @var{closeprices} and @var{openprices} of a security as a candlestick chart.
##
## HighPrices - High prices for a security. A column vector.
##
## LowPrices - Low prices for a security. A column vector.
##
## ClosePrices - Close prices for a security. A column vector.
##
## OpenPrices - Open prices for a security. A column vector.
##
## Color - (optional) Candlestick color is specified as a case insensitive four 
## character row vector, e.g. "brwk". The characters that are accepted are 
## k, b, c, r, m, w, g and y for black, blue, cyan, red, magenta, white, green 
## and yellow respectively. Default colors are "brwk" applied in order to bars 
## where the closing price is greater than the opening price, bars where the 
## closing price is less than the opening price, the chart background color and
## the candlestick wicks. If fewer than four colors are specified, they are
## applied in turn in the above order with default colors for unspecified colors. 
## For example, user supplied colors "gm" will plot green upbars and magenta
## downbars with a default white background and black wicks. If the user 
## specified color for background is black, without specifying the wick color, 
## e.g. "gmk", the default wick color is white. All other choices for background 
## color will default to black for wicks. If all four colors are user specified,
## those colors will be used. Doji bars and single price bars, e.g. open = high 
## = low = close, are plotted with the color for wicks, with single price bars 
## being plotted as points/dots.
##
## Dates - (Optional) Dates for user specified x-axis tick labels. Dates can be 
## a serial date number column (see datenum), a datevec matrix (See datevec)
## or a character vector of dates. If specified as either a datenum or a datevec,
## the Dateform argument is required. If the Dates argument is supplied, the 
## Color argument must also be explicitly specified.
##
## Dateform - (Optional) Either a Date character string or a single integer code
## number used to format the x-axis tick labels (See datestr). Only required if 
## Dates is specified as a serial date number column (See datenum) or a datevec 
## matrix (See datevec). 
##
## @seealso{datenum, datestr, datevec, highlow, bolling, dateaxis, movavg, 
## pointfig}
## @end deftypefn

## Author: dekalog 
## Created: 2017-12-05

function candle ( varargin )  
  
if ( nargin < 4 || nargin > 7 )
  print_usage ();
elseif ( nargin == 4 )
  HighPrices = varargin{1}; LowPrices = varargin{2}; ClosePrices = varargin{3}; 
  OpenPrices = varargin{4}; color = "brwk";
elseif ( nargin == 5 )
  HighPrices = varargin{1}; LowPrices = varargin{2}; ClosePrices = varargin{3}; 
  OpenPrices = varargin{4}; color = varargin{5};
elseif ( nargin == 6 )
  HighPrices = varargin{1}; LowPrices = varargin{2}; ClosePrices = varargin{3}; 
  OpenPrices = varargin{4}; color = varargin{5}; dates = varargin{6};
elseif ( nargin == 7 )
  HighPrices = varargin{1}; LowPrices = varargin{2}; ClosePrices = varargin{3}; 
  OpenPrices = varargin{4}; color = varargin{5}; dates = varargin{6}; 
  dateform = varargin{7};
endif
   
if ( ! isnumeric ( HighPrices ) || ! isnumeric ( LowPrices ) || ...
  ! isnumeric ( ClosePrices ) || ! isnumeric ( OpenPrices ) ) # one of OHLC is not numeric
  error ("candle: The inputs for HighPrices, LowPrices, ClosePrices and OpenPrices must be numeric vectors.");  
endif

if ( size ( OpenPrices ) != size ( HighPrices ) )
  error ("candle: OpenPrices and HighPrices vectors are different lengths.");
endif

if ( size ( HighPrices ) != size ( LowPrices ) )
  error ("candle: HighPrices and LowPrices vectors are different lengths.");
endif 

if ( size ( LowPrices ) != size ( ClosePrices ) )
  error ("candle: LowPrices and ClosePrices vectors are different lengths.");
endif

if ( size ( ClosePrices, 1 ) == 1 && size ( ClosePrices, 2 ) > 1 ) # ohlc inputs are row vectors, so transpose them
  OpenPrices = OpenPrices'; HighPrices = HighPrices'; LowPrices = LowPrices'; 
  ClosePrices = ClosePrices';
  warning ("candle: The HighPrices, LowPrices, ClosePrices and OpenPrices should be column vectors. They have been transposed.");
endif
 
## check the user input Color argument, if it's character row vector
if ( ( nargin >= 5 && ischar ( color ) ) && size ( color, 1 ) == 1 )
  
  if ( size ( color, 2 ) == 1 )                      # only one color has been user specified
    color = [ tolower( color ) "rwk" ];              # so add default colors for down bars, background and wicks
  
  elseif ( size ( color, 2 ) == 2 )                  # two colors have been user specified
    color = [ tolower( color ) "wk" ];               # so add default colors for background and wicks
  
  elseif ( size ( color, 2 ) == 3 )                  # three colors have been user specified
  
    if ( color ( 3 ) == "k" || color ( 3 ) == "K" )  # if user selected background is black
     color = [ tolower( color ) "w" ];               # set wicks to default white
 
    else
     color = [ tolower( color ) "k" ];               # else default black wicks
 
    endif
  
  elseif ( size ( color, 2 ) >= 4 )                  # all four colors have been user specified, extra character inputs ignored  
  color = tolower( color );                          # correct in case user input contains upper case e.g. "BRWK" 
  
  endif 

elseif ( nargin >= 5 && ! ischar ( color ) )          # the user input for color is not a charcter vector

warning ("candle: The fifth input argument, Color, should be a character row vector for Color.\nThe chart has been plotted with default colors.");
color = "brwk"; 

elseif ( ischar ( color ) && size ( color, 1 ) != 1 ) # user input is more than one row of characters - a date character array by mistake?

warning ("candle: Color is not a single row character vector. Possibly a column Dates character vector?\nThe chart has been plotted with default colors.");
color = "brwk";

endif                                                 # end of nargin >= 5 && ischar ( color ) ) && size ( color, 1 ) == 1 if statement
  
wicks = HighPrices .- LowPrices;
body = ClosePrices .- OpenPrices;
up_down = sign ( body );
body_width = 20;
wick_width = 1;
doji_size = 10;
one_price_size = 15;

hold on;

## first, plot the chart background color
plot ( HighPrices, color( 3 ), LowPrices, color( 3 ) );
fill ( [ min( xlim ) max( xlim ) max( xlim ) min( xlim ) ], ...
       [ min( ylim ) min( ylim ) max( ylim ) max( ylim ) ], color( 3 ) );

## plot the wicks
x = ( 1 : length ( ClosePrices ) );                                           # the x-axis
idx = x;
high_nan = nan ( size ( HighPrices ) ); high_nan( idx ) = HighPrices;         # highs
low_nan = nan ( size ( LowPrices ) ); low_nan( idx ) = LowPrices;             # lows
x = reshape ( [ x; x; nan( size ( x ) ) ], [], 1 );
y = reshape ( [ high_nan(:)'; low_nan(:)'; nan( 1 , length ( HighPrices ) ) ], ...
              [] , 1 );
plot ( x, y, color( 4 ), "linewidth", wick_width );                           # plot wicks

## plot the up bar bodies 
x = ( 1 : length ( ClosePrices ) );                                           # the x-axis
idx = ( up_down == 1 ); idx = find ( idx );                                   # index by condition close > open
high_nan = nan ( size ( HighPrices ) ); high_nan( idx ) = ClosePrices( idx ); # body highs
low_nan = nan ( size ( LowPrices ) ); low_nan( idx ) = OpenPrices( idx );     # body lows
x = reshape ( [ x; x; nan( size ( x ) ) ], [], 1 );
y = reshape ( [ high_nan(:)'; low_nan(:)'; nan( 1, length ( HighPrices ) ) ], ...
              [], 1 );
plot ( x, y, color( 1 ), "linewidth", body_width );                           # plot bodies for up bars
  
## plot the down bar bodies
x = ( 1 : length ( ClosePrices ) );                                           # the x-axis
idx = ( up_down == -1 ); idx = find ( idx );                                  # index by condition close < open
high_nan = nan ( size ( HighPrices ) ); high_nan( idx ) = OpenPrices( idx );  # body highs
low_nan = nan ( size ( LowPrices ) ); low_nan( idx ) = ClosePrices( idx );    # body lows
x = reshape ( [ x; x; nan( size ( x ) ) ], [], 1 );
y = reshape ( [ high_nan(:)'; low_nan(:)'; nan( 1, length ( HighPrices ) ) ], ...
              [], 1 );
plot ( x, y, color( 2 ), "linewidth", body_width );                           # plot bodies for down bars

## plot special cases
## doji bars
doji_bar = ( HighPrices > LowPrices ) .* ( ClosePrices == OpenPrices ); ...
             doji_ix = find ( doji_bar ); 
             
if ( length ( doji_ix ) >= 1 )
  x = ( 1 : length ( ClosePrices ) );                                         # the x-axis  
  plot ( x( doji_ix ), ClosePrices( doji_ix ), [ "+" char ( color( 4 ) ) ], ...
        "markersize", doji_size );                                            # plot the open/close as horizontal dash  
endif

## OpenPrices == HighPrices == LowPrices == ClosePrices
one_price = ( HighPrices == LowPrices ) .* ( ClosePrices == OpenPrices ) .* ...
            ( OpenPrices == HighPrices ); one_price_ix = find ( one_price ); 
            
if ( length ( one_price_ix ) >= 1 )
  x = ( 1 : length ( ClosePrices ) );                                         # the x-axis  
  plot ( x( one_price_ix ), ClosePrices( one_price_ix ), ...
          [ "." char ( color( 4 ) ) ], "markersize", one_price_size );        # plot as a point/dot  
endif  
  
hold off;

## now add the x-axis tick labels, if the user has supplied the correct arguments

if ( nargin == 6 && isnumeric ( dates ) ) 
  error ("candle: If the sixth input argument, Dates, is a serial date number column (See datenum) or a datevec matrix (See datevec), Dateform input is required.\nThe chart has been plotted without x-axis dates.");
endif

if ( nargin == 6 && ischar ( dates ) )                                        # user has given a character vector of dates for dates
  
if ( size ( dates, 1 ) != size ( ClosePrices, 1 ) )
  error ("candle: The sixth input argument, Dates, and the OHLC prices vectors are different lengths.\nThe chart has been plotted without x-axis dates.");
else

ticks = cellstr ( dates ) ;
ax = "x" ;
xticks = 1 : length ( ClosePrices );
h = gca ();
set ( h, "xtick", xticks );
set ( h, [ ax "ticklabel" ], ticks );

endif
  
endif  
  
if ( nargin == 7 && isnumeric ( dates ) )                              # input arguments 6 and 7 assumed to be a dates vector and dateform respectively
  
if ( size ( dates, 1 ) != size ( ClosePrices, 1 ) )
  error ("candle: The sixth input argument, Dates, and the OHLC prices vectors are different lengths.\nThe chart has been plotted without x-axis dates.");
endif

if ( size ( dates, 2 ) == 1 )                                          # user has given a possible serial date number column for dates
 
is_monotonically_increasing = sum ( dates == cummax ( dates ) ) / size ( dates, 1 );

if ( is_monotonically_increasing != 1 )  
  error ("candle: Dates does not appear to be a serial date number column as it is not monotonically increasing.\nThe chart has been plotted without x-axis dates.");  
endif

if ( isnumeric ( dateform ) && ( dateform < 0 || dateform > 31 ) )     
  error ("candle: Dateform integer code number is out of bounds (See datestr).\nThe chart has been plotted without x-axis dates."); 
endif

if ( isnumeric ( dateform ) && rem ( dateform, 1 ) > 0 )
  error ("candle: Dateform code number should be an integer 0 - 31 (See datestr).\nThe chart has been plotted without x-axis dates.");
endif      

ticks = datestr ( dates, dateform );
ticks = mat2cell ( ticks, ones ( size ( ticks, 1 ), 1 ), size ( ticks, 2 ) );  
ax = "x";
xticks = 1 : length ( ClosePrices );
h = gca ();
set ( h, "xtick", xticks );
set ( h, [ ax "ticklabel" ], ticks );

elseif ( size ( dates, 2 ) == 6 )                                      # user has given a possible datevec matrix for dates

if ( isnumeric ( dateform ) && ( dateform < 0 || dateform > 31 ) )     
  error ("candle: Dateform integer code number is out of bounds (See datestr).\nThe chart has been plotted without x-axis dates."); 
endif

if ( isnumeric ( dateform ) && rem ( dateform, 1 ) > 0 )
  error ("candle: Dateform code number should be an integer 0 - 31 (See datestr).\nThe chart has been plotted without x-axis dates.");
endif 

ticks = datestr ( dates, dateform );
ticks = mat2cell ( ticks, ones ( size ( ticks, 1 ), 1 ), size ( ticks, 2 ) );
ax = "x";
xticks = 1 : length ( ClosePrices );
h = gca ();
set ( h, "xtick", xticks );
set ( h, [ ax "ticklabel" ], ticks );

else 
  error ("candle: The numerical Dates input is neither a single column serial date number nor a six column datevec format.\nThe chart has been plotted without x-axis dates."); 
  
endif

endif # end of ( nargin == 7 && isnumeric ( dates ) ) if statement 

endfunction

%!demo 1
%! Open = [ 1292.4; 1291.7; 1291.8; 1292.2; 1291.5; 1291.0; 1291.0; 1291.5; 1291.7; 1291.5; 1290.7 ];
%! High = [ 1292.6; 1292.1; 1292.5; 1292.3; 1292.2; 1292.2; 1292.7; 1292.4; 1292.3; 1292.1; 1292.9 ];
%! Low = [ 1291.3; 1291.3; 1291.7; 1291.1; 1290.7; 1290.2; 1290.3; 1291.1; 1291.2; 1290.5; 1290.4 ];
%! Close = [ 1291.8; 1291.7; 1292.2; 1291.5; 1291.0; 1291.1; 1291.5; 1291.7; 1291.6; 1290.8; 1292.8 ];
%! graphics_toolkit('fltk'); candle( High, Low, Close, Open );
%! title("default plot.");

%!demo 2
%! Open = [ 1292.4; 1291.7; 1291.8; 1292.2; 1291.5; 1291.0; 1291.0; 1291.5; 1291.7; 1291.5; 1290.7 ];
%! High = [ 1292.6; 1292.1; 1292.5; 1292.3; 1292.2; 1292.2; 1292.7; 1292.4; 1292.3; 1292.1; 1292.9 ];
%! Low = [ 1291.3; 1291.3; 1291.7; 1291.1; 1290.7; 1290.2; 1290.3; 1291.1; 1291.2; 1290.5; 1290.4 ];
%! Close = [ 1291.8; 1291.7; 1292.2; 1291.5; 1291.0; 1291.1; 1291.5; 1291.7; 1291.6; 1290.8; 1292.8 ];
%! graphics_toolkit('fltk'); candle( High, Low, Close, Open, 'brk' );
%! title("default plot with user selected black background");

%!demo 3
%! Open = [ 1292.4; 1291.7; 1291.8; 1292.2; 1291.5; 1291.0; 1291.0; 1291.5; 1291.7; 1291.5; 1290.7 ];
%! High = [ 1292.6; 1292.1; 1292.5; 1292.3; 1292.2; 1292.2; 1292.7; 1292.4; 1292.3; 1292.1; 1292.9 ];
%! Low = [ 1291.3; 1291.3; 1291.7; 1291.1; 1290.7; 1290.2; 1290.3; 1291.1; 1291.2; 1290.5; 1290.4 ];
%! Close = [ 1291.8; 1291.7; 1292.2; 1291.5; 1291.0; 1291.1; 1291.5; 1291.7; 1291.6; 1290.8; 1292.8 ];
%! graphics_toolkit('fltk'); candle( High, Low, Close, Open, 'brkg' );
%! title("default color candlestick bodies and user selected background and wick colors");

%!demo 4
%! Open = [ 1292.4; 1291.7; 1291.8; 1292.2; 1291.5; 1291.0; 1291.0; 1291.5; 1291.7; 1291.5; 1290.7 ];
%! High = [ 1292.6; 1292.1; 1292.5; 1292.3; 1292.2; 1292.2; 1292.7; 1292.4; 1292.3; 1292.1; 1292.9 ];
%! Low = [ 1291.3; 1291.3; 1291.7; 1291.1; 1290.7; 1290.2; 1290.3; 1291.1; 1291.2; 1290.5; 1290.4 ];
%! Close = [ 1291.8; 1291.7; 1292.2; 1291.5; 1291.0; 1291.1; 1291.5; 1291.7; 1291.6; 1290.8; 1292.8 ];
%! graphics_toolkit('fltk'); candle( High, Low, Close, Open, 'gmby' );
%! title("all four colors being user selected");

%!demo 5
%! Open = [ 1292.4; 1291.7; 1291.8; 1292.2; 1291.5; 1291.0; 1291.0; 1291.5; 1291.7; 1291.5; 1290.7 ];
%! High = [ 1292.6; 1292.1; 1292.5; 1292.3; 1292.2; 1292.2; 1292.7; 1292.4; 1292.3; 1292.1; 1292.9 ];
%! Low = [ 1291.3; 1291.3; 1291.7; 1291.1; 1290.7; 1290.2; 1290.3; 1291.1; 1291.2; 1290.5; 1290.4 ];
%! Close = [ 1291.8; 1291.7; 1292.2; 1291.5; 1291.0; 1291.1; 1291.5; 1291.7; 1291.6; 1290.8; 1292.8 ];
%! datenum_vec = [ 7.3702e+05; 7.3702e+05 ;7.3702e+05; 7.3702e+05; 7.3702e+05; 7.3702e+05; 7.3702e+05; ...
%! 7.3702e+05; 7.3702e+05; 7.3702e+05; 7.3702e+05 ];
%! graphics_toolkit('fltk'); candle( High, Low, Close, Open, 'brwk', datenum_vec, "yyyy-mm-dd HH:MM" );
%! title("default plot with datenum dates and character dateform arguments");

%!demo 6
%! Open = [ 1292.4; 1291.7; 1291.8; 1292.2; 1291.5; 1291.0; 1291.0; 1291.5; 1291.7; 1291.5; 1290.7 ];
%! High = [ 1292.6; 1292.1; 1292.5; 1292.3; 1292.2; 1292.2; 1292.7; 1292.4; 1292.3; 1292.1; 1292.9 ];
%! Low = [ 1291.3; 1291.3; 1291.7; 1291.1; 1290.7; 1290.2; 1290.3; 1291.1; 1291.2; 1290.5; 1290.4 ];
%! Close = [ 1291.8; 1291.7; 1292.2; 1291.5; 1291.0; 1291.1; 1291.5; 1291.7; 1291.6; 1290.8; 1292.8 ];
%! datenum_vec = [ 7.3702e+05; 7.3702e+05 ;7.3702e+05; 7.3702e+05; 7.3702e+05; 7.3702e+05; 7.3702e+05; ...
%! 7.3702e+05; 7.3702e+05; 7.3702e+05; 7.3702e+05 ];
%! graphics_toolkit('fltk'); candle( High, Low, Close, Open, 'brk', datenum_vec, 31 );
%! title("default plot with user selected black background with datenum dates and integer dateform arguments");

%!demo 7
%! Open = [ 1292.4; 1291.7; 1291.8; 1292.2; 1291.5; 1291.0; 1291.0; 1291.5; 1291.7; 1291.5; 1290.7 ];
%! High = [ 1292.6; 1292.1; 1292.5; 1292.3; 1292.2; 1292.2; 1292.7; 1292.4; 1292.3; 1292.1; 1292.9 ];
%! Low = [ 1291.3; 1291.3; 1291.7; 1291.1; 1290.7; 1290.2; 1290.3; 1291.1; 1291.2; 1290.5; 1290.4 ];
%! Close = [ 1291.8; 1291.7; 1292.2; 1291.5; 1291.0; 1291.1; 1291.5; 1291.7; 1291.6; 1290.8; 1292.8 ];
%! datenum_vec = [ 7.3702e+05; 7.3702e+05 ;7.3702e+05; 7.3702e+05; 7.3702e+05; 7.3702e+05; 7.3702e+05; ...
%! 7.3702e+05; 7.3702e+05; 7.3702e+05; 7.3702e+05 ];
%! datevec_vec = datevec( datenum_vec );
%! graphics_toolkit('fltk'); candle( High, Low, Close, Open, 'brwk', datevec_vec, 21 );
%! title("default plot with datevec dates and integer dateform arguments");

%!demo 8
%! Open = [ 1292.4; 1291.7; 1291.8; 1292.2; 1291.5; 1291.0; 1291.0; 1291.5; 1291.7; 1291.5; 1290.7 ];
%! High = [ 1292.6; 1292.1; 1292.5; 1292.3; 1292.2; 1292.2; 1292.7; 1292.4; 1292.3; 1292.1; 1292.9 ];
%! Low = [ 1291.3; 1291.3; 1291.7; 1291.1; 1290.7; 1290.2; 1290.3; 1291.1; 1291.2; 1290.5; 1290.4 ];
%! Close = [ 1291.8; 1291.7; 1292.2; 1291.5; 1291.0; 1291.1; 1291.5; 1291.7; 1291.6; 1290.8; 1292.8 ];
%! character_dates = char ( [] );
%! for i = 1 : 11
%! character_dates = [ character_dates ; "a date" ] ;
%! endfor
%! graphics_toolkit('fltk'); candle( High, Low, Close, Open, 'brk', character_dates );
%! title("default plot with user selected black background with character dates argument");
I hope readers who are Octave users find this useful.

Monday, 20 November 2017

Candlestick Plotting Function for Octave.

I have long been frustrated by the lack of an "out of the box" solution for plotting OHLC candlestick charts natively in Octave, the closest solution I know being the highlow plot function from the financial package ( which does not yet implement a candle function ) over at Octave Sourceforge. This being the case, I decided to write my own candlestick plotting functions, the codes for which are shown below.

This first, basic version just plots a candlestick chart with blue for up bars ( close higher than open ) or red for down bars ( close less than open ).
## Copyright (C) 2017 dekalog
## 
## This program is free software; you can redistribute it and/or modify it
## under the terms of the GNU General Public License as published by
## the Free Software Foundation; either version 3 of the License, or
## (at your option) any later version.
## 
## This program is distributed in the hope that it will be useful,
## but WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
## GNU General Public License for more details.
## 
## You should have received a copy of the GNU General Public License
## along with this program.  If not, see .
## -*- texinfo -*- 
## @deftypefn {} {@var{retval} =} candles (@var{O}, @var{H}, @var{L}, @var{C})
## 
## Takes O, H, L and C inputs and plots a candlestick chart, with blue bars for
## close up days and red bars for close down days.
##
## @seealso{}
## @end deftypefn

## Author: dekalog 
## Created: 2017-11-16

function [retval] = candles ( open , high , low , close )
  
  if ( nargin != 4 )
      error ( "Not enough input arguments. Should be OHLC vectors." ) ;
  endif
  
  wicks = high .- low ;
  body = close .- open ;
  up_down = sign( body ) ;
  
  hold on ;
  % plot the wicks
  x = ( 1 : length( close ) ) ; % the x-axis
  idx = x ;
  high_nan = nan( size( high ) ) ; high_nan( idx ) = high ; % highs
  low_nan = nan( size( low ) ) ; low_nan( idx ) = low ;     % lows
  x = reshape( [ x ; x ; nan( size( x ) ) ] , [] , 1 ) ;
  y = reshape( [ high_nan(:)' ; low_nan(:)' ; nan( 1 , length( high ) ) ] , [] , 1 ) ;
  plot( x , y , "k" , 'linewidth' , 2 ) ;                   % plot black wicks
  
  % plot the up bars
  x = ( 1 : length( high ) ) ;                                      % the x-axis
  idx = ( up_down == 1 ) ; idx = find( idx ) ;                      % index by condition
  high_nan = nan( size( high ) ) ; high_nan( idx ) = close( idx ) ; % index closes > opens
  low_nan = nan( size( low ) ) ; low_nan( idx ) = open( idx ) ;     % index opens < closes
  x = reshape( [ x ; x ; nan( size( x ) ) ] , [] , 1 ) ;
  y = reshape( [ high_nan(:)' ; low_nan(:)' ; nan( 1 , length( high ) ) ] , [] , 1 ) ;
  plot( x , y , "b" , 'linewidth' , 20 ) ;                           % plot blue up bars
  
  % plot the down bars
  x = ( 1 : length( high ) ) ;                                      % the x-axis
  idx = ( up_down == -1 ) ; idx = find( idx ) ;                     % index by condition
  high_nan = nan( size( high ) ) ; high_nan( idx ) = open( idx ) ;  % index opens > closes
  low_nan = nan( size( low ) ) ; low_nan( idx ) = close( idx ) ;    % index closes < opens
  x = reshape( [ x ; x ; nan( size( x ) ) ] , [] , 1 ) ;
  y = reshape( [ high_nan(:)' ; low_nan(:)' ; nan( 1 , length( high ) ) ] , [] , 1 ) ;
  plot( x , y , "r" , 'linewidth' , 20 ) ;                          % plot red down bars
  
  hold off ; 

endfunction
The second version is a conditional plotting version which takes a condition vector as input along with the OHLC vectors and plots candlesticks with different colours according to the condition in the condition vector ( conditions are integers 1 to 3 inclusive ).
## Copyright (C) 2017 dekalog
## 
## This program is free software; you can redistribute it and/or modify it
## under the terms of the GNU General Public License as published by
## the Free Software Foundation; either version 3 of the License, or
## (at your option) any later version.
## 
## This program is distributed in the hope that it will be useful,
## but WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
## GNU General Public License for more details.
## 
## You should have received a copy of the GNU General Public License
## along with this program.  If not, see .
## -*- texinfo -*- 
## @deftypefn {} {@var{retval} =} candles (@var{O}, @var{H}, @var{L}, @var{C}, @var{TC})
## 
## Takes OHLC inputs and a TC vector and plots a candlestick chart, with up bars coloured
## cyan or blue and down bars magenta or red, dependent on value contained in the 
## condition vector TC ( values == 1 or == 2 ).
##
## @seealso{}
## @end deftypefn

## Author: dekalog 
## Created: 2017-11-16

function [retval] = candles_tc ( open , high , low , close , tc )
  
  if ( nargin != 5 )
     error ( "Not enough input arguments. Should be OHLC and a condition vector" ) ;
  endif
  
  if ( length( unique( tc ) ) > 3 )
     error ( "Too many conditions in condition vector. Maximum of 3 conditions allowed." ) ;
  endif
  
  wicks = high .- low ;
  body = close .- open ;
  up_down = sign( body ) ;
  up_colours = "cbg" ;
  down_colours = "mrk" ;
  candle_body_width = 20 ;
  
  hold on ;
  
  % plot the wicks
  x = ( 1 : length( close ) ) ; % the x-axis
  idx = x ;
  high_nan = nan( size( high ) ) ; high_nan( idx ) = high ; % highs
  low_nan = nan( size( low ) ) ; low_nan( idx ) = low ;     % lows
  x = reshape( [ x ; x ; nan( size( x ) ) ] , [] , 1 ) ;
  y = reshape( [ high_nan(:)' ; low_nan(:)' ; nan( 1 , length( high ) ) ] , [] , 1 ) ;
  plot( x , y , "k" , 'linewidth' , 2 ) ;                   % plot black wicks

  for ii = 1 : length( unique( tc ) )
  
  % plot the up bars for ii condition
  x = ( 1 : length( high ) ) ;                                      % the x-axis
  idx = ( up_down == 1 ) .* ( tc == ii ) ; idx = find( idx ) ;      % index by condition
  high_nan = nan( size( high ) ) ; high_nan( idx ) = close( idx ) ; % index closes > opens
  low_nan = nan( size( low ) ) ; low_nan( idx ) = open( idx ) ;     % index opens < closes
  x = reshape( [ x ; x ; nan( size( x ) ) ] , [] , 1 ) ;
  y = reshape( [ high_nan(:)' ; low_nan(:)' ; nan( 1 , length( high ) ) ] , [] , 1 ) ;
  plot( x , y , up_colours( ii ) , 'linewidth' , candle_body_width ) ;
  
  % plot the down bars for ii condition
  x = ( 1 : length( high ) ) ;                                      % the x-axis
  idx = ( up_down == -1 ) .* ( tc == ii ) ; idx = find( idx ) ;     % index by condition
  high_nan = nan( size( high ) ) ; high_nan( idx ) = open( idx ) ;  % index opens > closes
  low_nan = nan( size( low ) ) ; low_nan( idx ) = close( idx ) ;    % index closes < opens
  x = reshape( [ x ; x ; nan( size( x ) ) ] , [] , 1 ) ;
  y = reshape( [ high_nan(:)' ; low_nan(:)' ; nan( 1 , length( high ) ) ] , [] , 1 ) ;
  plot( x , y , down_colours( ii ) , 'linewidth' , candle_body_width ) ;

  endfor
  
  hold off ; 

endfunction
An example of a plot by this second version is
There are two conditions being plotted on this 1 hour chart: cyan up bars and magenta down bars are bars that occur in the "Asian session," i.e. after 17:00 New York Time local time and before 07:00 London local time; and blue up bars and red down bars are bars that occur in the overlapping London - New York session, i.e. between 07:00 London local time and 17:00 New York local time.

The horizontal black lines are not part of the basic plot function but are added later by use of the "hold" function. These lines are the "Tokyo Channel," i.e. the high and low of the Asian session extended into the immediately following London - New York session.

I hope readers who use Octave will find these plotting functions useful.

Tuesday, 31 October 2017

Prepending Historical Data with Oanda's R API

As a follow on to my previous post, which was about appending data, the script below prepends historical data to an assumed existing data record.
% cd to the hourly data directory
setwd("~/Documents/octave/oanda_data/hourly")

all_current_historical_data_list = read.table("instrument_hourly_update_file",header=FALSE,sep="",colClasses=c("character","Date","numeric") )

for( ii in 1 : nrow( all_current_historical_data_list ) ) {  

   instrument = all_current_historical_data_list[ ii , 1 ]
   
   current_ohlc_record = read.table( file = paste( instrument , "raw_OHLC_hourly" , sep = "_" ) , header = FALSE , na = "" , sep = "," ,
                                      stringsAsFactors = FALSE )
   
   current_ohlc_record_begin_date_time = as.character( current_ohlc_record[ 1 , 1 ] ) % get the date/time value to be matched
   last_date_ix = as.Date( current_ohlc_record[ 1 , 1 ] )                             % the end date for new data to be downloaded             
   
   % last 40 weeks of hourly data approx = 5000 hourly bars            
   begin_date_ix = as.Date( last_date_ix - 280 )                      % the begin date for new data to be downloaded

   % download the missing historical data from begin_date_ix to last_date_x.
   new_historical_data = HisPricesDates( Granularity = "H1", DayAlign, TimeAlign, AccountToken, instrument,
                          begin_date_ix , last_date_ix + 2 ) % +2 to ensure that the end of the new downloaded data will
                                                             % overlap with the beginning of current_ohlc_record
  
   % having ensured no data is missed by overlaping with the current_ohlc_record, delete duplicated OHLC information
   new_historical_data_date_times = as.character( new_historical_data[ , 1 ] ) % vector to search for the above date value
   
   ix = charmatch( current_ohlc_record_begin_date_time , new_historical_data_date_times ) % get the matching index value

   % delete that part of new_historical_data which is already contained in filename
   new_historical_data = new_historical_data[ -( ix : nrow( new_historical_data ) ) , ]
   
   % before prepending new_historical_data in front of current_ohlc_record, need to give names to current_ohlc_record as
   % rbind needs to bind by named attributes
   names( current_ohlc_record ) = names( new_historical_data )
   
   % see https://stackoverflow.com/questions/11785710/rbind-function-changing-my-entries for reason for following
   % also need to coerce that dates in new_historical_data from POSIXct to character
   new_historical_data$TimeStamp = as.character( new_historical_data$TimeStamp )
   
   % and now prepend new_historical_data to current_ohlc_record
   combined_records = rbind( new_historical_data , current_ohlc_record , stringsAsFactors = FALSE )
   
   % and coerce character dates back to a POSIXct date format prior to printing
   combined_records$TimeStamp = as.POSIXct( combined_records$TimeStamp )
   
   % write combined_records to file
   write.table( combined_records , file = paste( instrument , "raw_OHLC_hourly" , sep = "_" ) , row.names = FALSE , 
                col.names = FALSE , sep = "," )
   
   added_data_length = nrow( new_historical_data ) % length of added new data

   % and amend Instrument_update file with lastest update information
   all_current_historical_data_list[ ii , 3 ] = all_current_historical_data_list[ ii , 3 ] + added_data_length
   
   % write updated Instrument_update_file to file
   write.table( all_current_historical_data_list , file = "instrument_hourly_update_file" , row.names = FALSE , col.names = FALSE )

} % end of for all_current_historical_data_list loop
As described in the previous post the function HisPricesDates is called to do the actual downloading, with the relevant dates for function input being read and calculated from the existing data file ( I have hard coded for hourly data but this can, of course, be changed or implemented as user input in the R session). As usual I have commented the script to explain what is going on.

However, one important caveat is that it is assumed that there is actually some Oanda data to download and prepend prior the the earliest date in the existing  record, and there are no checks of this assumption. Therefore, the script might fail in unexpected ways if one attempts to reach too far back in history for the prependable data.

Friday, 27 October 2017

Updating Historical Data Using Oanda's API and R

Following on from my previous post about downloading historical data, this post shows how previously downloaded data may be updated and appended with new, more recent data without having to re-download all the old data all over again.

The main function to do this, HisPricesDates, downloads data between given dates as function inputs and is shown below.
HisPricesDates  = function( Granularity, DayAlign, TimeAlign, AccountToken, Instrument, Start, End ){

% a typical Oanda API call might look like  
% https://api-fxtrade.oanda.com/v1/candles?instrument=EUR_USD&granularity=D&start=2014-03-21&end=2014-04-21&candleFormat=midpoint&includeFirst=false
% which is slowly built up by using the R paste function, commented at end of each line below
  
  httpaccount  = "https://api-fxtrade.oanda.com"
  auth           = c(Authorization = paste("Bearer",AccountToken,sep=" "))
  QueryHistPrec  = paste(httpaccount,"/v1/candles?instrument=",sep="") % https://api-fxtrade.oanda.com/v1/candles?instrument=
  QueryHistPrec1 = paste(QueryHistPrec,Instrument,sep="")              % https://api-fxtrade.oanda.com/v1/candles?instrument=EUR_USD
  qstart = paste("start=",Start,sep="")                                % start=2014-03-21
  qend   = paste("end=",End,sep="")                                    % end=2014-04-21   
  qcandleFormat  = "candleFormat=midpoint"                             % candleFormat=midpoint
  qgranularity   = paste("granularity=",Granularity,sep="")            % granularity=D
  qdailyalignment    = paste("dailyAlignment=",DayAlign,sep="")        % dailyAlignment=0
  qincludeFirst = "includeFirst=false"                                 % includeFirst=false
  QueryHistPrec2 = paste(QueryHistPrec1,qgranularity,qstart,qend,qcandleFormat,qincludeFirst,qdailyalignment,sep="&")
  InstHistP = getURL(QueryHistPrec2,cainfo=system.file("CurlSSL","cacert.pem",package="RCurl"),httpheader=auth)
  InstHistPjson = fromJSON(InstHistP, simplifyDataFrame = TRUE)
  Prices        = data.frame(InstHistPjson[[3]])
  Prices$time   = paste(substr(Prices$time,1,10),substr(Prices$time,12,19), sep=" ")
  colnames(Prices) = c("TimeStamp","Open","High","Low","Close","TickVolume","Complete")
  Prices$TimeStamp = as.POSIXct(strptime(Prices$TimeStamp, "%Y-%m-%d %H:%M:%OS"),origin="1970-01-01",tz = "UTC")
  attributes(Prices$TimeStamp)$tzone = TimeAlign
  return(Prices)
  
}
This function is called by two R scripts, one for downloading daily data and one for intraday data.

The daily update script, which is shown next,
% cd to the daily data directory
setwd("~/Documents/octave/oanda_data/daily")

all_current_historical_data_list = read.table("instrument_daily_update_file",header=FALSE,sep="",colClasses=c("character","Date","numeric") )

for( ii in 1 : nrow( all_current_historical_data_list ) ) {
  
  instrument = all_current_historical_data_list[ ii , 1 ]
  % read second column of dates in all_current_historical_data_list as a date index
  date_ix = as.Date( all_current_historical_data_list[ ii , 2 ] )
  todays_date = as.Date( Sys.time() )
  
  % download the missing historical data from date_ix to todays_date, if and only if, date_ix != todays_date
  if( date_ix + 1 != todays_date ) {
  
  new_historical_data = HisPricesDates( Granularity = "D", DayAlign, TimeAlign, AccountToken, instrument,
                         date_ix , todays_date )

  % the new_historical_data might only try to add incomplete OHLC data, in which case do not actually
  % want to update, so only update if we will be adding new, complete OHLC information  
  if ( nrow( new_historical_data ) >= 2 & new_historical_data[ 2 , 7 ] == TRUE ) {
  
    % now do some data manipulation
    % expect date of last line in Instrument_update_file == date of first line in new_historical_data 
    if ( date_ix == as.Date( new_historical_data[ 1 , 1 ] ) ) { % this is the case if true
       new_historical_data = new_historical_data[ -1 , ]       % so delete first row of new_historical_data
    }
  
    % similarly, expect last line of new_historical_data to be an incomplete OHLC bar
    if ( new_historical_data[ nrow( new_historical_data) , 7 ] == FALSE) {         % if so,
       new_historical_data = new_historical_data[ -nrow( new_historical_data) , ] % delete this last line
    }
    
    % append new_historical_data to the relevant raw data file
    write.table( new_historical_data , file = paste( instrument , "raw_OHLC_daily" , sep = "_" ) , row.names = FALSE , na = "" ,
                 col.names = FALSE , sep = "," , append = TRUE )
  
    added_data_length = nrow( new_historical_data )
    new_last_date = as.Date( new_historical_data[ added_data_length , 1 ] )
    
    % and amend Instrument_update file with lastest update information  
    all_current_historical_data_list[ ii , 2 ] = new_last_date
    all_current_historical_data_list[ ii , 3 ] = all_current_historical_data_list[ ii , 3 ] + added_data_length
  
    } % end of download if statement
  
  } % end of ( date_ix != todays_date ) if statement
  
} % end of for all_current_historical_data_list loop

% Write updated Instrument_update_file to file
write.table( all_current_historical_data_list , file = "instrument_daily_update_file" , row.names = FALSE , col.names = FALSE , na = "" )
has if statements as control structures to check that there is likely to be new daily data to actually download. It does this by checking a last_update date contained in an "instrument_daily_update_file" and comparing this with the current OS system time. If there is likely to be new data, the script runs and then updates this "instrument_daily_update_file." If not, the script exits with nothing having been done.

The intraday update script doe not have the checks the daily script has because I assume there will always be some new intraday data available for download. In this case, the last_update date is read from the "instrument_update_file" purely to act as an input to the above HisPricesDates function. As a result, this script involves some data manipulation to ensure that duplicate data is not printed to file. This script is shown next and is heavily commented to explain what is happening.
% cd to the hourly data directory
setwd("~/Documents/octave/oanda_data")

all_current_historical_data_list = read.table("instrument_hourly_update_file",header=FALSE,sep="",colClasses=c("character","Date","numeric") )

for( ii in 1 : nrow( all_current_historical_data_list ) ) {

   instrument = all_current_historical_data_list[ ii , 1 ]
   
   % read second column of dates in all_current_historical_data_list as a date index
   date_ix = as.Date( all_current_historical_data_list[ ii , 2 ] )
   
   todays_date = as.Date( Sys.time() )

   % download the missing historical data from date_ix to todays_date. If date_ix == todays_date, will download all
   % hourly bars for today only. 
   new_historical_data = HisPricesDates( Granularity = "H1", DayAlign, TimeAlign, AccountToken, instrument,
                          date_ix , todays_date + 1 )

   % the new_historical_data will almost certainly have incomplete hourly OHLC data in its last line,
   % so delete this incomplete OHLC information
   if ( new_historical_data[ nrow( new_historical_data ) , 7 ] == FALSE ) {
        new_historical_data = new_historical_data[ -nrow( new_historical_data ) , ]
   }
   
   % read the last line only of the current OHLC file for this instrument
   file = paste( instrument , "raw_OHLC_hourly" , sep = "_" ) % get the filename
   
   system_command = paste( "tail -1" , file , sep = " " )     % create a unix system command to read the last line of this file
   
   % read the file's last line
   old_historical_data = read.csv( textConnection( system( system_command , intern = TRUE ) ) , header = FALSE , sep = "," ,
                                    stringsAsFactors = FALSE )
   
   old_historical_data_end_date_time = old_historical_data[ 1 , 1 ]            % get the date value to be matched 
   
   new_historical_data_date_times = as.character( new_historical_data[ , 1 ] ) % vector to search for the above date value

   ix = charmatch( old_historical_data_end_date_time , new_historical_data_date_times ) % get the matching index value

   % delete that part of new_historical_data which is already contained in filename
   new_historical_data = new_historical_data[ -( 1 : ix ) , ]
   
   % append new_historical_data to the relevant raw data file
   write.table( new_historical_data , file = paste( instrument , "raw_OHLC_hourly" , sep = "_" ) , row.names = FALSE , na = "" ,
                 col.names = FALSE , sep = "," , append = TRUE )

   added_data_length = nrow( new_historical_data )                          % length of added new data
   new_last_date = as.Date( new_historical_data[ added_data_length , 1 ] )  % date of last update

   % and amend Instrument_update file with lastest update information
   all_current_historical_data_list[ ii , 2 ] = new_last_date
   all_current_historical_data_list[ ii , 3 ] = all_current_historical_data_list[ ii , 3 ] + added_data_length

} % end of for all_current_historical_data_list loop

% finally, write updated Instrument_update_file to file
write.table( all_current_historical_data_list , file = "instrument_hourly_update_file" , row.names = FALSE , col.names = FALSE , na = "" )
There is one important thing to point out on lines 29 to 33, which is that this section of code relies on a Unix based command, which in turn means that this almost certainly will not work on Windows based OSes. Windows users will have to find their own hack to load just the last line of the relevant file, or put up with loading the whole historical data file and indexing just the last line.

Thursday, 21 September 2017

Downloading Historical Data Using Oanda's API and R

It has been about 5 months since my last blog post and in this time I have been working away from home, been on summer holiday and spent some time mucking about on boats, so I have not been able to devote as much time to my blog as I would have liked. However, that has now changed, and this blog post is about obtaining historical data.

Many moons ago I used to download free, EOD data from tradingblox, but they stopped updating this in early 2013. I then started concentrating more on forex data because free sources of this data are more readily available. However, this still meant ( for me at least ) a laborious process of manually downloading .txt/.csv files, with the attendant problem of the data not being fully up to date and then resulting in me doing historical testing on data that I would not be able to trade in real time. With my present focus on machine learning derived trading algorithms this was becoming an untenable position.

My solution to this has been to personalize the ROandaAPI code that is freely available from this github, courtesy of IF.FranciscoME. I have stripped out some if statements, hard coded some variables particular to my own Oanda account, added some extra comments for my own enlightenment and broken the API code up into separate .R functions and .R scripts, which I use via RStudio.

The first such R script is called to initialise the variables and load the various functions into the R environment, shown below.
# Required Packages in order to use the R API functions
library("downloader")
library("RCurl")
library("jsonlite")
library("httr")

# -- ---------------------------------------------------------------------------------- #
# -- Specific Values of Parameters in API for my primary account ---------------------- #
# -- ---------------------------------------------------------------------------------- #

# -- numeric ------ # My Account ID 
AccountID = 123456     
# -- character ---- # My Oanda Token
AccountToken = "blah-de-blah-de-blah"

# load the various function files
source("~/path/to/InstrumentsList.R")
source("~/path/to/R_Oanda_API_functions/ActualPrice.R")
source("~/path/to/HisPricesCount.R")
source("~/path/to/HisPricesDates.R")

# get list of all tradeable instruments
trade_instruments = InstrumentsList( AccountToken , AccountID )
View( trade_instruments )

# load some default values
# -- character ---- # Granularity of the prices
# granularity: The time range represented by each candlestick. The value specified will determine the alignment 
# of the first candlestick.
# choose from S% S10 S15 S30 M1 M2 M3 M4 M5 M10 M15 M30 H1 H2 H3 H4 H6 H8 H12 D W M ( secs mins hours day week month )
Granularity = "D"

# -- numeric ------ # Hour of the "End of the Day"
# dailyAlignment: The hour of day used to align candles with hourly, daily, weekly, or monthly granularity. 
# The value specified is interpretted as an hour in the timezone set through the alignmentTimezone parameter and must be 
# an integer between 0 and 23.
DayAlign = 0 # original R code and Oanda default is 17 at "America%2FNew_York"

# -- character ---- # Time Zone in format "Continent/Zone
# alignmentTimezone: The timezone to be used for the dailyAlignment parameter. This parameter does NOT affect the 
# returned timestamp, the start or end parameters, these will always be in UTC. The timezone format used is defined by 
# the IANA Time Zone Database, a full list of the timezones supported by the REST API can be found at 
# http://developer.oanda.com/docs/timezones.txt
# "America%2FMexico_City" was the originallly provided, but could use, for example,  "Europe%2FLondon" or "Europe%2FWarsaw"
TimeAlign = "Europe%2FLondon"

################################# IMPORTANT NOTE #####################################################################
# By setting DayAlign = 0 and TimeAlign = "Europe%2FLondon" the end of bar is midnight in London. Doing this ensures
# that the bar OHLC in data downloads matches the bars seen in the Oanda FXTrade software, which for my account is 
# Europe Division, e.g. London time. The timestamps on downloads are, however, at GMT times, which means during summer
# daylight saving time the times shown on the Oanda software seem to be one hour ahead of GMT.
######################################################################################################################

Start = Sys.time() # Current system time
End = Sys.time() # Current system time
Count = 500 # Oanda default

# now cd to the working directory
setwd("~/path/to/oanda_data")
The code is liberally commented to describe reasons for my default choices. The InstrumentsList.R function called in the above script is shown next.
InstrumentsList = function( AccountToken , AccountID )
{
  httpaccount = "https://api-fxtrade.oanda.com"
  auth        = c(Authorization = paste("Bearer",AccountToken,sep=" "))
  Queryhttp   = paste(httpaccount,"/v1/instruments?accountId=",sep="")
  QueryInst   = paste(Queryhttp,AccountID,sep="")
  QueryInst1  = getURL(QueryInst,cainfo=system.file("CurlSSL","cacert.pem",package="RCurl"),httpheader=auth)
  InstJson    = fromJSON(QueryInst1, simplifyDataFrame = TRUE)
  FinalData   = data.frame(InstJson)
  colnames(FinalData) = c("Instrument","DisplayName","PipSize","MaxTradeUnits")
  FinalData$MaxTradeUnits = as.numeric(FinalData$MaxTradeUnits)
  return(FinalData)
}
This downloads a list of all the available trading instruments for the associated Oanda account. The following R script actually downloads the historical data for all the trading instruments listed in the above mentioned list and writes the data to separate files; one file per instrument. It also keeps track of the all instruments and the date of the last complete OHLC bar in the historical record and writes this to file also.
# cd to the working directory
setwd("~/path/to/oanda_data")

# dataframe to keep track of updates
Instrument_update_file = data.frame( Instrument = character() , Date = as.Date( character() ) , stringsAsFactors = FALSE )

for( ii in 1 : nrow( trade_instruments ) ) {
  
  instrument = trade_instruments[ ii , 1 ]
  
  # write details of instrument to Instrument_update_file
  Instrument_update_file[ ii , 1 ] = instrument
  
  historical_prices = HisPricesCount( Granularity = "D", DayAlign , TimeAlign , AccountToken ,instrument , Count = 5000 )
  last_row_ix = nrow( historical_prices )
  
  if ( historical_prices[ last_row_ix , 7 ] == FALSE ){ # last obtained OHLC bar values are incomplete
    # and do not want to save incomplete OHLC values, so add date of previous line of complete OHLC data
    # to Instrument_update_file
    Instrument_update_file[ ii , 2 ] = as.Date( historical_prices[ last_row_ix - 1 , 1 ] )
    
    # and delete the row with these incomplete values
    historical_prices = historical_prices[ 1 : last_row_ix - 1 , ]
    
  } # end of if statement
  
  # Write historical_prices to file
  write.table( historical_prices , file = paste( instrument , "raw_OHLC_daily" , sep = "_" ) , row.names = FALSE , na = "" , 
             col.names = FALSE , sep = "," )

  } # end of for loop

  # Write Instrument_update_file to file
  write.table( Instrument_update_file , file = "Instrument_update_file" , row.names = FALSE , na = "" , col.names = TRUE , sep = "," )
This script repeatedly calls the actual download function, HisPricesCount.R, which does all the heavy lifting in a loop, and the code for this download function is
HisPricesCount = function( Granularity, DayAlign, TimeAlign, AccountToken, Instrument, Count ){
  
  httpaccount      = "https://api-fxtrade.oanda.com"
  auth             = c(Authorization = paste("Bearer",AccountToken,sep=" "))
  QueryHistPrec    = paste(httpaccount,"/v1/candles?instrument=",sep="")
  QueryHistPrec1   = paste(QueryHistPrec,Instrument,sep="")
  qcount           = paste("count=",Count,sep="")
  qcandleFormat    = "candleFormat=midpoint"
  qgranularity     = paste("granularity=",Granularity,sep="")
  qdailyalignment  = paste("dailyAlignment=",DayAlign,sep="")
  QueryHistPrec2   = paste(QueryHistPrec1,qcandleFormat,qgranularity,qdailyalignment,qcount,sep="&")
  InstHistP        = getURL(QueryHistPrec2,cainfo=system.file("CurlSSL","cacert.pem",package="RCurl"),httpheader=auth)
  InstHistPjson    = fromJSON(InstHistP, simplifyDataFrame = TRUE)
  Prices           = data.frame(InstHistPjson[[3]])
  Prices$time      = paste(substr(Prices$time,1,10),substr(Prices$time,12,19), sep=" ")
  colnames(Prices) = c("TimeStamp","Open","High","Low","Close","TickVolume","Complete")
  Prices$TimeStamp = as.POSIXct(strptime(Prices$TimeStamp, "%Y-%m-%d %H:%M:%OS"),origin="1970-01-01",tz = "UTC")
  attributes(Prices$TimeStamp)$tzone = TimeAlign
  return(Prices)
  
}
One of the input variables for this function is Count ( default = 5000 ), which means that the function downloads the last 5000 OHLC bar records up to and including the most recent, which may still be forming and hence is incomplete. The calling script ensures that any incomplete bar is stripped from the record so that only complete bars are printed to file.

All in all this is a vast improvement over my previous data collection regime, and kudos to IF.FranciscoME for making the base code available on his github.

Thursday, 20 April 2017

Using the BayesOpt Library to Optimise my Planned Neural Net

Following on from my last post, I have recently been using the BayesOpt library to optimise my planned neural net, and this post is a brief outline, with code, of what I have been doing.

My intent was to design a Nonlinear autoregressive exogenous model using my currency strength indicator as the main exogenous input, along with other features derived from the use of Savitzky-Golay filter convolution to model velocity, acceleration etc. I decided that rather than model prices directly, I would model the 20 period simple moving average because it would seem reasonable to assume that modelling a smooth function would be easier, and from this average it is a trivial matter to reverse engineer to get to the underlying price.

Given that my projected feature space/lookback length/number of nodes combination is/was a triple digit, discrete dimensional problem, I used the "bayesoptdisc" function from the BayesOpt library to perform a discrete Bayesian optimisation over these parameters, the main Octave script for this being shown below.
clear all ;

% load the data
% load eurusd_daily_bin_bars ;
% load gbpusd_daily_bin_bars ;
% load usdchf_daily_bin_bars ;
load usdjpy_daily_bin_bars ;
load all_rel_strengths_non_smooth ;
% all_rel_strengths_non_smooth = [ usd_rel_strength_non_smooth eur_rel_strength_non_smooth gbp_rel_strength_non_smooth chf_rel_strength_non_smooth ...
% jpy_rel_strength_non_smooth aud_rel_strength_non_smooth cad_rel_strength_non_smooth ] ;

% extract relevant data
% price = ( eurusd_daily_bars( : , 3 ) .+ eurusd_daily_bars( : , 4 ) ) ./ 2 ; % midprice 
% price = ( gbpusd_daily_bars( : , 3 ) .+ gbpusd_daily_bars( : , 4 ) ) ./ 2 ; % midprice
% price = ( usdchf_daily_bars( : , 3 ) .+ usdchf_daily_bars( : , 4 ) ) ./ 2 ; % midprice  
price = ( usdjpy_daily_bars( : , 3 ) .+ usdjpy_daily_bars( : , 4 ) ) ./ 2 ; % midprice
base_strength = all_rel_strengths_non_smooth( : , 1 ) .- 0.5 ;
term_strength = all_rel_strengths_non_smooth( : , 5 ) .- 0.5 ; 

% clear unwanted data
% clear eurusd_daily_bars all_rel_strengths_non_smooth ;
% clear gbpusd_daily_bars all_rel_strengths_non_smooth ;
% clear usdchf_daily_bars all_rel_strengths_non_smooth ;
clear usdjpy_daily_bars all_rel_strengths_non_smooth ;

global start_opt_line_no = 200 ;
global stop_opt_line_no = 7545 ;

% get matrix coeffs
slope_coeffs = generalised_sgolay_filter_coeffs( 5 , 2 , 1 ) ;
accel_coeffs = generalised_sgolay_filter_coeffs( 5 , 2 , 2 ) ;
jerk_coeffs = generalised_sgolay_filter_coeffs( 5 , 3 , 3 ) ;

% create features
sma20 = sma( price , 20 ) ;
global targets = sma20 ;
[ sma_max , sma_min ] = adjustable_lookback_max_min( sma20 , 20 ) ;
global sma20r = zeros( size(sma20,1) , 5 ) ;
global sma20slope = zeros( size(sma20,1) , 5 ) ;
global sma20accel = zeros( size(sma20,1) , 5 ) ;
global sma20jerk = zeros( size(sma20,1) , 5 ) ;

global sma20diffs = zeros( size(sma20,1) , 5 ) ;
global sma20diffslope = zeros( size(sma20,1) , 5 ) ;
global sma20diffaccel = zeros( size(sma20,1) , 5 ) ;
global sma20diffjerk = zeros( size(sma20,1) , 5 ) ;

global base_strength_f = zeros( size(sma20,1) , 5 ) ;
global term_strength_f = zeros( size(sma20,1) , 5 ) ;

base_term_osc = base_strength .- term_strength ;
global base_term_osc_f = zeros( size(sma20,1) , 5 ) ;
slope_bt_osc = rolling_endpoint_gen_poly_output( base_term_osc , 5 , 2 , 1 ) ; % no_of_points(p),filter_order(n),derivative(s)
global slope_bt_osc_f = zeros( size(sma20,1) , 5 ) ;
accel_bt_osc = rolling_endpoint_gen_poly_output( base_term_osc , 5 , 2 , 2 ) ; % no_of_points(p),filter_order(n),derivative(s)
global accel_bt_osc_f = zeros( size(sma20,1) , 5 ) ;
jerk_bt_osc = rolling_endpoint_gen_poly_output( base_term_osc , 5 , 3 , 3 ) ; % no_of_points(p),filter_order(n),derivative(s)
global jerk_bt_osc_f = zeros( size(sma20,1) , 5 ) ;

slope_base_strength = rolling_endpoint_gen_poly_output( base_strength , 5 , 2 , 1 ) ; % no_of_points(p),filter_order(n),derivative(s)
global slope_base_strength_f = zeros( size(sma20,1) , 5 ) ;
accel_base_strength = rolling_endpoint_gen_poly_output( base_strength , 5 , 2 , 2 ) ; % no_of_points(p),filter_order(n),derivative(s)
global accel_base_strength_f = zeros( size(sma20,1) , 5 ) ;
jerk_base_strength = rolling_endpoint_gen_poly_output( base_strength , 5 , 3 , 3 ) ; % no_of_points(p),filter_order(n),derivative(s)
global jerk_base_strength_f = zeros( size(sma20,1) , 5 ) ;

slope_term_strength = rolling_endpoint_gen_poly_output( term_strength , 5 , 2 , 1 ) ; % no_of_points(p),filter_order(n),derivative(s)
global slope_term_strength_f = zeros( size(sma20,1) , 5 ) ;
accel_term_strength = rolling_endpoint_gen_poly_output( term_strength , 5 , 2 , 2 ) ; % no_of_points(p),filter_order(n),derivative(s)
global accel_term_strength_f = zeros( size(sma20,1) , 5 ) ;
jerk_term_strength = rolling_endpoint_gen_poly_output( term_strength , 5 , 3 , 3 ) ; % no_of_points(p),filter_order(n),derivative(s)
global jerk_term_strength_f = zeros( size(sma20,1) , 5 ) ;

min_max_range = sma_max .- sma_min ;

for ii = 51 : size( sma20 , 1 ) - 1 % one step ahead is target
  
  targets(ii) = 2 * ( ( sma20(ii+1) - sma_min(ii) ) / min_max_range(ii) - 0.5 ) ;
  
  % scaled sma20
  sma20r(ii,:) = 2 .* ( ( flipud( sma20(ii-4:ii,1) )' .- sma_min(ii) ) ./ min_max_range(ii) .- 0.5 ) ;
  sma20slope(ii,:) = fliplr( ( 2 .* ( ( sma20(ii-4:ii,1)' .- sma_min(ii) ) ./ min_max_range(ii) .- 0.5 ) ) * slope_coeffs ) ;
  sma20accel(ii,:) = fliplr( ( 2 .* ( ( sma20(ii-4:ii,1)' .- sma_min(ii) ) ./ min_max_range(ii) .- 0.5 ) ) * accel_coeffs ) ;
  sma20jerk(ii,:) = fliplr( ( 2 .* ( ( sma20(ii-4:ii,1)' .- sma_min(ii) ) ./ min_max_range(ii) .- 0.5 ) ) * jerk_coeffs ) ;
  
  % scaled diffs of sma20
  sma20diffs(ii,:) = fliplr( diff( 2.* ( ( sma20(ii-5:ii,1) .- sma_min(ii) ) ./ min_max_range(ii) .- 0.5 ) )' ) ;
  sma20diffslope(ii,:) = fliplr( diff( 2.* ( ( sma20(ii-5:ii,1) .- sma_min(ii) ) ./ min_max_range(ii) .- 0.5 ) )' * slope_coeffs ) ;
  sma20diffaccel(ii,:) = fliplr( diff( 2.* ( ( sma20(ii-5:ii,1) .- sma_min(ii) ) ./ min_max_range(ii) .- 0.5 ) )' * accel_coeffs ) ;
  sma20diffjerk(ii,:) = fliplr( diff( 2.* ( ( sma20(ii-5:ii,1) .- sma_min(ii) ) ./ min_max_range(ii) .- 0.5 ) )' * jerk_coeffs ) ;
  
  % base strength
  base_strength_f(ii,:) = fliplr( base_strength(ii-4:ii)' ) ;
  slope_base_strength_f(ii,:) = fliplr( slope_base_strength(ii-4:ii)' ) ;
  accel_base_strength_f(ii,:) = fliplr( accel_base_strength(ii-4:ii)' ) ;
  jerk_base_strength_f(ii,:) = fliplr( jerk_base_strength(ii-4:ii)' ) ;
  
  % term strength
  term_strength_f(ii,:) = fliplr( term_strength(ii-4:ii)' ) ;
  slope_term_strength_f(ii,:) = fliplr( slope_term_strength(ii-4:ii)' ) ;
  accel_term_strength_f(ii,:) = fliplr( accel_term_strength(ii-4:ii)' ) ;
  jerk_term_strength_f(ii,:) = fliplr( jerk_term_strength(ii-4:ii)' ) ;
  
  % base term oscillator
  base_term_osc_f(ii,:) = fliplr( base_term_osc(ii-4:ii)' ) ;
  slope_bt_osc_f(ii,:) = fliplr( slope_bt_osc(ii-4:ii)' ) ;
  accel_bt_osc_f(ii,:) = fliplr( accel_bt_osc(ii-4:ii)' ) ;
  jerk_bt_osc_f(ii,:) = fliplr( jerk_bt_osc(ii-4:ii)' ) ;
  
endfor

% create xset for bayes routine
% raw indicator
xset = zeros( 4 , 5 ) ; xset( 1 , : ) = 1 : 5 ;
% add the slopes
to_add = zeros( 4 , 15 ) ; 
to_add( 1 , : ) = [ 1 2 2 3 3 3 4 4 4 4 5 5 5 5 5 ] ; 
to_add( 2 , : ) = [ 1 1 2 1 2 3 1 2 3 4 1 2 3 4 5 ] ; 
xset = [ xset to_add ] ; 
% add accels
to_add = zeros( 4 , 21 ) ; 
to_add( 1 , : ) = [ 1 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 ] ; 
to_add( 2 , : ) = [ 1 1 2 2 1 2 2 3 3 3 1 2 3 4 2 3 3 4 4 4 4 ] ; 
to_add( 3 , : ) = [ 1 1 1 2 1 1 2 1 2 3 1 1 1 1 2 2 3 1 2 3 4 ] ;
xset = [ xset to_add ] ; 
% add jerks
to_add = zeros( 4 , 70 ) ; 
to_add( 1 , : ) = [ 1 2 2 2 2 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 ] ; 
to_add( 2 , : ) = [ 1 1 2 2 2 1 2 2 2 3 3 3 3 3 3 1 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 1 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 ] ; 
to_add( 3 , : ) = [ 1 1 1 2 2 1 1 2 2 1 2 2 3 3 3 1 1 2 2 1 2 2 3 3 3 1 2 2 3 3 3 4 4 4 4 1 1 2 2 1 2 2 3 3 3 1 2 2 3 3 3 4 4 4 4 1 2 2 3 3 3 4 4 4 4 5 5 5 5 5 ] ;
to_add( 4 , : ) = [ 1 1 1 1 2 1 1 1 2 1 1 2 1 2 3 1 1 1 2 1 1 2 1 2 3 1 1 2 1 2 3 1 2 3 4 1 1 1 2 1 1 2 1 2 3 1 1 2 1 2 3 1 2 3 4 1 1 2 1 2 3 1 2 3 4 1 2 3 4 5 ] ;
xset = [ xset to_add ] ;
% construct all_xset for combinations of indicators and look back lengths
all_zeros = zeros( size( xset ) ) ;
all_xset = [ xset ; repmat( all_zeros , 3 , 1 ) ] ;
all_xset = [ all_xset [ xset ; xset ; all_zeros ; all_zeros ] ] ;
all_xset = [ all_xset [ xset ; all_zeros ; xset ; all_zeros ] ] ;
all_xset = [ all_xset [ xset ; all_zeros ; all_zeros ; xset ] ] ;
all_xset = [ all_xset [ xset ; xset ; xset ; all_zeros ] ] ;
all_xset = [ all_xset [ xset ; xset ; all_zeros ; xset ] ] ;
all_xset = [ all_xset [ xset ; all_zeros ; xset ; xset ] ] ;
all_xset = [ all_xset repmat( xset , 4 , 1 ) ] ;

ones_all_xset = ones( 1 , size( all_xset , 2 ) ) ;

% now add layer for number of neurons and extend as necessary
max_number_of_neurons_in_layer = 20 ;

parameter_matrix = [] ;

for ii = 2 : max_number_of_neurons_in_layer % min no. of neurons is 2, max = max_number_of_neurons_in_layer  
  parameter_matrix = [ parameter_matrix [ ii .* ones_all_xset ; all_xset ] ] ;
endfor

% now the actual bayes optimisation routine
% set the parameters
params.n_iterations = 190; % bayesopt library default is 190
params.n_init_samples = 10;
params.crit_name = 'cEIa'; % cEI is default. cEIa is an annealed version
params.surr_name = 'sStudentTProcessNIG';
params.noise = 1e-6;
params.kernel_name = 'kMaternARD5';
params.kernel_hp_mean = [1];
params.kernel_hp_std = [10];
params.verbose_level = 1; % 3 to path below
params.log_filename = '/home/dekalog/Documents/octave/cplusplus.oct_functions/nn_functions/optimise_narx_ind_lookback_nodes_log';
params.l_type = 'L_MCMC' ; % L_EMPIRICAL is default
params.epsilon = 0.5 ; % probability of performing a random (blind) evaluation of the target function. 
% Higher values implies forced exploration while lower values relies more on the exploration/exploitation policy of the criterion. 0 is default

% the function to optimise
fun = 'optimise_narx_ind_lookback_nodes_rolling' ;

% the call to the Bayesopt library function
bayesoptdisc( fun , parameter_matrix , params ) ; 
% result is the minimum as a vector (x_out) and the value of the function at the minimum (y_out)
What this script basically does is:
  1. load all the relevant data ( in this case a forex pair )
  2. creates a set of scaled features
  3. creates a necessary parameter matrix for the discrete optimisation function
  4. sets the parameters for the optimisation routine
  5. and finally calls the "bayesoptdisc" function
Note that in step 2 all the features are declared as global variables, this being necessary because the "bayesoptdisc" function of the BayesOpt library does not appear to admit passing these variables as inputs to the function.

The actual function to be optimised is given in the following code box, and is basically a looped neural net training routine.
## Copyright (C) 2017 dekalog
## 
## This program is free software; you can redistribute it and/or modify it
## under the terms of the GNU General Public License as published by
## the Free Software Foundation; either version 3 of the License, or
## (at your option) any later version.
## 
## This program is distributed in the hope that it will be useful,
## but WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
## GNU General Public License for more details.
## 
## You should have received a copy of the GNU General Public License
## along with this program.  If not, see .

## -*- texinfo -*- 
## @deftypefn {} {@var{retval} =} optimise_narx_ind_lookback_nodes_rolling (@var{input1})
##
## @seealso{}
## @end deftypefn

## Author: dekalog 
## Created: 2017-03-21

function [ retval ] = optimise_narx_ind_lookback_nodes_rolling( input1 )
 
% declare all the global variables so the function can "see" them
global start_opt_line_no ;
global stop_opt_line_no ; 
global targets ;
global sma20r ; % 2
global sma20slope ;
global sma20accel ;
global sma20jerk ;
global sma20diffs ;
global sma20diffslope ;
global sma20diffaccel ;
global sma20diffjerk ;
global base_strength_f ;
global slope_base_strength_f ;
global accel_base_strength_f ;
global jerk_base_strength_f ;
global term_strength_f ;
global slope_term_strength_f ;
global accel_term_strength_f ;
global jerk_term_strength_f ;
global base_term_osc_f ;
global slope_bt_osc_f ;
global accel_bt_osc_f ;
global jerk_bt_osc_f ;

% build feature matrix from the above global variable according to parameters in input1
hidden_layer_size = input1(1) ;

% training targets
Y = targets( start_opt_line_no:stop_opt_line_no , 1 ) ;

% create empty feature matrix
X = [] ;

% which will always have at least one element of the main price series for the NARX
X = [ X sma20r( start_opt_line_no:stop_opt_line_no , 1:input1(2) ) ] ;

% go through input1 values in turn and add to X if necessary
if input1(3) > 0
X = [ X sma20slope( start_opt_line_no:stop_opt_line_no , 1:input1(3) ) ] ; 
endif

if input1(4) > 0
X = [ X sma20accel( start_opt_line_no:stop_opt_line_no , 1:input1(4) ) ] ; 
endif

if input1(5) > 0
X = [ X sma20jerk( start_opt_line_no:stop_opt_line_no , 1:input1(5) ) ] ; 
endif

if input1(6) > 0
X = [ X sma20diffs( start_opt_line_no:stop_opt_line_no , 1:input1(6) ) ] ; 
endif

if input1(7) > 0
X = [ X sma20diffslope( start_opt_line_no:stop_opt_line_no , 1:input1(7) ) ] ; 
endif

if input1(8) > 0
X = [ X sma20diffaccel( start_opt_line_no:stop_opt_line_no , 1:input1(8) ) ] ; 
endif

if input1(9) > 0
X = [ X sma20diffjerk( start_opt_line_no:stop_opt_line_no , 1:input1(9) ) ] ; 
endif

if input1(10) > 0 % input for base and term strengths together
X = [ X base_strength_f( start_opt_line_no:stop_opt_line_no , 1:input1(10) ) ] ; 
X = [ X term_strength_f( start_opt_line_no:stop_opt_line_no , 1:input1(10) ) ] ; 
endif

if input1(11) > 0
X = [ X slope_base_strength_f( start_opt_line_no:stop_opt_line_no , 1:input1(11) ) ] ; 
X = [ X slope_term_strength_f( start_opt_line_no:stop_opt_line_no , 1:input1(11) ) ] ; 
endif

if input1(12) > 0
X = [ X accel_base_strength_f( start_opt_line_no:stop_opt_line_no , 1:input1(12) ) ] ; 
X = [ X accel_term_strength_f( start_opt_line_no:stop_opt_line_no , 1:input1(12) ) ] ; 
endif

if input1(13) > 0
X = [ X jerk_base_strength_f( start_opt_line_no:stop_opt_line_no , 1:input1(13) ) ] ; 
X = [ X jerk_term_strength_f( start_opt_line_no:stop_opt_line_no , 1:input1(13) ) ] ; 
endif

if input1(14) > 0
X = [ X base_term_osc_f( start_opt_line_no:stop_opt_line_no , 1:input1(14) ) ] ; 
endif

if input1(15) > 0
X = [ X slope_bt_osc_f( start_opt_line_no:stop_opt_line_no , 1:input1(15) ) ] ; 
endif

if input1(16) > 0
X = [ X accel_bt_osc_f( start_opt_line_no:stop_opt_line_no , 1:input1(16) ) ] ; 
endif

if input1(17) > 0
X = [ X jerk_bt_osc_f( start_opt_line_no:stop_opt_line_no , 1:input1(17) ) ] ; 
endif

% now the X features matrix has been formed, get its size
X_rows = size( X , 1 ) ; X_cols = size( X , 2 ) ;

X = [ ones( X_rows , 1 ) X ] ; % add bias unit to X

fan_in = X_cols + 1 ; % no. of inputs to a node/unit, including bias
fan_out = 1 ; % no. of outputs from node/unit
r = sqrt( 6 / ( fan_in + fan_out ) ) ; 

rolling_window_length = 100 ;
n_iters = 100 ;
n_iter_errors = zeros( n_iters , 1 ) ;
all_errors = zeros( X_rows - ( rolling_window_length - 1 ) - 1 , 1 ) ;
rolling_window_loop_iter = 0 ;

for rolling_window_loop = rolling_window_length : X_rows - 1 
 
  rolling_window_loop_iter = rolling_window_loop_iter + 1 ;
  
    % train n_iters no. of nets and put the error stats in n_iter_errors
    for ii = 1 : n_iters
      
      % initialise weights
      % see https://stats.stackexchange.com/questions/47590/what-are-good-initial-weights-in-a-neural-network
      
      % One option is Orthogonal random matrix initialization for input_to_hidden weights
      % w_i = rand( X_cols + 1 , hidden_layer_size ) ;
      % [ u , s , v ] = svd( w_i ) ;
      % input_to_hidden = [ ones( X_rows , 1 ) X ] * u ; % adding bias unit to X
      
      % using fan_in and fan_out for tanh
      w_i = ( rand( X_cols + 1 , hidden_layer_size ) .* ( 2 * r ) ) .- r ;
      input_to_hidden = X( rolling_window_loop - ( rolling_window_length - 1 ) : rolling_window_loop , : ) * w_i ; 
      
      % push the input_to_hidden through the chosen sigmoid function
      hidden_layer_output = sigmoid_lecun_m( input_to_hidden ) ;
      
      % add bias unit for the output from hidden
      hidden_layer_output = [ ones( rolling_window_length , 1 ) hidden_layer_output ] ;
      
      % use hidden_layer_output as the input to a linear regression fit to targets Y
      % a la Extreme Learning Machine
      % w = ( inv( X' * X ) * X' ) * y ; the "classic" way for linear regression, where
      % X = hidden_layer_output, but
      w = ( ( hidden_layer_output' * hidden_layer_output ) \ hidden_layer_output' ) * Y( rolling_window_loop - ( rolling_window_length - 1 ) : rolling_window_loop , 1 ) ;
      % is quicker and recommended
      
      % use these current values of w_i and w for out of sample test
      os_input_to_hidden = X( rolling_window_loop + 1 , : ) * w_i ;
      os_hidden_layer_output = sigmoid_lecun_m( os_input_to_hidden ) ;
      os_hidden_layer_output = [ 1 os_hidden_layer_output ] ; % add bias
      os_output = os_hidden_layer_output * w ;
      n_iter_errors( n_iters ) = abs( Y( rolling_window_loop + 1 , 1 ) - os_output ) ;
      
    endfor
    
    all_errors( rolling_window_loop_iter ) = mean( n_iter_errors ) ;
    
endfor % rolling_window_loop

retval = mean( all_errors ) ;

clear X w_i ; 

endfunction
However, to speed things up for some rapid prototyping, rather than use backpropagation training this function uses the principles of an extreme learning machine and loops over 100 such trained ELMs per set of features contained in a rolling window of length 100 across the entire training data set. Walk forward cross validation is performed for each of the 100 ELMs, an average of the out of sample error obtained, and these averages across the whole data set are then averaged to provide the function return. The code was run on daily bars of the four major forex pairs; EURUSD, GBPUSD, USDCHF and USDYPY.

The results of running the above are quite interesting. The first surprise is that the currency strength indicator and features derived from it were not included in the optimal model for any of the four tested pairs. Secondly, for all pairs, a scaled version of a 20 bar price momentum function, and derived features, was included in the optimal model. Finally, again for all pairs, there was a symmetrically decreasing lookback period across the selected features, and when averaged across all pairs the following pattern results: 10 3 3 2 1 3 3 2 1, which is to be read as:
  • 10 nodes (plus a bias node) in the hidden layer
  • lookback length of 3 for the scaled values of the SMA20 and the 20 bar scaled momentum function
  • lookback length of 3 for the slopes/rates of change of the above
  • lookback length of 2 for the "accelerations" of the above
  • lookback length of 1 for the "jerks" of the above 
So it would seem that the 20 bar momentum function is a better exogenous input than the currency strength indicator. The symmetry across features is quite pleasing, and the selection of these "physical motion" features across all the tested pairs tends to confirm their validity. The fact that the currency strength indicator was not selected does not mean that this indicator is of no value, but perhaps it should not be used for regression purposes, but rather as a filter. More in due course.

Tuesday, 7 February 2017

Update on Currency Strength Smoothing, and a new direction?

Since my last two posts ( currency strength indicator and preliminary tests thereof ) I have been experimenting with different ways of smoothing the indicators without introducing lag, mostly along the lines of using an oscillator leading signal plus various schemes to smooth and compensate for introduced attenuation and making heavy use of my particle swarm optimisation code. Unfortunately I haven't found anything that really works to my satisfaction and so I have decided to forgo any further attempts at this and just use the indicator in its unsmoothed form as neural net input.

In the process of doing the above work I decided that my particle swarm routine wasn't fast enough and I started using the BayesOpt optimisation library, which is written in C++ and has an interface to Octave. Doing this has greatly decreased the time I've had to spend in my various optimisation routines and the framework provided by the BayesOpt library will enable more ambitious optimisations in the future.

Another discovery for me was this Predicting Stock Market Prices with Physical Laws paper, which has some really useful ideas for neural net input features. In particular I think the idea of combining position, velocity and acceleration with the ideas contained in an earlier post of mine on Savitzky Golay filter convolution and using the currency strength indicators as proxies for the arbitrary sine and cosine waves function posited in the paper hold some promise. More in due course.