Monday, 11 December 2017

Time Warp Edit Distance

Part of my normal routine is to indulge in online research for use useful ideas, and I recently came across An Empirical Evaluation of Similarity Measures for Time Series Classification, and one standout from this paper is the Time Warp Edit Distance where, from the conclusion, "...the TWED measure originally proposed by Marteau (2009) seems to consistently outperform all the considered distances..."

Below is my Octave .oct function version of the above linked MATLAB code.
#include octave oct.h
#include octave dmatrix.h
#include limits> // for infinity
#include math.h  // for sqrt

DEFUN_DLD ( twed, args, nargout,
"-*- texinfo -*-\n\
@deftypefn {Function File} {} twed (@var{A , timeSA , B , timeSB , lambda, nu})\n\
Calculates the Time Warp Edit Distance between two univariate time series, A and B.\n\
timeSA and timeSB are the time stamps of the respective series, lambda is a penalty\n\
for a deletion operation and nu is an Elasticity parameter - nu >=0 needed for distance measure.\n\
@end deftypefn" )

{
octave_value_list retval_list ;
int nargin = args.length () ;

// check the input arguments
if ( nargin != 6 )
   {
   error ("Invalid number of arguments. See help twed.") ;
   return retval_list ;
   }

if ( args(0).length () < 2 )
   {
   error ("Invalid 1st argument length. Must be >= 2.") ;
   return retval_list ;
   }
   
if ( args(1).length () != args(0).length () )
   {
   error ("Arguments 1 and 2 must be vectors of the same length.") ;
   return retval_list ;
   }
   
if ( args(2).length () < 2 )
   {
   error ("Invalid 3rd argument length. Must be >= 2.") ;
   return retval_list ;
   }
   
if ( args(3).length () != args(2).length () )
   {
   error ("Arguments 3 and 4 must be vectors of the same length.") ;
   return retval_list ;
   }   
   
if ( args(4).length () > 1 )
   {
   error ("Argument 5 must a single value for lambda.") ;
   return retval_list ;
   }  
  
if ( args(5).length () > 1 )
   {
   error ("Argument 6 must a single value for nu >= 0.") ;
   return retval_list ;
   }   

if ( error_state )
   {
   error ("Invalid arguments. See help twed.") ;
   return retval_list ;
   }
// end of input checking  
  
Matrix A_input = args(0).matrix_value () ;
   if( A_input.rows() == 1 && A_input.cols() >= 2 ) // is a row matrix, so transpose 
   {
    A_input = A_input.transpose () ; 
   }
   
Matrix timeSA_input = args(1).matrix_value () ;
   if( timeSA_input.rows() == 1 && timeSA_input.cols() >= 2 ) // is a row matrix, so transpose 
   {
    timeSA_input = timeSA_input.transpose () ; 
   }
   
Matrix B_input = args(2).matrix_value () ;
   if( B_input.rows() == 1 && B_input.cols() >= 2 ) // is a row matrix, so transpose 
   {
    B_input = B_input.transpose () ; 
   }
   
Matrix timeSB_input = args(3).matrix_value () ;
   if( timeSB_input.rows() == 1 && timeSB_input.cols() >= 2 ) // is a row matrix, so transpose 
   {
    timeSB_input = timeSB_input.transpose () ; 
   }
   
double lambda = args(4).double_value () ;
double nu = args(5).double_value () ;
double inf = std::numeric_limits::infinity() ;
Matrix distance ( 1 , 1 ) ; distance.fill ( 0.0 ) ;
double cost ;
  
// Add padding of zero by using zero-filled distance matrix
Matrix A = distance.stack( A_input ) ;
Matrix timeSA = distance.stack( timeSA_input ) ;
Matrix B = distance.stack( B_input ) ;
Matrix timeSB = distance.stack( timeSB_input ) ;

Matrix DP ( A.rows() , B.rows() ) ; DP.fill ( inf ) ; DP( 0 , 0 ) = 0.0 ; 
int n = timeSA.rows () ;
int m = timeSB.rows () ;

    // Compute minimal cost
    for ( octave_idx_type ii (1) ; ii < n ; ii++ )
    {
      
        for ( octave_idx_type jj (1) ; jj < m ; jj++ )
        {
          
        // Deletion in A
        DP( ii , jj ) = DP(ii-1,jj) +  sqrt( ( A(ii-1,0) - A(ii,0) ) * ( A(ii-1,0) - A(ii,0) ) ) + nu * ( timeSA(ii,0) - timeSA(ii-1,0) ) + lambda ;
    
        // Deletion in B
        cost = DP(ii,jj-1) + sqrt( ( B(jj-1,0) - B(jj,0) ) * ( B(jj-1,0) - B(jj,0) ) ) + nu * ( timeSB(jj,0) - timeSB(jj-1,0) ) + lambda ;
        DP( ii , jj ) = cost < DP( ii , jj ) ? cost : DP( ii , jj ) ;
    
        // Keep data points in both time series
        cost = DP(ii-1,jj-1) + sqrt( ( A(ii,0) - B(jj,0) ) * ( A(ii,0) - B(jj,0) ) ) + sqrt( ( A(ii-1,0) - B(jj-1,0) ) * ( A(ii-1,0) - B(jj-1,0) ) ) + nu * ( abs( timeSA(ii,0) - timeSB(jj,0) ) + abs( timeSA(ii-1,0) - timeSB(jj-1,0) ) ) ;
        DP( ii , jj ) = cost < DP( ii , jj ) ? cost : DP( ii , jj ) ;

        } // end of jj loop
        
      } // end of ii loop

distance( 0 , 0 ) = DP( n - 1 , m - 1 ) ;
      
retval_list(1) = DP ; 
retval_list(0) = distance ;

return retval_list ;

} // end of function
As a quick test I took the example problem from this Cross Validated thread, the applicability I hope being quite obvious to readers:

A = [1, 2, 3, 4, 5, 6, 7, 8, 9] ;
B1 = [1, 2, 3, 4, 5, 6, 7, 8, 12] ;
distance1 = twed( A , 1:9 , B1 , 1:9 , 1 , 0.001 )
distance1 =  3
B2 = [0, 3, 2, 5, 4, 7, 6, 9, 8] ;
distance2 = twed( A , 1:9 , B2 , 1:9 , 1 , 0.001 )
distance2 =  17
graphics_toolkit('fltk') ; plot(A,'k','linewidth',2,B1,'b','linewidth',2,B2,'r','linewidth',2);
legend( "A" , "B1" , "B2" ) ;
It can be seen that the twed algorithm correctly picks out B1 as being more like A than B2 (a lower twed distance, with default values for lambda and nu of 1 and 0.001 respectively, taken from the above survey paper) when compared with the simple squared error metric, which gives identical results for both B1 and B2.

More on this in due course.

Friday, 8 December 2017

candle.m Function Accepted

I have received an e-mail from the maintainer of the Octave Financial Package that my candlestick function will be rolled out with the next release of the financial package. For those readers who can't wait the final code, with revisions, is now available at the Octave-Forge here.

This represents a new milestone for me as this is my first, officially accepted contribution to any free and open source software (FOSS) project.

Tuesday, 5 December 2017

Candlestick Plotting Function Submitted for Inclusion in Octave Financial Package

I have today submitted an improved version of my basic candlestick plotting function, candle.m ( see previous post ) for inclusion in the Octave Financial package. As I am not sure when, or even if, it will be accepted, I provide a copy of it below.
## Copyright (C) 2017 dekalog
## 
## This program is free software; you can redistribute it and/or modify it
## under the terms of the GNU General Public License as published by
## the Free Software Foundation; either version 3 of the License, or
## (at your option) any later version.
## 
## This program is distributed in the hope that it will be useful,
## but WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
## GNU General Public License for more details.
## 
## You should have received a copy of the GNU General Public License
## along with this program.  If not, see .

## -*- texinfo -*- 
## @deftypefn {Function File} {@var{retval} =} candle (@var{highprices}, @var{lowprices}, @var{closeprices}, @var{openprices})
## @deftypefnx {Function File} {@var{retval} =} candle (@var{highprices}, @var{lowprices}, @var{closeprices}, @var{openprices}, @var{color})
## @deftypefnx {Function File} {@var{retval} =} candle (@var{highprices}, @var{lowprices}, @var{closeprices}, @var{openprices}, @var{color}, @var{dates})
## @deftypefnx {Function File} {@var{retval} =} candle (@var{highprices}, @var{lowprices}, @var{closeprices}, @var{openprices}, @var{color}, @var{dates}, @var{dateform})
##
## Plot the @var{highprices}, @var{lowprices}, @var{closeprices} and @var{openprices} of a security as a candlestick chart.
##
## HighPrices - High prices for a security. A column vector.
##
## LowPrices - Low prices for a security. A column vector.
##
## ClosePrices - Close prices for a security. A column vector.
##
## OpenPrices - Open prices for a security. A column vector.
##
## Color - (optional) Candlestick color is specified as a case insensitive four 
## character row vector, e.g. "brwk". The characters that are accepted are 
## k, b, c, r, m, w, g and y for black, blue, cyan, red, magenta, white, green 
## and yellow respectively. Default colors are "brwk" applied in order to bars 
## where the closing price is greater than the opening price, bars where the 
## closing price is less than the opening price, the chart background color and
## the candlestick wicks. If fewer than four colors are specified, they are
## applied in turn in the above order with default colors for unspecified colors. 
## For example, user supplied colors "gm" will plot green upbars and magenta
## downbars with a default white background and black wicks. If the user 
## specified color for background is black, without specifying the wick color, 
## e.g. "gmk", the default wick color is white. All other choices for background 
## color will default to black for wicks. If all four colors are user specified,
## those colors will be used. Doji bars and single price bars, e.g. open = high 
## = low = close, are plotted with the color for wicks, with single price bars 
## being plotted as points/dots.
##
## Dates - (Optional) Dates for user specified x-axis tick labels. Dates can be 
## a serial date number column (see datenum), a datevec matrix (See datevec)
## or a character vector of dates. If specified as either a datenum or a datevec,
## the Dateform argument is required. If the Dates argument is supplied, the 
## Color argument must also be explicitly specified.
##
## Dateform - (Optional) Either a Date character string or a single integer code
## number used to format the x-axis tick labels (See datestr). Only required if 
## Dates is specified as a serial date number column (See datenum) or a datevec 
## matrix (See datevec). 
##
## @seealso{datenum, datestr, datevec, highlow, bolling, dateaxis, movavg, 
## pointfig}
## @end deftypefn

## Author: dekalog 
## Created: 2017-12-05

function candle ( varargin )  
  
if ( nargin < 4 || nargin > 7 )
  print_usage ();
elseif ( nargin == 4 )
  HighPrices = varargin{1}; LowPrices = varargin{2}; ClosePrices = varargin{3}; 
  OpenPrices = varargin{4}; color = "brwk";
elseif ( nargin == 5 )
  HighPrices = varargin{1}; LowPrices = varargin{2}; ClosePrices = varargin{3}; 
  OpenPrices = varargin{4}; color = varargin{5};
elseif ( nargin == 6 )
  HighPrices = varargin{1}; LowPrices = varargin{2}; ClosePrices = varargin{3}; 
  OpenPrices = varargin{4}; color = varargin{5}; dates = varargin{6};
elseif ( nargin == 7 )
  HighPrices = varargin{1}; LowPrices = varargin{2}; ClosePrices = varargin{3}; 
  OpenPrices = varargin{4}; color = varargin{5}; dates = varargin{6}; 
  dateform = varargin{7};
endif
   
if ( ! isnumeric ( HighPrices ) || ! isnumeric ( LowPrices ) || ...
  ! isnumeric ( ClosePrices ) || ! isnumeric ( OpenPrices ) ) # one of OHLC is not numeric
  error ("candle: The inputs for HighPrices, LowPrices, ClosePrices and OpenPrices must be numeric vectors.");  
endif

if ( size ( OpenPrices ) != size ( HighPrices ) )
  error ("candle: OpenPrices and HighPrices vectors are different lengths.");
endif

if ( size ( HighPrices ) != size ( LowPrices ) )
  error ("candle: HighPrices and LowPrices vectors are different lengths.");
endif 

if ( size ( LowPrices ) != size ( ClosePrices ) )
  error ("candle: LowPrices and ClosePrices vectors are different lengths.");
endif

if ( size ( ClosePrices, 1 ) == 1 && size ( ClosePrices, 2 ) > 1 ) # ohlc inputs are row vectors, so transpose them
  OpenPrices = OpenPrices'; HighPrices = HighPrices'; LowPrices = LowPrices'; 
  ClosePrices = ClosePrices';
  warning ("candle: The HighPrices, LowPrices, ClosePrices and OpenPrices should be column vectors. They have been transposed.");
endif
 
## check the user input Color argument, if it's character row vector
if ( ( nargin >= 5 && ischar ( color ) ) && size ( color, 1 ) == 1 )
  
  if ( size ( color, 2 ) == 1 )                      # only one color has been user specified
    color = [ tolower( color ) "rwk" ];              # so add default colors for down bars, background and wicks
  
  elseif ( size ( color, 2 ) == 2 )                  # two colors have been user specified
    color = [ tolower( color ) "wk" ];               # so add default colors for background and wicks
  
  elseif ( size ( color, 2 ) == 3 )                  # three colors have been user specified
  
    if ( color ( 3 ) == "k" || color ( 3 ) == "K" )  # if user selected background is black
     color = [ tolower( color ) "w" ];               # set wicks to default white
 
    else
     color = [ tolower( color ) "k" ];               # else default black wicks
 
    endif
  
  elseif ( size ( color, 2 ) >= 4 )                  # all four colors have been user specified, extra character inputs ignored  
  color = tolower( color );                          # correct in case user input contains upper case e.g. "BRWK" 
  
  endif 

elseif ( nargin >= 5 && ! ischar ( color ) )          # the user input for color is not a charcter vector

warning ("candle: The fifth input argument, Color, should be a character row vector for Color.\nThe chart has been plotted with default colors.");
color = "brwk"; 

elseif ( ischar ( color ) && size ( color, 1 ) != 1 ) # user input is more than one row of characters - a date character array by mistake?

warning ("candle: Color is not a single row character vector. Possibly a column Dates character vector?\nThe chart has been plotted with default colors.");
color = "brwk";

endif                                                 # end of nargin >= 5 && ischar ( color ) ) && size ( color, 1 ) == 1 if statement
  
wicks = HighPrices .- LowPrices;
body = ClosePrices .- OpenPrices;
up_down = sign ( body );
body_width = 20;
wick_width = 1;
doji_size = 10;
one_price_size = 15;

hold on;

## first, plot the chart background color
plot ( HighPrices, color( 3 ), LowPrices, color( 3 ) );
fill ( [ min( xlim ) max( xlim ) max( xlim ) min( xlim ) ], ...
       [ min( ylim ) min( ylim ) max( ylim ) max( ylim ) ], color( 3 ) );

## plot the wicks
x = ( 1 : length ( ClosePrices ) );                                           # the x-axis
idx = x;
high_nan = nan ( size ( HighPrices ) ); high_nan( idx ) = HighPrices;         # highs
low_nan = nan ( size ( LowPrices ) ); low_nan( idx ) = LowPrices;             # lows
x = reshape ( [ x; x; nan( size ( x ) ) ], [], 1 );
y = reshape ( [ high_nan(:)'; low_nan(:)'; nan( 1 , length ( HighPrices ) ) ], ...
              [] , 1 );
plot ( x, y, color( 4 ), "linewidth", wick_width );                           # plot wicks

## plot the up bar bodies 
x = ( 1 : length ( ClosePrices ) );                                           # the x-axis
idx = ( up_down == 1 ); idx = find ( idx );                                   # index by condition close > open
high_nan = nan ( size ( HighPrices ) ); high_nan( idx ) = ClosePrices( idx ); # body highs
low_nan = nan ( size ( LowPrices ) ); low_nan( idx ) = OpenPrices( idx );     # body lows
x = reshape ( [ x; x; nan( size ( x ) ) ], [], 1 );
y = reshape ( [ high_nan(:)'; low_nan(:)'; nan( 1, length ( HighPrices ) ) ], ...
              [], 1 );
plot ( x, y, color( 1 ), "linewidth", body_width );                           # plot bodies for up bars
  
## plot the down bar bodies
x = ( 1 : length ( ClosePrices ) );                                           # the x-axis
idx = ( up_down == -1 ); idx = find ( idx );                                  # index by condition close < open
high_nan = nan ( size ( HighPrices ) ); high_nan( idx ) = OpenPrices( idx );  # body highs
low_nan = nan ( size ( LowPrices ) ); low_nan( idx ) = ClosePrices( idx );    # body lows
x = reshape ( [ x; x; nan( size ( x ) ) ], [], 1 );
y = reshape ( [ high_nan(:)'; low_nan(:)'; nan( 1, length ( HighPrices ) ) ], ...
              [], 1 );
plot ( x, y, color( 2 ), "linewidth", body_width );                           # plot bodies for down bars

## plot special cases
## doji bars
doji_bar = ( HighPrices > LowPrices ) .* ( ClosePrices == OpenPrices ); ...
             doji_ix = find ( doji_bar ); 
             
if ( length ( doji_ix ) >= 1 )
  x = ( 1 : length ( ClosePrices ) );                                         # the x-axis  
  plot ( x( doji_ix ), ClosePrices( doji_ix ), [ "+" char ( color( 4 ) ) ], ...
        "markersize", doji_size );                                            # plot the open/close as horizontal dash  
endif

## OpenPrices == HighPrices == LowPrices == ClosePrices
one_price = ( HighPrices == LowPrices ) .* ( ClosePrices == OpenPrices ) .* ...
            ( OpenPrices == HighPrices ); one_price_ix = find ( one_price ); 
            
if ( length ( one_price_ix ) >= 1 )
  x = ( 1 : length ( ClosePrices ) );                                         # the x-axis  
  plot ( x( one_price_ix ), ClosePrices( one_price_ix ), ...
          [ "." char ( color( 4 ) ) ], "markersize", one_price_size );        # plot as a point/dot  
endif  
  
hold off;

## now add the x-axis tick labels, if the user has supplied the correct arguments

if ( nargin == 6 && isnumeric ( dates ) ) 
  error ("candle: If the sixth input argument, Dates, is a serial date number column (See datenum) or a datevec matrix (See datevec), Dateform input is required.\nThe chart has been plotted without x-axis dates.");
endif

if ( nargin == 6 && ischar ( dates ) )                                        # user has given a character vector of dates for dates
  
if ( size ( dates, 1 ) != size ( ClosePrices, 1 ) )
  error ("candle: The sixth input argument, Dates, and the OHLC prices vectors are different lengths.\nThe chart has been plotted without x-axis dates.");
else

ticks = cellstr ( dates ) ;
ax = "x" ;
xticks = 1 : length ( ClosePrices );
h = gca ();
set ( h, "xtick", xticks );
set ( h, [ ax "ticklabel" ], ticks );

endif
  
endif  
  
if ( nargin == 7 && isnumeric ( dates ) )                              # input arguments 6 and 7 assumed to be a dates vector and dateform respectively
  
if ( size ( dates, 1 ) != size ( ClosePrices, 1 ) )
  error ("candle: The sixth input argument, Dates, and the OHLC prices vectors are different lengths.\nThe chart has been plotted without x-axis dates.");
endif

if ( size ( dates, 2 ) == 1 )                                          # user has given a possible serial date number column for dates
 
is_monotonically_increasing = sum ( dates == cummax ( dates ) ) / size ( dates, 1 );

if ( is_monotonically_increasing != 1 )  
  error ("candle: Dates does not appear to be a serial date number column as it is not monotonically increasing.\nThe chart has been plotted without x-axis dates.");  
endif

if ( isnumeric ( dateform ) && ( dateform < 0 || dateform > 31 ) )     
  error ("candle: Dateform integer code number is out of bounds (See datestr).\nThe chart has been plotted without x-axis dates."); 
endif

if ( isnumeric ( dateform ) && rem ( dateform, 1 ) > 0 )
  error ("candle: Dateform code number should be an integer 0 - 31 (See datestr).\nThe chart has been plotted without x-axis dates.");
endif      

ticks = datestr ( dates, dateform );
ticks = mat2cell ( ticks, ones ( size ( ticks, 1 ), 1 ), size ( ticks, 2 ) );  
ax = "x";
xticks = 1 : length ( ClosePrices );
h = gca ();
set ( h, "xtick", xticks );
set ( h, [ ax "ticklabel" ], ticks );

elseif ( size ( dates, 2 ) == 6 )                                      # user has given a possible datevec matrix for dates

if ( isnumeric ( dateform ) && ( dateform < 0 || dateform > 31 ) )     
  error ("candle: Dateform integer code number is out of bounds (See datestr).\nThe chart has been plotted without x-axis dates."); 
endif

if ( isnumeric ( dateform ) && rem ( dateform, 1 ) > 0 )
  error ("candle: Dateform code number should be an integer 0 - 31 (See datestr).\nThe chart has been plotted without x-axis dates.");
endif 

ticks = datestr ( dates, dateform );
ticks = mat2cell ( ticks, ones ( size ( ticks, 1 ), 1 ), size ( ticks, 2 ) );
ax = "x";
xticks = 1 : length ( ClosePrices );
h = gca ();
set ( h, "xtick", xticks );
set ( h, [ ax "ticklabel" ], ticks );

else 
  error ("candle: The numerical Dates input is neither a single column serial date number nor a six column datevec format.\nThe chart has been plotted without x-axis dates."); 
  
endif

endif # end of ( nargin == 7 && isnumeric ( dates ) ) if statement 

endfunction

%!demo 1
%! Open = [ 1292.4; 1291.7; 1291.8; 1292.2; 1291.5; 1291.0; 1291.0; 1291.5; 1291.7; 1291.5; 1290.7 ];
%! High = [ 1292.6; 1292.1; 1292.5; 1292.3; 1292.2; 1292.2; 1292.7; 1292.4; 1292.3; 1292.1; 1292.9 ];
%! Low = [ 1291.3; 1291.3; 1291.7; 1291.1; 1290.7; 1290.2; 1290.3; 1291.1; 1291.2; 1290.5; 1290.4 ];
%! Close = [ 1291.8; 1291.7; 1292.2; 1291.5; 1291.0; 1291.1; 1291.5; 1291.7; 1291.6; 1290.8; 1292.8 ];
%! graphics_toolkit('fltk'); candle( High, Low, Close, Open );
%! title("default plot.");

%!demo 2
%! Open = [ 1292.4; 1291.7; 1291.8; 1292.2; 1291.5; 1291.0; 1291.0; 1291.5; 1291.7; 1291.5; 1290.7 ];
%! High = [ 1292.6; 1292.1; 1292.5; 1292.3; 1292.2; 1292.2; 1292.7; 1292.4; 1292.3; 1292.1; 1292.9 ];
%! Low = [ 1291.3; 1291.3; 1291.7; 1291.1; 1290.7; 1290.2; 1290.3; 1291.1; 1291.2; 1290.5; 1290.4 ];
%! Close = [ 1291.8; 1291.7; 1292.2; 1291.5; 1291.0; 1291.1; 1291.5; 1291.7; 1291.6; 1290.8; 1292.8 ];
%! graphics_toolkit('fltk'); candle( High, Low, Close, Open, 'brk' );
%! title("default plot with user selected black background");

%!demo 3
%! Open = [ 1292.4; 1291.7; 1291.8; 1292.2; 1291.5; 1291.0; 1291.0; 1291.5; 1291.7; 1291.5; 1290.7 ];
%! High = [ 1292.6; 1292.1; 1292.5; 1292.3; 1292.2; 1292.2; 1292.7; 1292.4; 1292.3; 1292.1; 1292.9 ];
%! Low = [ 1291.3; 1291.3; 1291.7; 1291.1; 1290.7; 1290.2; 1290.3; 1291.1; 1291.2; 1290.5; 1290.4 ];
%! Close = [ 1291.8; 1291.7; 1292.2; 1291.5; 1291.0; 1291.1; 1291.5; 1291.7; 1291.6; 1290.8; 1292.8 ];
%! graphics_toolkit('fltk'); candle( High, Low, Close, Open, 'brkg' );
%! title("default color candlestick bodies and user selected background and wick colors");

%!demo 4
%! Open = [ 1292.4; 1291.7; 1291.8; 1292.2; 1291.5; 1291.0; 1291.0; 1291.5; 1291.7; 1291.5; 1290.7 ];
%! High = [ 1292.6; 1292.1; 1292.5; 1292.3; 1292.2; 1292.2; 1292.7; 1292.4; 1292.3; 1292.1; 1292.9 ];
%! Low = [ 1291.3; 1291.3; 1291.7; 1291.1; 1290.7; 1290.2; 1290.3; 1291.1; 1291.2; 1290.5; 1290.4 ];
%! Close = [ 1291.8; 1291.7; 1292.2; 1291.5; 1291.0; 1291.1; 1291.5; 1291.7; 1291.6; 1290.8; 1292.8 ];
%! graphics_toolkit('fltk'); candle( High, Low, Close, Open, 'gmby' );
%! title("all four colors being user selected");

%!demo 5
%! Open = [ 1292.4; 1291.7; 1291.8; 1292.2; 1291.5; 1291.0; 1291.0; 1291.5; 1291.7; 1291.5; 1290.7 ];
%! High = [ 1292.6; 1292.1; 1292.5; 1292.3; 1292.2; 1292.2; 1292.7; 1292.4; 1292.3; 1292.1; 1292.9 ];
%! Low = [ 1291.3; 1291.3; 1291.7; 1291.1; 1290.7; 1290.2; 1290.3; 1291.1; 1291.2; 1290.5; 1290.4 ];
%! Close = [ 1291.8; 1291.7; 1292.2; 1291.5; 1291.0; 1291.1; 1291.5; 1291.7; 1291.6; 1290.8; 1292.8 ];
%! datenum_vec = [ 7.3702e+05; 7.3702e+05 ;7.3702e+05; 7.3702e+05; 7.3702e+05; 7.3702e+05; 7.3702e+05; ...
%! 7.3702e+05; 7.3702e+05; 7.3702e+05; 7.3702e+05 ];
%! graphics_toolkit('fltk'); candle( High, Low, Close, Open, 'brwk', datenum_vec, "yyyy-mm-dd HH:MM" );
%! title("default plot with datenum dates and character dateform arguments");

%!demo 6
%! Open = [ 1292.4; 1291.7; 1291.8; 1292.2; 1291.5; 1291.0; 1291.0; 1291.5; 1291.7; 1291.5; 1290.7 ];
%! High = [ 1292.6; 1292.1; 1292.5; 1292.3; 1292.2; 1292.2; 1292.7; 1292.4; 1292.3; 1292.1; 1292.9 ];
%! Low = [ 1291.3; 1291.3; 1291.7; 1291.1; 1290.7; 1290.2; 1290.3; 1291.1; 1291.2; 1290.5; 1290.4 ];
%! Close = [ 1291.8; 1291.7; 1292.2; 1291.5; 1291.0; 1291.1; 1291.5; 1291.7; 1291.6; 1290.8; 1292.8 ];
%! datenum_vec = [ 7.3702e+05; 7.3702e+05 ;7.3702e+05; 7.3702e+05; 7.3702e+05; 7.3702e+05; 7.3702e+05; ...
%! 7.3702e+05; 7.3702e+05; 7.3702e+05; 7.3702e+05 ];
%! graphics_toolkit('fltk'); candle( High, Low, Close, Open, 'brk', datenum_vec, 31 );
%! title("default plot with user selected black background with datenum dates and integer dateform arguments");

%!demo 7
%! Open = [ 1292.4; 1291.7; 1291.8; 1292.2; 1291.5; 1291.0; 1291.0; 1291.5; 1291.7; 1291.5; 1290.7 ];
%! High = [ 1292.6; 1292.1; 1292.5; 1292.3; 1292.2; 1292.2; 1292.7; 1292.4; 1292.3; 1292.1; 1292.9 ];
%! Low = [ 1291.3; 1291.3; 1291.7; 1291.1; 1290.7; 1290.2; 1290.3; 1291.1; 1291.2; 1290.5; 1290.4 ];
%! Close = [ 1291.8; 1291.7; 1292.2; 1291.5; 1291.0; 1291.1; 1291.5; 1291.7; 1291.6; 1290.8; 1292.8 ];
%! datenum_vec = [ 7.3702e+05; 7.3702e+05 ;7.3702e+05; 7.3702e+05; 7.3702e+05; 7.3702e+05; 7.3702e+05; ...
%! 7.3702e+05; 7.3702e+05; 7.3702e+05; 7.3702e+05 ];
%! datevec_vec = datevec( datenum_vec );
%! graphics_toolkit('fltk'); candle( High, Low, Close, Open, 'brwk', datevec_vec, 21 );
%! title("default plot with datevec dates and integer dateform arguments");

%!demo 8
%! Open = [ 1292.4; 1291.7; 1291.8; 1292.2; 1291.5; 1291.0; 1291.0; 1291.5; 1291.7; 1291.5; 1290.7 ];
%! High = [ 1292.6; 1292.1; 1292.5; 1292.3; 1292.2; 1292.2; 1292.7; 1292.4; 1292.3; 1292.1; 1292.9 ];
%! Low = [ 1291.3; 1291.3; 1291.7; 1291.1; 1290.7; 1290.2; 1290.3; 1291.1; 1291.2; 1290.5; 1290.4 ];
%! Close = [ 1291.8; 1291.7; 1292.2; 1291.5; 1291.0; 1291.1; 1291.5; 1291.7; 1291.6; 1290.8; 1292.8 ];
%! character_dates = char ( [] );
%! for i = 1 : 11
%! character_dates = [ character_dates ; "a date" ] ;
%! endfor
%! graphics_toolkit('fltk'); candle( High, Low, Close, Open, 'brk', character_dates );
%! title("default plot with user selected black background with character dates argument");
I hope readers who are Octave users find this useful.

Monday, 20 November 2017

Candlestick Plotting Function for Octave.

I have long been frustrated by the lack of an "out of the box" solution for plotting OHLC candlestick charts natively in Octave, the closest solution I know being the highlow plot function from the financial package ( which does not yet implement a candle function ) over at Octave Sourceforge. This being the case, I decided to write my own candlestick plotting functions, the codes for which are shown below.

This first, basic version just plots a candlestick chart with blue for up bars ( close higher than open ) or red for down bars ( close less than open ).
## Copyright (C) 2017 dekalog
## 
## This program is free software; you can redistribute it and/or modify it
## under the terms of the GNU General Public License as published by
## the Free Software Foundation; either version 3 of the License, or
## (at your option) any later version.
## 
## This program is distributed in the hope that it will be useful,
## but WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
## GNU General Public License for more details.
## 
## You should have received a copy of the GNU General Public License
## along with this program.  If not, see .
## -*- texinfo -*- 
## @deftypefn {} {@var{retval} =} candles (@var{O}, @var{H}, @var{L}, @var{C})
## 
## Takes O, H, L and C inputs and plots a candlestick chart, with blue bars for
## close up days and red bars for close down days.
##
## @seealso{}
## @end deftypefn

## Author: dekalog 
## Created: 2017-11-16

function [retval] = candles ( open , high , low , close )
  
  if ( nargin != 4 )
      error ( "Not enough input arguments. Should be OHLC vectors." ) ;
  endif
  
  wicks = high .- low ;
  body = close .- open ;
  up_down = sign( body ) ;
  
  hold on ;
  % plot the wicks
  x = ( 1 : length( close ) ) ; % the x-axis
  idx = x ;
  high_nan = nan( size( high ) ) ; high_nan( idx ) = high ; % highs
  low_nan = nan( size( low ) ) ; low_nan( idx ) = low ;     % lows
  x = reshape( [ x ; x ; nan( size( x ) ) ] , [] , 1 ) ;
  y = reshape( [ high_nan(:)' ; low_nan(:)' ; nan( 1 , length( high ) ) ] , [] , 1 ) ;
  plot( x , y , "k" , 'linewidth' , 2 ) ;                   % plot black wicks
  
  % plot the up bars
  x = ( 1 : length( high ) ) ;                                      % the x-axis
  idx = ( up_down == 1 ) ; idx = find( idx ) ;                      % index by condition
  high_nan = nan( size( high ) ) ; high_nan( idx ) = close( idx ) ; % index closes > opens
  low_nan = nan( size( low ) ) ; low_nan( idx ) = open( idx ) ;     % index opens < closes
  x = reshape( [ x ; x ; nan( size( x ) ) ] , [] , 1 ) ;
  y = reshape( [ high_nan(:)' ; low_nan(:)' ; nan( 1 , length( high ) ) ] , [] , 1 ) ;
  plot( x , y , "b" , 'linewidth' , 20 ) ;                           % plot blue up bars
  
  % plot the down bars
  x = ( 1 : length( high ) ) ;                                      % the x-axis
  idx = ( up_down == -1 ) ; idx = find( idx ) ;                     % index by condition
  high_nan = nan( size( high ) ) ; high_nan( idx ) = open( idx ) ;  % index opens > closes
  low_nan = nan( size( low ) ) ; low_nan( idx ) = close( idx ) ;    % index closes < opens
  x = reshape( [ x ; x ; nan( size( x ) ) ] , [] , 1 ) ;
  y = reshape( [ high_nan(:)' ; low_nan(:)' ; nan( 1 , length( high ) ) ] , [] , 1 ) ;
  plot( x , y , "r" , 'linewidth' , 20 ) ;                          % plot red down bars
  
  hold off ; 

endfunction
The second version is a conditional plotting version which takes a condition vector as input along with the OHLC vectors and plots candlesticks with different colours according to the condition in the condition vector ( conditions are integers 1 to 3 inclusive ).
## Copyright (C) 2017 dekalog
## 
## This program is free software; you can redistribute it and/or modify it
## under the terms of the GNU General Public License as published by
## the Free Software Foundation; either version 3 of the License, or
## (at your option) any later version.
## 
## This program is distributed in the hope that it will be useful,
## but WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
## GNU General Public License for more details.
## 
## You should have received a copy of the GNU General Public License
## along with this program.  If not, see .
## -*- texinfo -*- 
## @deftypefn {} {@var{retval} =} candles (@var{O}, @var{H}, @var{L}, @var{C}, @var{TC})
## 
## Takes OHLC inputs and a TC vector and plots a candlestick chart, with up bars coloured
## cyan or blue and down bars magenta or red, dependent on value contained in the 
## condition vector TC ( values == 1 or == 2 ).
##
## @seealso{}
## @end deftypefn

## Author: dekalog 
## Created: 2017-11-16

function [retval] = candles_tc ( open , high , low , close , tc )
  
  if ( nargin != 5 )
     error ( "Not enough input arguments. Should be OHLC and a condition vector" ) ;
  endif
  
  if ( length( unique( tc ) ) > 3 )
     error ( "Too many conditions in condition vector. Maximum of 3 conditions allowed." ) ;
  endif
  
  wicks = high .- low ;
  body = close .- open ;
  up_down = sign( body ) ;
  up_colours = "cbg" ;
  down_colours = "mrk" ;
  candle_body_width = 20 ;
  
  hold on ;
  
  % plot the wicks
  x = ( 1 : length( close ) ) ; % the x-axis
  idx = x ;
  high_nan = nan( size( high ) ) ; high_nan( idx ) = high ; % highs
  low_nan = nan( size( low ) ) ; low_nan( idx ) = low ;     % lows
  x = reshape( [ x ; x ; nan( size( x ) ) ] , [] , 1 ) ;
  y = reshape( [ high_nan(:)' ; low_nan(:)' ; nan( 1 , length( high ) ) ] , [] , 1 ) ;
  plot( x , y , "k" , 'linewidth' , 2 ) ;                   % plot black wicks

  for ii = 1 : length( unique( tc ) )
  
  % plot the up bars for ii condition
  x = ( 1 : length( high ) ) ;                                      % the x-axis
  idx = ( up_down == 1 ) .* ( tc == ii ) ; idx = find( idx ) ;      % index by condition
  high_nan = nan( size( high ) ) ; high_nan( idx ) = close( idx ) ; % index closes > opens
  low_nan = nan( size( low ) ) ; low_nan( idx ) = open( idx ) ;     % index opens < closes
  x = reshape( [ x ; x ; nan( size( x ) ) ] , [] , 1 ) ;
  y = reshape( [ high_nan(:)' ; low_nan(:)' ; nan( 1 , length( high ) ) ] , [] , 1 ) ;
  plot( x , y , up_colours( ii ) , 'linewidth' , candle_body_width ) ;
  
  % plot the down bars for ii condition
  x = ( 1 : length( high ) ) ;                                      % the x-axis
  idx = ( up_down == -1 ) .* ( tc == ii ) ; idx = find( idx ) ;     % index by condition
  high_nan = nan( size( high ) ) ; high_nan( idx ) = open( idx ) ;  % index opens > closes
  low_nan = nan( size( low ) ) ; low_nan( idx ) = close( idx ) ;    % index closes < opens
  x = reshape( [ x ; x ; nan( size( x ) ) ] , [] , 1 ) ;
  y = reshape( [ high_nan(:)' ; low_nan(:)' ; nan( 1 , length( high ) ) ] , [] , 1 ) ;
  plot( x , y , down_colours( ii ) , 'linewidth' , candle_body_width ) ;

  endfor
  
  hold off ; 

endfunction
An example of a plot by this second version is
There are two conditions being plotted on this 1 hour chart: cyan up bars and magenta down bars are bars that occur in the "Asian session," i.e. after 17:00 New York Time local time and before 07:00 London local time; and blue up bars and red down bars are bars that occur in the overlapping London - New York session, i.e. between 07:00 London local time and 17:00 New York local time.

The horizontal black lines are not part of the basic plot function but are added later by use of the "hold" function. These lines are the "Tokyo Channel," i.e. the high and low of the Asian session extended into the immediately following London - New York session.

I hope readers who use Octave will find these plotting functions useful.

Tuesday, 31 October 2017

Prepending Historical Data with Oanda's R API

As a follow on to my previous post, which was about appending data, the script below prepends historical data to an assumed existing data record.
% cd to the hourly data directory
setwd("~/Documents/octave/oanda_data/hourly")

all_current_historical_data_list = read.table("instrument_hourly_update_file",header=FALSE,sep="",colClasses=c("character","Date","numeric") )

for( ii in 1 : nrow( all_current_historical_data_list ) ) {  

   instrument = all_current_historical_data_list[ ii , 1 ]
   
   current_ohlc_record = read.table( file = paste( instrument , "raw_OHLC_hourly" , sep = "_" ) , header = FALSE , na = "" , sep = "," ,
                                      stringsAsFactors = FALSE )
   
   current_ohlc_record_begin_date_time = as.character( current_ohlc_record[ 1 , 1 ] ) % get the date/time value to be matched
   last_date_ix = as.Date( current_ohlc_record[ 1 , 1 ] )                             % the end date for new data to be downloaded             
   
   % last 40 weeks of hourly data approx = 5000 hourly bars            
   begin_date_ix = as.Date( last_date_ix - 280 )                      % the begin date for new data to be downloaded

   % download the missing historical data from begin_date_ix to last_date_x.
   new_historical_data = HisPricesDates( Granularity = "H1", DayAlign, TimeAlign, AccountToken, instrument,
                          begin_date_ix , last_date_ix + 2 ) % +2 to ensure that the end of the new downloaded data will
                                                             % overlap with the beginning of current_ohlc_record
  
   % having ensured no data is missed by overlaping with the current_ohlc_record, delete duplicated OHLC information
   new_historical_data_date_times = as.character( new_historical_data[ , 1 ] ) % vector to search for the above date value
   
   ix = charmatch( current_ohlc_record_begin_date_time , new_historical_data_date_times ) % get the matching index value

   % delete that part of new_historical_data which is already contained in filename
   new_historical_data = new_historical_data[ -( ix : nrow( new_historical_data ) ) , ]
   
   % before prepending new_historical_data in front of current_ohlc_record, need to give names to current_ohlc_record as
   % rbind needs to bind by named attributes
   names( current_ohlc_record ) = names( new_historical_data )
   
   % see https://stackoverflow.com/questions/11785710/rbind-function-changing-my-entries for reason for following
   % also need to coerce that dates in new_historical_data from POSIXct to character
   new_historical_data$TimeStamp = as.character( new_historical_data$TimeStamp )
   
   % and now prepend new_historical_data to current_ohlc_record
   combined_records = rbind( new_historical_data , current_ohlc_record , stringsAsFactors = FALSE )
   
   % and coerce character dates back to a POSIXct date format prior to printing
   combined_records$TimeStamp = as.POSIXct( combined_records$TimeStamp )
   
   % write combined_records to file
   write.table( combined_records , file = paste( instrument , "raw_OHLC_hourly" , sep = "_" ) , row.names = FALSE , 
                col.names = FALSE , sep = "," )
   
   added_data_length = nrow( new_historical_data ) % length of added new data

   % and amend Instrument_update file with lastest update information
   all_current_historical_data_list[ ii , 3 ] = all_current_historical_data_list[ ii , 3 ] + added_data_length
   
   % write updated Instrument_update_file to file
   write.table( all_current_historical_data_list , file = "instrument_hourly_update_file" , row.names = FALSE , col.names = FALSE )

} % end of for all_current_historical_data_list loop
As described in the previous post the function HisPricesDates is called to do the actual downloading, with the relevant dates for function input being read and calculated from the existing data file ( I have hard coded for hourly data but this can, of course, be changed or implemented as user input in the R session). As usual I have commented the script to explain what is going on.

However, one important caveat is that it is assumed that there is actually some Oanda data to download and prepend prior the the earliest date in the existing  record, and there are no checks of this assumption. Therefore, the script might fail in unexpected ways if one attempts to reach too far back in history for the prependable data.

Friday, 27 October 2017

Updating Historical Data Using Oanda's API and R

Following on from my previous post about downloading historical data, this post shows how previously downloaded data may be updated and appended with new, more recent data without having to re-download all the old data all over again.

The main function to do this, HisPricesDates, downloads data between given dates as function inputs and is shown below.
HisPricesDates  = function( Granularity, DayAlign, TimeAlign, AccountToken, Instrument, Start, End ){

% a typical Oanda API call might look like  
% https://api-fxtrade.oanda.com/v1/candles?instrument=EUR_USD&granularity=D&start=2014-03-21&end=2014-04-21&candleFormat=midpoint&includeFirst=false
% which is slowly built up by using the R paste function, commented at end of each line below
  
  httpaccount  = "https://api-fxtrade.oanda.com"
  auth           = c(Authorization = paste("Bearer",AccountToken,sep=" "))
  QueryHistPrec  = paste(httpaccount,"/v1/candles?instrument=",sep="") % https://api-fxtrade.oanda.com/v1/candles?instrument=
  QueryHistPrec1 = paste(QueryHistPrec,Instrument,sep="")              % https://api-fxtrade.oanda.com/v1/candles?instrument=EUR_USD
  qstart = paste("start=",Start,sep="")                                % start=2014-03-21
  qend   = paste("end=",End,sep="")                                    % end=2014-04-21   
  qcandleFormat  = "candleFormat=midpoint"                             % candleFormat=midpoint
  qgranularity   = paste("granularity=",Granularity,sep="")            % granularity=D
  qdailyalignment    = paste("dailyAlignment=",DayAlign,sep="")        % dailyAlignment=0
  qincludeFirst = "includeFirst=false"                                 % includeFirst=false
  QueryHistPrec2 = paste(QueryHistPrec1,qgranularity,qstart,qend,qcandleFormat,qincludeFirst,qdailyalignment,sep="&")
  InstHistP = getURL(QueryHistPrec2,cainfo=system.file("CurlSSL","cacert.pem",package="RCurl"),httpheader=auth)
  InstHistPjson = fromJSON(InstHistP, simplifyDataFrame = TRUE)
  Prices        = data.frame(InstHistPjson[[3]])
  Prices$time   = paste(substr(Prices$time,1,10),substr(Prices$time,12,19), sep=" ")
  colnames(Prices) = c("TimeStamp","Open","High","Low","Close","TickVolume","Complete")
  Prices$TimeStamp = as.POSIXct(strptime(Prices$TimeStamp, "%Y-%m-%d %H:%M:%OS"),origin="1970-01-01",tz = "UTC")
  attributes(Prices$TimeStamp)$tzone = TimeAlign
  return(Prices)
  
}
This function is called by two R scripts, one for downloading daily data and one for intraday data.

The daily update script, which is shown next,
% cd to the daily data directory
setwd("~/Documents/octave/oanda_data/daily")

all_current_historical_data_list = read.table("instrument_daily_update_file",header=FALSE,sep="",colClasses=c("character","Date","numeric") )

for( ii in 1 : nrow( all_current_historical_data_list ) ) {
  
  instrument = all_current_historical_data_list[ ii , 1 ]
  % read second column of dates in all_current_historical_data_list as a date index
  date_ix = as.Date( all_current_historical_data_list[ ii , 2 ] )
  todays_date = as.Date( Sys.time() )
  
  % download the missing historical data from date_ix to todays_date, if and only if, date_ix != todays_date
  if( date_ix + 1 != todays_date ) {
  
  new_historical_data = HisPricesDates( Granularity = "D", DayAlign, TimeAlign, AccountToken, instrument,
                         date_ix , todays_date )

  % the new_historical_data might only try to add incomplete OHLC data, in which case do not actually
  % want to update, so only update if we will be adding new, complete OHLC information  
  if ( nrow( new_historical_data ) >= 2 & new_historical_data[ 2 , 7 ] == TRUE ) {
  
    % now do some data manipulation
    % expect date of last line in Instrument_update_file == date of first line in new_historical_data 
    if ( date_ix == as.Date( new_historical_data[ 1 , 1 ] ) ) { % this is the case if true
       new_historical_data = new_historical_data[ -1 , ]       % so delete first row of new_historical_data
    }
  
    % similarly, expect last line of new_historical_data to be an incomplete OHLC bar
    if ( new_historical_data[ nrow( new_historical_data) , 7 ] == FALSE) {         % if so,
       new_historical_data = new_historical_data[ -nrow( new_historical_data) , ] % delete this last line
    }
    
    % append new_historical_data to the relevant raw data file
    write.table( new_historical_data , file = paste( instrument , "raw_OHLC_daily" , sep = "_" ) , row.names = FALSE , na = "" ,
                 col.names = FALSE , sep = "," , append = TRUE )
  
    added_data_length = nrow( new_historical_data )
    new_last_date = as.Date( new_historical_data[ added_data_length , 1 ] )
    
    % and amend Instrument_update file with lastest update information  
    all_current_historical_data_list[ ii , 2 ] = new_last_date
    all_current_historical_data_list[ ii , 3 ] = all_current_historical_data_list[ ii , 3 ] + added_data_length
  
    } % end of download if statement
  
  } % end of ( date_ix != todays_date ) if statement
  
} % end of for all_current_historical_data_list loop

% Write updated Instrument_update_file to file
write.table( all_current_historical_data_list , file = "instrument_daily_update_file" , row.names = FALSE , col.names = FALSE , na = "" )
has if statements as control structures to check that there is likely to be new daily data to actually download. It does this by checking a last_update date contained in an "instrument_daily_update_file" and comparing this with the current OS system time. If there is likely to be new data, the script runs and then updates this "instrument_daily_update_file." If not, the script exits with nothing having been done.

The intraday update script doe not have the checks the daily script has because I assume there will always be some new intraday data available for download. In this case, the last_update date is read from the "instrument_update_file" purely to act as an input to the above HisPricesDates function. As a result, this script involves some data manipulation to ensure that duplicate data is not printed to file. This script is shown next and is heavily commented to explain what is happening.
% cd to the hourly data directory
setwd("~/Documents/octave/oanda_data")

all_current_historical_data_list = read.table("instrument_hourly_update_file",header=FALSE,sep="",colClasses=c("character","Date","numeric") )

for( ii in 1 : nrow( all_current_historical_data_list ) ) {

   instrument = all_current_historical_data_list[ ii , 1 ]
   
   % read second column of dates in all_current_historical_data_list as a date index
   date_ix = as.Date( all_current_historical_data_list[ ii , 2 ] )
   
   todays_date = as.Date( Sys.time() )

   % download the missing historical data from date_ix to todays_date. If date_ix == todays_date, will download all
   % hourly bars for today only. 
   new_historical_data = HisPricesDates( Granularity = "H1", DayAlign, TimeAlign, AccountToken, instrument,
                          date_ix , todays_date + 1 )

   % the new_historical_data will almost certainly have incomplete hourly OHLC data in its last line,
   % so delete this incomplete OHLC information
   if ( new_historical_data[ nrow( new_historical_data ) , 7 ] == FALSE ) {
        new_historical_data = new_historical_data[ -nrow( new_historical_data ) , ]
   }
   
   % read the last line only of the current OHLC file for this instrument
   file = paste( instrument , "raw_OHLC_hourly" , sep = "_" ) % get the filename
   
   system_command = paste( "tail -1" , file , sep = " " )     % create a unix system command to read the last line of this file
   
   % read the file's last line
   old_historical_data = read.csv( textConnection( system( system_command , intern = TRUE ) ) , header = FALSE , sep = "," ,
                                    stringsAsFactors = FALSE )
   
   old_historical_data_end_date_time = old_historical_data[ 1 , 1 ]            % get the date value to be matched 
   
   new_historical_data_date_times = as.character( new_historical_data[ , 1 ] ) % vector to search for the above date value

   ix = charmatch( old_historical_data_end_date_time , new_historical_data_date_times ) % get the matching index value

   % delete that part of new_historical_data which is already contained in filename
   new_historical_data = new_historical_data[ -( 1 : ix ) , ]
   
   % append new_historical_data to the relevant raw data file
   write.table( new_historical_data , file = paste( instrument , "raw_OHLC_hourly" , sep = "_" ) , row.names = FALSE , na = "" ,
                 col.names = FALSE , sep = "," , append = TRUE )

   added_data_length = nrow( new_historical_data )                          % length of added new data
   new_last_date = as.Date( new_historical_data[ added_data_length , 1 ] )  % date of last update

   % and amend Instrument_update file with lastest update information
   all_current_historical_data_list[ ii , 2 ] = new_last_date
   all_current_historical_data_list[ ii , 3 ] = all_current_historical_data_list[ ii , 3 ] + added_data_length

} % end of for all_current_historical_data_list loop

% finally, write updated Instrument_update_file to file
write.table( all_current_historical_data_list , file = "instrument_hourly_update_file" , row.names = FALSE , col.names = FALSE , na = "" )
There is one important thing to point out on lines 29 to 33, which is that this section of code relies on a Unix based command, which in turn means that this almost certainly will not work on Windows based OSes. Windows users will have to find their own hack to load just the last line of the relevant file, or put up with loading the whole historical data file and indexing just the last line.

Thursday, 21 September 2017

Downloading Historical Data Using Oanda's API and R

It has been about 5 months since my last blog post and in this time I have been working away from home, been on summer holiday and spent some time mucking about on boats, so I have not been able to devote as much time to my blog as I would have liked. However, that has now changed, and this blog post is about obtaining historical data.

Many moons ago I used to download free, EOD data from tradingblox, but they stopped updating this in early 2013. I then started concentrating more on forex data because free sources of this data are more readily available. However, this still meant ( for me at least ) a laborious process of manually downloading .txt/.csv files, with the attendant problem of the data not being fully up to date and then resulting in me doing historical testing on data that I would not be able to trade in real time. With my present focus on machine learning derived trading algorithms this was becoming an untenable position.

My solution to this has been to personalize the ROandaAPI code that is freely available from this github, courtesy of IF.FranciscoME. I have stripped out some if statements, hard coded some variables particular to my own Oanda account, added some extra comments for my own enlightenment and broken the API code up into separate .R functions and .R scripts, which I use via RStudio.

The first such R script is called to initialise the variables and load the various functions into the R environment, shown below.
# Required Packages in order to use the R API functions
library("downloader")
library("RCurl")
library("jsonlite")
library("httr")

# -- ---------------------------------------------------------------------------------- #
# -- Specific Values of Parameters in API for my primary account ---------------------- #
# -- ---------------------------------------------------------------------------------- #

# -- numeric ------ # My Account ID 
AccountID = 123456     
# -- character ---- # My Oanda Token
AccountToken = "blah-de-blah-de-blah"

# load the various function files
source("~/path/to/InstrumentsList.R")
source("~/path/to/R_Oanda_API_functions/ActualPrice.R")
source("~/path/to/HisPricesCount.R")
source("~/path/to/HisPricesDates.R")

# get list of all tradeable instruments
trade_instruments = InstrumentsList( AccountToken , AccountID )
View( trade_instruments )

# load some default values
# -- character ---- # Granularity of the prices
# granularity: The time range represented by each candlestick. The value specified will determine the alignment 
# of the first candlestick.
# choose from S% S10 S15 S30 M1 M2 M3 M4 M5 M10 M15 M30 H1 H2 H3 H4 H6 H8 H12 D W M ( secs mins hours day week month )
Granularity = "D"

# -- numeric ------ # Hour of the "End of the Day"
# dailyAlignment: The hour of day used to align candles with hourly, daily, weekly, or monthly granularity. 
# The value specified is interpretted as an hour in the timezone set through the alignmentTimezone parameter and must be 
# an integer between 0 and 23.
DayAlign = 0 # original R code and Oanda default is 17 at "America%2FNew_York"

# -- character ---- # Time Zone in format "Continent/Zone
# alignmentTimezone: The timezone to be used for the dailyAlignment parameter. This parameter does NOT affect the 
# returned timestamp, the start or end parameters, these will always be in UTC. The timezone format used is defined by 
# the IANA Time Zone Database, a full list of the timezones supported by the REST API can be found at 
# http://developer.oanda.com/docs/timezones.txt
# "America%2FMexico_City" was the originallly provided, but could use, for example,  "Europe%2FLondon" or "Europe%2FWarsaw"
TimeAlign = "Europe%2FLondon"

################################# IMPORTANT NOTE #####################################################################
# By setting DayAlign = 0 and TimeAlign = "Europe%2FLondon" the end of bar is midnight in London. Doing this ensures
# that the bar OHLC in data downloads matches the bars seen in the Oanda FXTrade software, which for my account is 
# Europe Division, e.g. London time. The timestamps on downloads are, however, at GMT times, which means during summer
# daylight saving time the times shown on the Oanda software seem to be one hour ahead of GMT.
######################################################################################################################

Start = Sys.time() # Current system time
End = Sys.time() # Current system time
Count = 500 # Oanda default

# now cd to the working directory
setwd("~/path/to/oanda_data")
The code is liberally commented to describe reasons for my default choices. The InstrumentsList.R function called in the above script is shown next.
InstrumentsList = function( AccountToken , AccountID )
{
  httpaccount = "https://api-fxtrade.oanda.com"
  auth        = c(Authorization = paste("Bearer",AccountToken,sep=" "))
  Queryhttp   = paste(httpaccount,"/v1/instruments?accountId=",sep="")
  QueryInst   = paste(Queryhttp,AccountID,sep="")
  QueryInst1  = getURL(QueryInst,cainfo=system.file("CurlSSL","cacert.pem",package="RCurl"),httpheader=auth)
  InstJson    = fromJSON(QueryInst1, simplifyDataFrame = TRUE)
  FinalData   = data.frame(InstJson)
  colnames(FinalData) = c("Instrument","DisplayName","PipSize","MaxTradeUnits")
  FinalData$MaxTradeUnits = as.numeric(FinalData$MaxTradeUnits)
  return(FinalData)
}
This downloads a list of all the available trading instruments for the associated Oanda account. The following R script actually downloads the historical data for all the trading instruments listed in the above mentioned list and writes the data to separate files; one file per instrument. It also keeps track of the all instruments and the date of the last complete OHLC bar in the historical record and writes this to file also.
# cd to the working directory
setwd("~/path/to/oanda_data")

# dataframe to keep track of updates
Instrument_update_file = data.frame( Instrument = character() , Date = as.Date( character() ) , stringsAsFactors = FALSE )

for( ii in 1 : nrow( trade_instruments ) ) {
  
  instrument = trade_instruments[ ii , 1 ]
  
  # write details of instrument to Instrument_update_file
  Instrument_update_file[ ii , 1 ] = instrument
  
  historical_prices = HisPricesCount( Granularity = "D", DayAlign , TimeAlign , AccountToken ,instrument , Count = 5000 )
  last_row_ix = nrow( historical_prices )
  
  if ( historical_prices[ last_row_ix , 7 ] == FALSE ){ # last obtained OHLC bar values are incomplete
    # and do not want to save incomplete OHLC values, so add date of previous line of complete OHLC data
    # to Instrument_update_file
    Instrument_update_file[ ii , 2 ] = as.Date( historical_prices[ last_row_ix - 1 , 1 ] )
    
    # and delete the row with these incomplete values
    historical_prices = historical_prices[ 1 : last_row_ix - 1 , ]
    
  } # end of if statement
  
  # Write historical_prices to file
  write.table( historical_prices , file = paste( instrument , "raw_OHLC_daily" , sep = "_" ) , row.names = FALSE , na = "" , 
             col.names = FALSE , sep = "," )

  } # end of for loop

  # Write Instrument_update_file to file
  write.table( Instrument_update_file , file = "Instrument_update_file" , row.names = FALSE , na = "" , col.names = TRUE , sep = "," )
This script repeatedly calls the actual download function, HisPricesCount.R, which does all the heavy lifting in a loop, and the code for this download function is
HisPricesCount = function( Granularity, DayAlign, TimeAlign, AccountToken, Instrument, Count ){
  
  httpaccount      = "https://api-fxtrade.oanda.com"
  auth             = c(Authorization = paste("Bearer",AccountToken,sep=" "))
  QueryHistPrec    = paste(httpaccount,"/v1/candles?instrument=",sep="")
  QueryHistPrec1   = paste(QueryHistPrec,Instrument,sep="")
  qcount           = paste("count=",Count,sep="")
  qcandleFormat    = "candleFormat=midpoint"
  qgranularity     = paste("granularity=",Granularity,sep="")
  qdailyalignment  = paste("dailyAlignment=",DayAlign,sep="")
  QueryHistPrec2   = paste(QueryHistPrec1,qcandleFormat,qgranularity,qdailyalignment,qcount,sep="&")
  InstHistP        = getURL(QueryHistPrec2,cainfo=system.file("CurlSSL","cacert.pem",package="RCurl"),httpheader=auth)
  InstHistPjson    = fromJSON(InstHistP, simplifyDataFrame = TRUE)
  Prices           = data.frame(InstHistPjson[[3]])
  Prices$time      = paste(substr(Prices$time,1,10),substr(Prices$time,12,19), sep=" ")
  colnames(Prices) = c("TimeStamp","Open","High","Low","Close","TickVolume","Complete")
  Prices$TimeStamp = as.POSIXct(strptime(Prices$TimeStamp, "%Y-%m-%d %H:%M:%OS"),origin="1970-01-01",tz = "UTC")
  attributes(Prices$TimeStamp)$tzone = TimeAlign
  return(Prices)
  
}
One of the input variables for this function is Count ( default = 5000 ), which means that the function downloads the last 5000 OHLC bar records up to and including the most recent, which may still be forming and hence is incomplete. The calling script ensures that any incomplete bar is stripped from the record so that only complete bars are printed to file.

All in all this is a vast improvement over my previous data collection regime, and kudos to IF.FranciscoME for making the base code available on his github.