
Friday, 26 March 2021

Market/Volume Profile and Matrix Profile

A quick preview of what I am currently working on: using the Matrix Profile, via the R tsmp package, to search for time series motifs. The exact motifs I'm looking for are the various "initial balance" setups of Market Profile charts.

To do so, I'm concentrating the investigation around both the London and New York opening times, with a custom annotation vector (av). Below is a simple R function to set up this custom av, which is produced separately in Octave and then loaded into R.

mp_adjusted_by_custom_av <- function( mp_object , custom_av ) {
  ## see https://stackoverflow.com/questions/66726578/custom-annotation-vector-with-tsmp-r-package
  mp_object$av <- custom_av ## overwrite the annotation vector
  class( mp_object ) <- tsmp:::update_class( class( mp_object ) , "AnnotationVector" )
  adjusted_mp <- tsmp::av_apply( mp_object ) ## re-weight the matrix profile by the av
  return( adjusted_mp )
}
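For completeness, below is a minimal sketch of how I envisage this function being used. The file names and window size are illustrative assumptions only, and the custom av is assumed to have the required length of length( price ) - window_size + 1.

library( tsmp )

## illustrative file name and window size only
price <- read.csv( "~/path/to/price_series.csv" , header = FALSE )[ , 1 ]
mp <- tsmp( price , window_size = 48 ) ## compute the matrix profile, 48 x 10 minute bars assumed

## the custom av produced in Octave, length == length( price ) - 48 + 1
custom_av <- read.csv( "~/path/to/london_open_av.csv" , header = FALSE )[ , 1 ]

adjusted_mp <- mp_adjusted_by_custom_av( mp , custom_av )
plot( adjusted_mp ) ## plot the av-adjusted matrix profile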
This animated GIF shows plots of short, exemplar adjusted matrix profile objects, highlighting the London only, New York only and combined results of the relevant annotation vectors.
This is currently a work in progress and so I shall report results in due course.

Saturday, 30 January 2021

Temporal Clustering Times on Forex Major Pairs

In the following code box are the results from the temporal clustering routine of my last few posts, applied to the four forex major pairs of EUR_USD, GBP_USD, USD_CHF and USD_JPY.

###### EUR_USD 10 minute bars #######
## In the following order
## Both Delta turning point filter and "normal" TPF combined ##
## Delta turning point filter only ##
## "Normal" turning point filter only

###################### Monday ##############################################
K_opt == 8, ix values == 13  38    63    89     112    135    162    186    ## averaged over all 15 n_bars 1 to 15 inclusive
                       0:00  4:10  8:20  12:40  16:30  20:20  00:50  4:50
                         
K_opt == 8, ix values == 13 39 64 89 112 135 161 186                        ## averaged over n_bars 1 to 6 inclusive ( up to and including 1 hour )

K_opt == 5, ix_values == 21 60 97 134 175                                   ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )
K == 6,     ix values == 21 59 94 125 158 184

K_opt == 11, ix values == 9 26 43 60 78 95 113 132 151 169 185              ## averaged over all 15 n_bars 1 to 15 inclusive

K_opt == 8, ix values == 13  36  61  86 111 136 161 186                     ## averaged over n_bars 1 to 6 inclusive ( up to and including 1 hour )

K_opt == 8, ix values == 13  34  61  87 110 137 164 187                     ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )

K_opt == 8, ix values == 13  38  63  88 112 137 162 186                     ## averaged over all 15 n_bars 1 to 15 inclusive

K_opt == 10, ix values == 10  31  52  72  91 112 131 150 169 188            ## averaged over n_bars 1 to 6 inclusive ( up to and including 1 hour )

K_opt == 8, ix values == 12  35  62  88 112 137 164 187                     ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )

###################### Tuesday #############################################
K_opt == 6, ix values == 131   169    206   244    283    322               ## averaged over all 15 n_bars 1 to 15 inclusive
                         19:40 02:00  8:10  14:30  21:00  03:30
                         
K_opt == 6, ix values == 131 170 207 245 284 323                            ## averaged over n_bars 1 to 6 inclusive ( up to and including 1 hour )

K_opt == 7, ix values == 131 168 206 243 274 305 330                        ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )

K_opt == 11, ix values == 124 143 164 184 205 226 247 268 289 310 331       ## averaged over all 15 n_bars 1 to 15 inclusive

K_opt == 11, ix values == 124 144 164 185 204 225 246 267 288 309 332       ## averaged over n_bars 1 to 6 inclusive ( up to and including 1 hour )

K_opt == 7, ix values = 133 169 206 241 273 304 329                         ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )

K_opt == 9, ix values == 127 152 175 202 228 253 278 305 330                ## averaged over all 15 n_bars 1 to 15 inclusive

K_opt == 9, ix values == 127 152 177 202 228 253 278 304 329                ## averaged over n_bars 1 to 6 inclusive ( up to and including 1 hour )

K_opt == 7, ix values == 132 168 205 242 273 304 329                        ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )

###################### Wednesday ###########################################
K_opt == 6, ix values == 275    312    351    389    426    465             ## averaged over all 15 n_bars 1 to 15 inclusive
                         19:40  01:50  08:20  14:40  20:50  03:20
                         
K_opt == 6, ix values == 275 313 352 391 428 466                            ## averaged over n_bars 1 to 6 inclusive ( up to and including 1 hour )

K_opt == 6, ix values == 274 312 350 389 424 463                            ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )

K_opt == 9, ix values == 272 299 322 347 372 397 422 449 474                ## averaged over all 15 n_bars 1 to 15 inclusive

K_opt == 11, ix values == 268 288 308 329 348 369 390 411 432 453 476       ## averaged over n_bars 1 to 6 inclusive ( up to and including 1 hour )

K_opt == 6, ix values == 275 312 351 388 424 463                            ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )

K_opt == 9, ix values == 272 297 322 348 373 398 423 449 474                ## averaged over all 15 n_bars 1 to 15 inclusive 

K_opt == 9, ix values == 271 297 322 348 373 398 423 448 473                ## averaged over n_bars 1 to 6 inclusive ( up to and including 1 hour )

K_opt == 6, ix values == 276 311 350 389 426 465                            ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )

####################### Thursday ###########################################
K_opt == 6, ix values == 420    457    495    532    570    609             ## averaged over all 15 n_bars 1 to 15 inclusive
                         19:50  02:00  08:20  14:30  20:50  03:20
                         
K_opt == 6, ix values == 420 457 494 531 570 610                            ## averaged over n_bars 1 to 6 inclusive ( up to and including 1 hour )

K_opt == 6, ix values == 420 457 495 532 568 607                            ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )

K_opt == 9, ix values == 416 443 466 492 518 543 568 593 618                ## averaged over all 15 n_bars 1 to 15 inclusive 

K_opt == 10, ix values == 414 437 460 483 506 527 550 573 596 619           ## averaged over n_bars 1 to 6 inclusive ( up to and including 1 hour )

K_opt == 9, ix values == 416 443 466 493 520 543 568 595 618                ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )

K_opt == 9, ix values == 415 440 465 492 518 543 568 593 618                ## averaged over all 15 n_bars 1 to 15 inclusive 

K_opt == 9, ix values ==  415 440 465 492 518 543 568 593 618               ## averaged over n_bars 1 to 6 inclusive ( up to and including 1 hour )

K_opt == 7, ix values == 420 457 494 529 561 592 617                        ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )

####################### Friday #############################################
K_opt == 5, ix values == 564    599    635    670     703                   ## averaged over all 15 n_bars 1 to 15 inclusive
                         19:50  01:40  07:40  13:30   19:00
                         
K_opt == 6, ix values == 563 596 627 654 680 707                            ## averaged over n_bars 1 to 6 inclusive ( up to and including 1 hour )
K == 5,     ix values == 564 599 635 668 703

K_opt == 5, ix values == 564 601 639 674 705                                ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )

K_opt == 9, ix values == 556 575 595 614 633 652 672 691 711                ## averaged over all 15 n_bars 1 to 15 inclusive

K_opt == 11, ix values == 554 570 587 602 619 634 651 667 682 698 713       ## averaged over n_bars 1 to 6 inclusive ( up to and including 1 hour )

K_opt == 9, ix values == 556 575 595 614 633 652 671 691 711                ## averaged over n_bars 1 to 6 inclusive ( up to and including 1 hour )

K_opt == 9, ix values == 556 575 596 613 634 652 672 691 711                ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )

K_opt == 9, ix values == 556 575 594 613 633 652 672 691 710                ## averaged over all 15 n_bars 1 to 15 inclusive 

K_opt == 9, ix values == 556 575 594 613 634 653 672 691 710                ## averaged over n_bars 1 to 6 inclusive ( up to and including 1 hour )

K_opt == 5, ix values == 564 600 637 674 705                                ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )

############################################################################

###### GBP_USD 10 minute bars #######
## In the following order
## Both Delta turning point filter and "normal" TPF combined ##

###################### Monday ##############################################
K_opt = 8, ix_values = 13    36    61    86     111    136    162    186    ## averaged over all 15 n_bars 1 to 15 inclusive
                       0:00  3:50  8:00  12:10  16:20  20:30  0:50   4:50

K_opt = 9, ix_values = 12  34  56  78  99 120 141 164 187                   ## averaged over n_bars 1 to 6 inclusive ( up to and including 1 hour )

K_opt = 8, ix_values = 12  35  61  86 110 136 163 186                       ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )

###################### Tuesday #############################################
K_opt = 12, ix_values = 124    143    162   180   199   216   235   254   274   293   312   332     ## averaged over all 15 n_bars 1 to 15 inclusive
                        18:30  21:40  0:50  3:50  7:00  9:50  13:00 16:10 19:30 22:40 1:50  5:10

K_opt = 11, ix_values = 124 143 164 185 206 227 248 269 290 311 332         ## averaged over n_bars 1 to 6 inclusive ( up to and including 1 hour )

K_opt = 9, ix_values = 128 154 177 205 230 254 279 307 330                  ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )

###################### Wednesday ###########################################
K_opt = 11, ix_values = 269   290   311  331  352  373   394   415   434   455   476   ## averaged over all 15 n_bars 1 to 15 inclusive
                        18:40 22:10 1:40 5:00 8:30 12:00 15:30 19:00 22:10 1:40  5:10

K_opt = 11, ix_values = 269 289 310 330 351 372 393 413 434 455 476         ## averaged over n_bars 1 to 6 inclusive ( up to and including 1 hour )

K_opt = 8, ix_values = 275 310 341 367 394 422 451 475                      ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )

###################### Thursday ############################################
K_opt = 9, ix_values = 415   440   465  492  517   542   568   594   618    ## averaged over all 15 n_bars 1 to 15 inclusive
                       19:00 23:10 3:20 7:50 12:00 16:10 20:30 0:50  4:50

K_opt = 9, ix_values = 415 440 465 491 517 542 568 593 618                  ## averaged over n_bars 1 to 6 inclusive ( up to and including 1 hour )

K_opt = 9, ix_values = 416 441 464 492 519 542 569 596 619                  ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )

###################### Friday ##############################################
K_opt = 9, ix_values = 557   576   595  614  633  652   671   690   711     ## averaged over all 15 n_bars 1 to 15 inclusive
                       18:40 21:50 1:00 4:10 7:20 10:30 13:40 16:50 20:20

K_opt = 9, ix_values = 557 576 595 614 633 652 671 691 711                  ## averaged over n_bars 1 to 6 inclusive ( up to and including 1 hour )

K_opt = 8, ix_values = 557 576 599 621 642 665 686 709                      ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )

############################################################################

###### USD_CHF 10 minute bars #######
## In the following order
## Both Delta turning point filter and "normal" TPF combined ##

###################### Monday ##############################################
K_opt = 11, ix_values = 8      25    42   61   79    96    113   131   150   169  188   ## averaged over all 15 n_bars 1 to 15 inclusive
                        23:10  2:00  4:50 8:00 11:00 13:50 16:40 19:40 22:50 2:00 5:10

K_opt = 11, ix_values =  9  26  43  60  79  96 114 133 151 170 189          ## averaged over n_bars 1 to 6 inclusive ( up to and including 1 hour )

K_opt = 7, ix_values =  13  38  66  99 127 157 184                          ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )

###################### Tuesday #############################################
K_opt = 9, ix_values = 127   152   177  202  228   253   279   306  330     ## averaged over all 15 n_bars 1 to 15 inclusive
                       19:00 23:10 3:20 7:30 11:50 16:00 20:20 0:50 4:50

K_opt = 11, ix_values = 124 144 165 185 204 225 246 267 288 309 331         ## averaged over n_bars 1 to 6 inclusive ( up to and including 1 hour )

K_opt = 7, ix_values = 133 170 205 240 270 301 328                          ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )

###################### Wednesday ###########################################
K_opt = 10, ix_values = 270   293   316  342  365   388   411   432   454  475  ## averaged over all 15 n_bars 1 to 15 inclusive
                        18:50 22:40 2:30 6:50 10:40 14:30 18:20 21:50 1:30 5:00

K_opt = 12, ix_values = 268 287 308 327 346 365 384 401 420 439 458 477     ## averaged over n_bars 1 to 6 inclusive ( up to and including 1 hour )

K_opt = 7, ix_values = 276 313 349 383 414 444 471                          ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )

###################### Thursday ############################################
K_opt = 11, ix_values = 413   432   452  471  491  512   533   554   575   598  619  ## averaged over all 15 n_bars 1 to 15 inclusive
                        18:40 21:50 1:10 4:20 7:40 11:10 14:40 18:10 21:40 1:30 5:00

K_opt = 12, ix_values = 412 431 450 469 488 507 526 545 563 582 601 621     ## averaged over n_bars 1 to 6 inclusive ( up to and including 1 hour )

K_opt = 9, ix_values = 415 440 463 491 518 543 570 597 619                  ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )

###################### Friday ##############################################
K_opt = 9, ix_values = 557   576   596  615  634  653   672   691   710     ## averaged over all 15 n_bars 1 to 15 inclusive
                       18:40 21:50 1:10 4:20 7:30 10:40 13:50 17:00 20:10

K_opt = 9, ix_values = 556 575 595 614 633 652 671 690 710                  ## averaged over n_bars 1 to 6 inclusive ( up to and including 1 hour )

K_opt = 7, ix_values = 558 579 602 629 652 677 705                          ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )                

############################################################################

###### USD_JPY 10 minute bars #######
## In the following order
## Both Delta turning point filter and "normal" TPF combined ##

###################### Monday ##############################################
K_opt = 12, ix_values = 8     24   41   58   73    90    107   124   141   158  173  190  ## averaged over all 15 n_bars 1 to 15 inclusive
                        23:10 1:50 4:40 7:30 10:00 12:50 15:40 18:30 21:20 0:10 2:40 5:30

K_opt = 12, ix_values = 8  24  41  56  73  90 107 124 141 158 173 190       ## averaged over n_bars 1 to 6 inclusive ( up to and including 1 hour )

K_opt = 5, ix_values = 20  60  99 136 175                                   ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )

###################### Tuesday #############################################
K_opt = 9, ix_values = 128   154   179  204  229   254   279   306  331     ## averaged over all 15 n_bars 1 to 15 inclusive
                       19:10 23:30 3:40 7:50 12:00 16:10 20:20 0:50 5:00

K_opt = 9, ix_values = 128 153 178 203 228 254 279 305 330                  ## averaged over n_bars 1 to 6 inclusive ( up to and including 1 hour )

K_opt = 7, ix_values = 133 168 205 240 271 302 329                          ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )

###################### Wednesday ###########################################
K_opt = 11, ix_values = 269   289   310  331  352  373   394   414   433   454  476  ## averaged over all 15 n_bars 1 to 15 inclusive
                        18:40 22:00 1:30 5:00 8:30 12:00 15:30 18:50 22:00 1:30 5:10

K_opt = 9, ix_values = 272 297 322 348 374 399 424 449 474                  ## averaged over n_bars 1 to 6 inclusive ( up to and including 1 hour )

K_opt = 10, ix_values = 269 288 309 331 352 376 398 423 450 475             ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )

###################### Thursday ############################################
K_opt = 9, ix_values = 416   442   467  492  518   543   568   593  618     ## averaged over all 15 n_bars 1 to 15 inclusive
                       19:10 23:30 3:40 7:50 12:10 16:20 20:30 0:40 4:50

K_opt = 12, ix_values = 412 431 450 469 488 507 526 545 564 583 602 621     ## averaged over n_bars 1 to 6 inclusive ( up to and including 1 hour )

K_opt = 7, ix_values = 420 455 492 527 560 591 618                          ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )

###################### Friday ##############################################
K_opt = 7, 8 or 9
ix_values 7 = 561   588   613       638        663   686    709             ## averaged over all 15 n_bars 1 to 15 inclusive
ix_values 8 = 557   578   599  622  643        666   687    710
ix_values 9 = 557   576   596  616  635  653   672   691    711             ## timings are for this bottom row
              18:40 21:50 1:10 4:30 7:40 10:40 13:50 17:00  20:20

K_opt = 8, ix_values = 558 579 600 621 644 665 687 709                      ## averaged over n_bars 1 to 6 inclusive ( up to and including 1 hour )

K_opt = 6, ix_values = 563 594 621 646 676 705                              ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )

############################################################################

This is based on 10 minute bars over the last year or so. Readers should refer to my last few posts for background.

The first set of results, EUR_USD, is what the charts of my previous posts were based on and includes the combined results of my "Delta Turning Point Filter" and "Normal Turning Point Filter" as well as the results for each filter separately. Since there don't appear to be any significant differences between these, the other three pairs' results are for the combined filters only.

The K_opt variable is the optimal number of clusters (see my temporal-clustering-part-3 post for how "optimal" is decided) and the ix_values are also described in that post. For convenience, the first set of ix_values per day has the relevant times annotated underneath, so it is a simple matter to count forwards/backwards in 10 minute increments to assign times to the other ix_values. The variable n_bars is an input to the turning point filter functions and essentially indicates the lookback/lookforward period (n_bars == 2 would mean 2 x 10 minute periods) used for determining a local high/low according to each function's logic.
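As a further convenience, the small helper sketched below converts ix values into clock times given one annotated anchor; the anchor values in the example call are simply taken from the EUR_USD Monday row above.

## convert cluster centre ix values to clock times, given one known anchor,
## e.g. from the EUR_USD Monday row above, ix 13 corresponds to 0:00
ix_to_time <- function( ix , anchor_ix , anchor_time ) {
  secs <- ( ix - anchor_ix ) * 10 * 60 ## 10 minute bars
  format( as.POSIXct( anchor_time , format = "%H:%M" , tz = "UTC" ) + secs , "%H:%M" )
}

ix_to_time( c( 13 , 38 , 63 , 89 ) , 13 , "00:00" )
## "00:00" "04:10" "08:20" "12:40"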

As to how to interpret this, a typical sequence of times per day might look like this:

18:40 22:00 1:30 5:00 8:30 12:00 15:30 18:50 22:00 1:30 5:10

where the highlighted times represent the BST times for the period covering the London session open to the New York session close for one day. The preceding and following times are the two "book-ending" Asian sessions. 

Close inspection of these results reveals some surprising regularities. Even in just the above single example (an actual copy and paste from one of the code boxes above) there appear to be definite times per day at which a local high/low occurs. Hopefully I will be able to incorporate this into some type of chart for a nice visual presentation of the data.

More in due course. Enjoy.

Saturday, 14 November 2020

Temporal Clustering, Part 3

Continuing on with the subject matter of my last post, the code box below contains R code which is a straightforward refactoring of the Octave code in the second code box of that post. This is my implementation of the cross validation routine described in the paper Cluster Validation by Prediction Strength, adapted for use in the one dimensional case and rewritten in R so that I can use the Ckmeans.1d.dp package for optimal, one dimensional clustering.
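For reference, my reading of the paper's definition is that, for k clusters, with test clusters $A_{k1},\dots,A_{kk}$ of sizes $n_{k1},\dots,n_{kk}$, the prediction strength is

$$ps(k) = \min_{1 \le j \le k} \frac{1}{n_{kj}\,( n_{kj} - 1 )} \sum_{i \ne i' \in A_{kj}} D[\, C( X_{tr} , k ) \,]_{i i'}$$

where $D[\,C( X_{tr} , k )\,]_{ii'} = 1$ if test observations $i$ and $i'$ fall into the same cluster under the training clustering and $0$ otherwise, i.e. the worst case proportion of test pairs whose co-membership is reproduced by the training clusters. The code below estimates this by repeated random halving of the data.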

library( Ckmeans.1d.dp )

## load the training data from Octave output (comment out as necessary )
data = read.csv( "~/path/to/all_data_matrix" , header = FALSE )

## comment out as necessary
adjust = 0 ## default adjust value
sum_seq = seq( from = 1 , to = 198 , by = 1 ) ; adjust = 1 ; sum_seq_l = as.numeric( length( sum_seq ) ) ## Monday
##sum_seq = seq( from = 115 , to = 342 , by = 1 ) ; sum_seq_l = as.numeric( length( sum_seq ) ) ## Tuesday
##sum_seq = seq( from = 115 , to = 342 , by = 1 ) ; sum_seq_l = as.numeric( length( sum_seq ) ) ## Wednesday
##sum_seq = seq( from = 115 , to = 342 , by = 1 ) ; sum_seq_l = as.numeric( length( sum_seq ) ) ## Thursday
##sum_seq = seq( from = 547 , to = 720 , by = 1 ) ; adjust = 2 ; sum_seq_l = as.numeric( length( sum_seq ) ) ## Friday

## intraday --- comment out or adjust as necessary
##sum_seq = seq( from = 25 , to = 100 , by = 1 ) ; sum_seq_l = as.numeric( length( sum_seq ) )

upper_tri_mask = 1 * upper.tri( matrix( 0L , nrow = sum_seq_l , ncol = sum_seq_l ) , diag = FALSE )
no_sample_iters = 1000
max_K = 20
all_k_ps = matrix( 0L , nrow = 1 , ncol = max_K )

for ( iters in 1 : no_sample_iters ) {

## sample the data in data by rows
train_ix = sample( nrow( data ) , size = round( nrow( data ) / 2 ) , replace = FALSE )
train_data = data[ train_ix , sum_seq ] ## extract training data using train_ix rows of data
train_data_sum = colSums( train_data )  ## sum down the columns of train_data
test_data = data[ -train_ix , sum_seq ] ## extract test data using NOT train_ix rows of data
test_data_sum = colSums( test_data )    ## sum down the columns of test_data
## adjust for weekend if necessary
if ( adjust == 1 ) { ## Monday, so correct artifacts of weekend gap
  train_data_sum[ 1 : 5 ] = mean( train_data_sum[ 1 : 48 ] )
  test_data_sum[ 1 : 5 ] = mean( test_data_sum[ 1 : 48 ] )   
} else if ( adjust == 2 ) { ## Friday, so correct artifacts of weekend gap
  train_data_sum[ ( sum_seq_l - 4 ) : sum_seq_l ] = mean( train_data_sum[ ( sum_seq_l - 47 ) : sum_seq_l ] )
  test_data_sum[  ( sum_seq_l - 4 ) : sum_seq_l ] = mean( test_data_sum[ ( sum_seq_l - 47 ) : sum_seq_l ] ) 
}

for ( k in 1 : max_K ) {
  
## K segment train_data_sum
train_res = Ckmeans.1d.dp( sum_seq , k , train_data_sum )
train_out_pairs_mat = matrix( 0L , nrow = sum_seq_l , ncol = sum_seq_l )

## K segment test_data_sum
test_res = Ckmeans.1d.dp( sum_seq , k , test_data_sum )
test_out_pairs_mat = matrix( 0L , nrow = sum_seq_l , ncol = sum_seq_l )

  for ( ii in 1 : length( train_res$centers ) ) {
    ix = which( train_res$cluster == ii )
    train_out_pairs_mat[ ix , ix ] = 1 
    ix = which( test_res$cluster == ii )
    test_out_pairs_mat[ ix , ix ] = 1
    }
  ## coerce to upper triangular matrix
  train_out_pairs_mat = train_out_pairs_mat * upper_tri_mask
  test_out_pairs_mat = test_out_pairs_mat * upper_tri_mask
  
  ## get minimum co-membership cluster proportion
  sample_min_vec = matrix( 0L , nrow = 1 , ncol = length( test_res$centers ) )
  for ( ii in 1 : length( test_res$centers ) ) {
    ix = which( test_res$cluster == ii )
    test_cluster_sum = sum( test_out_pairs_mat[ ix , ix ] )
    train_cluster_sum = sum( test_out_pairs_mat[ ix , ix ] * train_out_pairs_mat[ ix , ix ] )
    sample_min_vec[ , ii ] = train_cluster_sum / test_cluster_sum
  }
  
  ## get min of sample_min_vec
  min_val = min( sample_min_vec[ !is.nan( sample_min_vec ) ] ) ## removing any NaN
  all_k_ps[ , k ] = all_k_ps[ , k ] + min_val

} ## end of K for loop

} ## end of sample loop

all_k_ps = all_k_ps / no_sample_iters ## average values
plot( 1 : length( all_k_ps ) , all_k_ps , "b" , xlab = "Number of Clusters K" , ylab = "Prediction Strength Value" )
abline( h = 0.8 , col = "red" )

The purpose of the cross validation routine is to select the number of clusters K, in the model selection sense, that is best supported by the available data. The above linked paper suggests that the optimal number of clusters is the highest K with a prediction strength value over some given threshold (e.g. 0.8 or 0.9). The last part of the code plots the number of clusters K (x-axis) against the averaged prediction strength value (y-axis), along with the threshold value of 0.8 in red. For the particular set of data in question, it can be seen that the optimal K value for the number of clusters is 8.
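This selection rule can also be applied programmatically to the all_k_ps vector computed above; a one-line sketch:

## the optimal K is the largest K whose averaged prediction strength
## exceeds the chosen threshold
threshold = 0.8
K_opt = max( which( all_k_ps >= threshold ) )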

This second code box re-uses some of the above code to visualise the clusters for a given K,
library( Ckmeans.1d.dp )

## load the training data from Octave output (comment out as necessary )
data = read.csv( "~/path/to/all_data_matrix" , header = FALSE )
data_sum = colSums( data ) ## sum down the columns of data
data_sum[ 1 : 5 ] = mean( data_sum[ 1 : 48 ] ) ## correct artifacts of weekend gap at start of week
data_sum[ 716 : 720 ] = mean( data_sum[ 673 : 720 ] ) ## correct artifacts of weekend gap at end of week ( mean of last 48 bars )

## comment out as necessary
adjust = 0 ## default adjust value
sum_seq = seq( from = 1 , to = 198 , by = 1 ) ; sum_seq_l = as.numeric( length( sum_seq ) ) ## Monday
##sum_seq = seq( from = 115 , to = 342 , by = 1 ) ; sum_seq_l = as.numeric( length( sum_seq ) ) ## Tuesday
##sum_seq = seq( from = 115 , to = 342 , by = 1 ) ; sum_seq_l = as.numeric( length( sum_seq ) ) ## Wednesday
##sum_seq = seq( from = 115 , to = 342 , by = 1 ) ; sum_seq_l = as.numeric( length( sum_seq ) ) ## Thursday
##sum_seq = seq( from = 547 , to = 720 , by = 1 ) ; sum_seq_l = as.numeric( length( sum_seq ) ) ## Friday

## intraday --- comment out or adjust as necessary
##sum_seq = seq( from = 25 , to = 100 , by = 1 ) ; sum_seq_l = as.numeric( length( sum_seq ) )

k = 8
res = Ckmeans.1d.dp( sum_seq , k , data_sum[ sum_seq ] )

plot( sum_seq , data_sum[ sum_seq ], main = "Cluster centres. Cluster centre ix is a predicted turning point",
     col = res$cluster,
     pch = res$cluster, type = "h", xlab = "Count from beginning ix at ix = 1",
     ylab = "Total Counts per ix" )

abline( v = res$centers, col = "chocolate" , lty = "dashed" )

text( res$centers, max(data_sum[sum_seq]) * 0.95, cex = 0.75, font = 2,
      paste( round(res$centers) ) )
a typical plot for which is shown below.
The above plot can be thought of as a clustering at a particular scale, and one can go down in scale by selecting smaller ranges of the data. For example, taking all the data clustered in the 3 clusters centred at x-axis ix values 38, 63 and 89, and re-running the code in the first code box on just this data, gives this prediction strength plot, which suggests a K value of 6.
Re-running the code in the second code box plots these 6 clusters thus.

Looking at this last plot, it can be seen that there is a cluster at x-axis ix value 58, which corresponds to 7.30 a.m. London time, and within this green cluster there are 2 distinct peaks which correspond to 7.00 a.m. and 8.00 a.m. A similar visual analysis of the far right cluster, centred at ix = 94, shows a peak at the time of the New York open.

My hypothesis is that by clustering in the above manner it will be possible to identify distinct, intraday times at which the probability of a market turn is greater than at other times. More in due course.

Thursday, 26 March 2020

Some Basic Code Housekeeping

Since my last post, back in late November last year, I have been doing a few disparate things such as:
  • improving the coding of some functions in R to use the Oanda API to automatically download data using cronjobs
  • coding some Octave functions to plot/visualise the above data
  • more work on Random Vector Functional Link networks 
  • trying my hand at some discretionary day trading to take advantage of the increased market volatility due to the current Covid-19 pandemic
My recent work has been to improve the R functions I blogged about back in 2017 here, by implementing some basic price sanity checks on the OHLC bars, checking the integrity of the date handling prior to writing to file, and also writing to file in a format more amenable to being read by Octave.
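As an illustration of the kind of basic price sanity check meant here (a hypothetical helper, not the actual implementation), each bar's high should be the maximum, and its low the minimum, of the bar's prices:

## hypothetical sketch of a basic OHLC bar sanity check: returns TRUE for
## bars whose high/low bracket the open/close and whose prices are positive
ohlc_sane <- function( open , high , low , close ) {
  ( high >= pmax( open , close ) ) & ( low <= pmin( open , close ) ) & ( low > 0 )
}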

A new addition to this work is to also download Oanda's forex order and position books using the API, which I was inspired to do by this online guide. At the moment I am exploring this data using ideas from the Limit Order Book literature, and using the R Boruta package to check whether the features described therein are informative for this data.
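A minimal sketch of such a Boruta test is shown below; the file and column names are hypothetical stand-ins for the order book derived features and the target label being tested.

library( Boruta )

## hypothetical file of order book derived features plus a 'label' column,
## e.g. the sign of a future return
lob_data <- read.csv( "~/path/to/orderbook_features.csv" )

b <- Boruta( label ~ . , data = lob_data , maxRuns = 100 )
print( b )    ## summary of confirmed/tentative/rejected features
attStats( b ) ## per-feature importance statistics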

To help visualise the order book data, I have written this Octave script
clear all ;
cd /home/dekalog/Documents/octave/oanda_data/20m ;

files_10m = glob( '*_ohlc_10m' ) ;
## files_10m =
## {
##   [1,1] = aud_jpy_ohlc_10m
##   [2,1] = aud_usd_ohlc_10m
##   [3,1] = eur_aud_ohlc_10m
##   [4,1] = eur_chf_ohlc_10m
##   [5,1] = eur_gbp_ohlc_10m
##   [6,1] = eur_jpy_ohlc_10m
##   [7,1] = eur_usd_ohlc_10m
##   [8,1] = gbp_chf_ohlc_10m
##   [9,1] = gbp_jpy_ohlc_10m
##   [10,1] = gbp_usd_ohlc_10m
##   [11,1] = nzd_usd_ohlc_10m
##   [12,1] = usd_cad_ohlc_10m
##   [13,1] = usd_chf_ohlc_10m
##   [14,1] = usd_jpy_ohlc_10m
##   [15,1] = xag_usd_ohlc_10m
##   [16,1] = xau_usd_ohlc_10m
## }
file_no = input( 'Enter file no. from list, a number 1 to 16 inclusive. ' ) ;
filename = files_10m{ file_no } ;
str_split = strsplit( filename , "_" ) ;
price = strjoin( str_split( 1 : 2 ) , "_" ) ;
orders = dlmread( [ price , '_orderbook_snapshot' ] ) ; ## get latest orderbook levels
latest_price = orders( 1 , 1 ) ;
[ ~ , ix ] = min( abs( orders( 1 , 1 ) .- orders( 2 : end , 1 ) ) ) ; ix = ix + 1 ;
order_levels = 10 ;
order_max = max( max( orders( ix - order_levels : ix + order_levels , 2 : 3 ) ) ) ;
max10 = 20 / order_max ; max20 = 10 / order_max ;

## create a unix system command to read the last 'n' rows of ohlc of filename
unix_command = [ "tail -100" , " " , filename ] ; 
[ ~ , data ] = system( unix_command ) ;
data = strsplit( data , { "," , "\n" } ) ; ## gives a cell array
if ( isempty( data{ end } ) )
 data( end ) = [] ;
endif
data = reshape( str2double( data' ) , 22 , 100 )' ;
## data cols 
## open == 18, high == 19, low == 20, close == 21, vol == 22

## plot the 10 minute bars
h = findobj( 'type' , 'figure' ) ;
if ( !isempty( h ) && any( h == 10 ) )
clf( 10 ) ;
endif
figure( 10 ) ; candle( data(:,19) , data(:,20) , data(:,21) , data(:,18) ) ;
hold on ; figure( 10 ) ; barh( orders( ix - order_levels : ix + order_levels , 1 ) , orders( ix - order_levels : ix + order_levels , 2 ) .* max10 , 0.7 , 'c' ) ; 
barh( orders( ix - order_levels : ix + order_levels , 1 ) , orders( ix - order_levels : ix + order_levels , 3 ) .* max10 , 0.4 , 'm' ) ;
hold off ;
hline( latest_price , 'r-' ) ; ## the latest snapshot price
for ii = ix - order_levels : ix + order_levels
hline( orders( ii , 1 ) , 'k:' , num2str( orders( ii , 1 ) ) ) ;
endfor
title( strjoin( str_split( 1 : 2 ) , " " ) ) ;

## get ix for forming 20 minute bars ( data col 4 == hour , col 5 == minute )
ix0 = find( ( data(:,4) == shift( data(:,4) , -1 ) ) .* ( ( data(:,5) == 0 ) .* ( shift( data(:,5) , -1 ) == 10 ) ) ) ;
ix20 = find( ( data(:,4) == shift( data(:,4) , -1 ) ) .* ( ( data(:,5) == 20 ) .* ( shift( data(:,5) , -1 ) == 30 ) ) ) ;
ix40 = find( ( data(:,4) == shift( data(:,4) , -1 ) ) .* ( ( data(:,5) == 40 ) .* ( shift( data(:,5) , -1 ) == 50 ) ) ) ;

## bars beginning on the hour
## the highs
data( ix0 , 19 ) = max( [ data( ix0 , 19 ) , data( ix0.+1 , 19 ) ] , [] , 2 ) ;
## the lows
data( ix0 , 20 ) = min( [ data( ix0 , 20 ) , data( ix0.+1 , 20 ) ] , [] , 2 ) ;
## the close
data( ix0 , 21 ) = data( ix0.+1 , 21 ) ;
## the vol
data( ix0 , 22 ) = data( ix0 , 22 ) + data( ix0.+1 , 22 ) ;

## bars beginning 20 past the hour
## the highs
data( ix20 , 19 ) = max( [ data( ix20 , 19 ) , data( ix20.+1 , 19 ) ] , [] , 2 ) ;
## the lows
data( ix20 , 20 ) = min( [ data( ix20 , 20 ) , data( ix20.+1 , 20 ) ] , [] , 2 ) ;
## the close
data( ix20 , 21 ) = data( ix20.+1 , 21 ) ;
## the vol
data( ix20 , 22 ) = data( ix20 , 22 ) + data( ix20.+1 , 22 ) ;

## bars beginning 40 past the hour
## the highs
data( ix40 , 19 ) = max( [ data( ix40 , 19 ) , data( ix40.+1 , 19 ) ] , [] , 2 ) ;
## the lows
data( ix40 , 20 ) = min( [ data( ix40 , 20 ) , data( ix40.+1 , 20 ) ] , [] , 2 ) ;
## the close
data( ix40 , 21 ) = data( ix40.+1 , 21 ) ;
## the vol
data( ix40 , 22 ) = data( ix40 , 22 ) + data( ix40.+1 , 22 ) ;

## delete rows of 10, 30 and 50 minutes past the hour
data( [ ix0.+1 ; ix20.+1 ; ix40.+1 ] , : ) = [] ;

## plot 20 minute candles
if ( !isempty( h ) && any( h == 20 ) )
clf( 20 ) ;
endif
figure( 20 ) ; candle( data(:,19) , data(:,20) , data(:,21) , data(:,18) ) ;
hold on ; figure( 20 ) ; barh( orders( ix - order_levels : ix + order_levels , 1 ) , orders( ix - order_levels : ix + order_levels , 2 ) .* max20 , 0.7 , 'c' ) ; 
barh( orders( ix - order_levels : ix + order_levels , 1 ) , orders( ix - order_levels : ix + order_levels , 3 ) .* max20 , 0.4 , 'm' ) ;
hold off ;
hline( latest_price , 'r-' ) ; ## the latest snapshot price
for ii = ix - order_levels : ix + order_levels
hline( orders( ii , 1 ) , 'k:' , num2str( orders( ii , 1 ) ) ) ;
endfor
title( strjoin( str_split( 1 : 2 ) , " " ) ) ;
which takes in 10 minute OHLC data, creates 20 minute bars from these 10 minute bars, and plots both with the 10 order book levels above and below the latest order book price level. The horizontal cyan bars are the percentages of buy orders whilst the magenta bars are sell orders,
compared with the closest Oanda screen shot.
There are differences between the two, namely
  • My plot above is of 20 minute bars vs. Oanda's 15 minute bars
  • My open order histogram is fixed at the latest downloaded open order snapshot corresponding to the rightmost bar close, whereas the Oanda platform allows you to scroll back and see the order book in historical time
The 20 minute timeframe was deliberately chosen to match the maximum "refresh rate" of the API order book download, and it gives a nice, slightly different perspective to the 15 and 30 minute granularities provided on the Oanda platform.

Tuesday, 31 October 2017

Prepending Historical Data with Oanda's R API

As a follow on to my previous post, which was about appending data, the script below prepends historical data to an assumed existing data record.
# cd to the hourly data directory
setwd("~/Documents/octave/oanda_data/hourly")

all_current_historical_data_list = read.table("instrument_hourly_update_file",header=FALSE,sep="",colClasses=c("character","Date","numeric") )

for( ii in 1 : nrow( all_current_historical_data_list ) ) {

   instrument = all_current_historical_data_list[ ii , 1 ]

   current_ohlc_record = read.table( file = paste( instrument , "raw_OHLC_hourly" , sep = "_" ) , header = FALSE , na = "" , sep = "," ,
                                      stringsAsFactors = FALSE )

   current_ohlc_record_begin_date_time = as.character( current_ohlc_record[ 1 , 1 ] ) # get the date/time value to be matched
   last_date_ix = as.Date( current_ohlc_record[ 1 , 1 ] )                             # the end date for new data to be downloaded

   # last 40 weeks of hourly data approx = 5000 hourly bars
   begin_date_ix = as.Date( last_date_ix - 280 )                      # the begin date for new data to be downloaded

   # download the missing historical data from begin_date_ix to last_date_ix
   new_historical_data = HisPricesDates( Granularity = "H1", DayAlign, TimeAlign, AccountToken, instrument,
                          begin_date_ix , last_date_ix + 2 ) # +2 to ensure that the end of the new downloaded data will
                                                             # overlap with the beginning of current_ohlc_record

   # having ensured no data is missed by overlapping with the current_ohlc_record, delete duplicated OHLC information
   new_historical_data_date_times = as.character( new_historical_data[ , 1 ] ) # vector to search for the above date value

   ix = charmatch( current_ohlc_record_begin_date_time , new_historical_data_date_times ) # get the matching index value

   # delete that part of new_historical_data which is already contained in filename
   new_historical_data = new_historical_data[ -( ix : nrow( new_historical_data ) ) , ]

   # before prepending new_historical_data in front of current_ohlc_record, need to give names to current_ohlc_record as
   # rbind needs to bind by named attributes
   names( current_ohlc_record ) = names( new_historical_data )

   # see https://stackoverflow.com/questions/11785710/rbind-function-changing-my-entries for reason for following
   # also need to coerce the dates in new_historical_data from POSIXct to character
   new_historical_data$TimeStamp = as.character( new_historical_data$TimeStamp )

   # and now prepend new_historical_data to current_ohlc_record
   combined_records = rbind( new_historical_data , current_ohlc_record , stringsAsFactors = FALSE )

   # and coerce character dates back to a POSIXct date format prior to printing
   combined_records$TimeStamp = as.POSIXct( combined_records$TimeStamp )

   # write combined_records to file
   write.table( combined_records , file = paste( instrument , "raw_OHLC_hourly" , sep = "_" ) , row.names = FALSE ,
                col.names = FALSE , sep = "," )

   added_data_length = nrow( new_historical_data ) # length of added new data

   # and amend Instrument_update file with latest update information
   all_current_historical_data_list[ ii , 3 ] = all_current_historical_data_list[ ii , 3 ] + added_data_length

   # write updated Instrument_update_file to file
   write.table( all_current_historical_data_list , file = "instrument_hourly_update_file" , row.names = FALSE , col.names = FALSE )

} # end of for all_current_historical_data_list loop
As described in the previous post, the function HisPricesDates is called to do the actual downloading, with the relevant dates for its inputs being read and calculated from the existing data file (I have hard coded for hourly data, but this can, of course, be changed or implemented as user input in the R session). As usual I have commented the script to explain what is going on.

However, one important caveat is that it is assumed there is actually some Oanda data to download and prepend prior to the earliest date in the existing record, and there are no checks of this assumption. The script might therefore fail in unexpected ways if one attempts to reach too far back in history for prependable data.
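A minimal guard along the following lines could be added immediately after the HisPricesDates call (just a sketch, using the variable names from the script above):

# sketch of a guard against an empty download: skip this instrument
# rather than failing further down the script
if ( is.null( new_historical_data ) || nrow( new_historical_data ) == 0 ) {
  next # nothing to prepend for this instrument
}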

Friday, 27 October 2017

Updating Historical Data Using Oanda's API and R

Following on from my previous post about downloading historical data, this post shows how previously downloaded data may be updated and appended with new, more recent data without having to re-download all the old data again.

The main function to do this, HisPricesDates, downloads data between two dates given as function inputs and is shown below.
HisPricesDates  = function( Granularity, DayAlign, TimeAlign, AccountToken, Instrument, Start, End ){

# a typical Oanda API call might look like
# https://api-fxtrade.oanda.com/v1/candles?instrument=EUR_USD&granularity=D&start=2014-03-21&end=2014-04-21&candleFormat=midpoint&includeFirst=false
# which is slowly built up by using the R paste function, commented at the end of each line below

  httpaccount  = "https://api-fxtrade.oanda.com"
  auth           = c(Authorization = paste("Bearer",AccountToken,sep=" "))
  QueryHistPrec  = paste(httpaccount,"/v1/candles?instrument=",sep="") # https://api-fxtrade.oanda.com/v1/candles?instrument=
  QueryHistPrec1 = paste(QueryHistPrec,Instrument,sep="")              # https://api-fxtrade.oanda.com/v1/candles?instrument=EUR_USD
  qstart = paste("start=",Start,sep="")                                # start=2014-03-21
  qend   = paste("end=",End,sep="")                                    # end=2014-04-21
  qcandleFormat  = "candleFormat=midpoint"                             # candleFormat=midpoint
  qgranularity   = paste("granularity=",Granularity,sep="")            # granularity=D
  qdailyalignment    = paste("dailyAlignment=",DayAlign,sep="")        # dailyAlignment=0
  qincludeFirst = "includeFirst=false"                                 # includeFirst=false
  QueryHistPrec2 = paste(QueryHistPrec1,qgranularity,qstart,qend,qcandleFormat,qincludeFirst,qdailyalignment,sep="&")
  InstHistP = getURL(QueryHistPrec2,cainfo=system.file("CurlSSL","cacert.pem",package="RCurl"),httpheader=auth)
  InstHistPjson = fromJSON(InstHistP, simplifyDataFrame = TRUE)
  Prices        = data.frame(InstHistPjson[[3]])
  Prices$time   = paste(substr(Prices$time,1,10),substr(Prices$time,12,19), sep=" ")
  colnames(Prices) = c("TimeStamp","Open","High","Low","Close","TickVolume","Complete")
  Prices$TimeStamp = as.POSIXct(strptime(Prices$TimeStamp, "%Y-%m-%d %H:%M:%OS"),origin="1970-01-01",tz = "UTC")
  attributes(Prices$TimeStamp)$tzone = TimeAlign
  return(Prices)

}
This function is called by two R scripts, one for downloading daily data and one for intraday data.

The daily update script, which is shown next,
# cd to the daily data directory
setwd("~/Documents/octave/oanda_data/daily")

all_current_historical_data_list = read.table("instrument_daily_update_file",header=FALSE,sep="",colClasses=c("character","Date","numeric") )

for( ii in 1 : nrow( all_current_historical_data_list ) ) {

  instrument = all_current_historical_data_list[ ii , 1 ]
  # read second column of dates in all_current_historical_data_list as a date index
  date_ix = as.Date( all_current_historical_data_list[ ii , 2 ] )
  todays_date = as.Date( Sys.time() )

  # download the missing historical data from date_ix to todays_date, if and only if date_ix != todays_date
  if( date_ix + 1 != todays_date ) {

  new_historical_data = HisPricesDates( Granularity = "D", DayAlign, TimeAlign, AccountToken, instrument,
                         date_ix , todays_date )

  # the new_historical_data might only try to add incomplete OHLC data, in which case do not actually
  # want to update, so only update if we will be adding new, complete OHLC information
  if ( nrow( new_historical_data ) >= 2 & new_historical_data[ 2 , 7 ] == TRUE ) {

    # now do some data manipulation
    # expect date of last line in Instrument_update_file == date of first line in new_historical_data
    if ( date_ix == as.Date( new_historical_data[ 1 , 1 ] ) ) { # this is the case if true
       new_historical_data = new_historical_data[ -1 , ]        # so delete first row of new_historical_data
    }

    # similarly, expect last line of new_historical_data to be an incomplete OHLC bar
    if ( new_historical_data[ nrow( new_historical_data ) , 7 ] == FALSE ) {        # if so,
       new_historical_data = new_historical_data[ -nrow( new_historical_data ) , ]  # delete this last line
    }

    # append new_historical_data to the relevant raw data file
    write.table( new_historical_data , file = paste( instrument , "raw_OHLC_daily" , sep = "_" ) , row.names = FALSE , na = "" ,
                 col.names = FALSE , sep = "," , append = TRUE )

    added_data_length = nrow( new_historical_data )
    new_last_date = as.Date( new_historical_data[ added_data_length , 1 ] )

    # and amend Instrument_update file with latest update information
    all_current_historical_data_list[ ii , 2 ] = new_last_date
    all_current_historical_data_list[ ii , 3 ] = all_current_historical_data_list[ ii , 3 ] + added_data_length

    } # end of download if statement

  } # end of ( date_ix != todays_date ) if statement

} # end of for all_current_historical_data_list loop

# Write updated Instrument_update_file to file
write.table( all_current_historical_data_list , file = "instrument_daily_update_file" , row.names = FALSE , col.names = FALSE , na = "" )
has if statements as control structures to check that there is likely to be new daily data to actually download. It does this by comparing a last_update date contained in an "instrument_daily_update_file" with the current OS system time. If there is likely to be new data, the script runs and then updates this "instrument_daily_update_file"; if not, the script exits with nothing having been done.
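For reference, this "instrument_daily_update_file" is just a whitespace separated file of instrument name, last update date and a running bar count, read with colClasses = c( "character" , "Date" , "numeric" ); something like the following (illustrative values only):

EUR_USD 2017-10-26 5000
GBP_USD 2017-10-26 5000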

The intraday update script does not have the checks the daily script has, because I assume there will always be some new intraday data available for download. In this case, the last_update date is read from the "instrument_hourly_update_file" purely to act as an input to the above HisPricesDates function. As a result, this script involves some data manipulation to ensure that duplicate data is not printed to file. The script is shown next and is heavily commented to explain what is happening.
# cd to the hourly data directory
setwd("~/Documents/octave/oanda_data")

all_current_historical_data_list = read.table("instrument_hourly_update_file",header=FALSE,sep="",colClasses=c("character","Date","numeric") )

for( ii in 1 : nrow( all_current_historical_data_list ) ) {

   instrument = all_current_historical_data_list[ ii , 1 ]

   # read second column of dates in all_current_historical_data_list as a date index
   date_ix = as.Date( all_current_historical_data_list[ ii , 2 ] )

   todays_date = as.Date( Sys.time() )

   # download the missing historical data from date_ix to todays_date. If date_ix == todays_date, will download all
   # hourly bars for today only.
   new_historical_data = HisPricesDates( Granularity = "H1", DayAlign, TimeAlign, AccountToken, instrument,
                          date_ix , todays_date + 1 )

   # the new_historical_data will almost certainly have incomplete hourly OHLC data in its last line,
   # so delete this incomplete OHLC information
   if ( new_historical_data[ nrow( new_historical_data ) , 7 ] == FALSE ) {
        new_historical_data = new_historical_data[ -nrow( new_historical_data ) , ]
   }

   # read the last line only of the current OHLC file for this instrument
   file = paste( instrument , "raw_OHLC_hourly" , sep = "_" ) # get the filename

   system_command = paste( "tail -1" , file , sep = " " )     # create a unix system command to read the last line of this file

   # read the file's last line
   old_historical_data = read.csv( textConnection( system( system_command , intern = TRUE ) ) , header = FALSE , sep = "," ,
                                    stringsAsFactors = FALSE )

   old_historical_data_end_date_time = old_historical_data[ 1 , 1 ]            # get the date value to be matched

   new_historical_data_date_times = as.character( new_historical_data[ , 1 ] ) # vector to search for the above date value

   ix = charmatch( old_historical_data_end_date_time , new_historical_data_date_times ) # get the matching index value

   # delete that part of new_historical_data which is already contained in filename
   new_historical_data = new_historical_data[ -( 1 : ix ) , ]

   # append new_historical_data to the relevant raw data file
   write.table( new_historical_data , file = paste( instrument , "raw_OHLC_hourly" , sep = "_" ) , row.names = FALSE , na = "" ,
                 col.names = FALSE , sep = "," , append = TRUE )

   added_data_length = nrow( new_historical_data )                          # length of added new data
   new_last_date = as.Date( new_historical_data[ added_data_length , 1 ] )  # date of last update

   # and amend Instrument_update file with latest update information
   all_current_historical_data_list[ ii , 2 ] = new_last_date
   all_current_historical_data_list[ ii , 3 ] = all_current_historical_data_list[ ii , 3 ] + added_data_length

} # end of for all_current_historical_data_list loop

# finally, write updated Instrument_update_file to file
write.table( all_current_historical_data_list , file = "instrument_hourly_update_file" , row.names = FALSE , col.names = FALSE , na = "" )
There is one important thing to point out: the section of code that reads the last line of the existing file relies on the Unix tail command via a system() call, which in turn means it almost certainly will not work on Windows based OSes. Windows users will have to find their own hack to load just the last line of the relevant file, or put up with loading the whole historical data file and indexing just the last line.
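One possible R-only alternative for Windows users, at the cost of reading the whole file, is sketched below (variable names as in the script above):

# portable but slower alternative sketch: read the whole file with readLines
# and parse only its last line
all_lines = readLines( file )
old_historical_data = read.csv( text = all_lines[ length( all_lines ) ] , header = FALSE ,
                                 stringsAsFactors = FALSE )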

Thursday, 21 September 2017

Downloading Historical Data Using Oanda's API and R

It has been about 5 months since my last blog post and in this time I have been working away from home, been on summer holiday and spent some time mucking about on boats, so I have not been able to devote as much time to my blog as I would have liked. However, that has now changed, and this blog post is about obtaining historical data.

Many moons ago I used to download free EOD data from tradingblox, but they stopped updating this in early 2013. I then started concentrating more on forex data because free sources of this data are more readily available. However, this still meant (for me at least) a laborious process of manually downloading .txt/.csv files, with the attendant problem of the data never being fully up to date, resulting in me doing historical testing on data that I would not be able to trade in real time. With my present focus on machine learning derived trading algorithms this was becoming an untenable position.

My solution to this has been to personalize the ROandaAPI code that is freely available from this github, courtesy of IF.FranciscoME. I have stripped out some if statements, hard coded some variables particular to my own Oanda account, added some extra comments for my own enlightenment and broken the API code up into separate .R functions and .R scripts, which I use via RStudio.

The first such R script is called to initialise the variables and load the various functions into the R environment, shown below.
# Required Packages in order to use the R API functions
library("downloader")
library("RCurl")
library("jsonlite")
library("httr")

# -- ---------------------------------------------------------------------------------- #
# -- Specific Values of Parameters in API for my primary account ---------------------- #
# -- ---------------------------------------------------------------------------------- #

# -- numeric ------ # My Account ID 
AccountID = 123456     
# -- character ---- # My Oanda Token
AccountToken = "blah-de-blah-de-blah"

# load the various function files
source("~/path/to/InstrumentsList.R")
source("~/path/to/R_Oanda_API_functions/ActualPrice.R")
source("~/path/to/HisPricesCount.R")
source("~/path/to/HisPricesDates.R")

# get list of all tradeable instruments
trade_instruments = InstrumentsList( AccountToken , AccountID )
View( trade_instruments )

# load some default values
# -- character ---- # Granularity of the prices
# granularity: The time range represented by each candlestick. The value specified will determine the alignment 
# of the first candlestick.
# choose from S5 S10 S15 S30 M1 M2 M3 M4 M5 M10 M15 M30 H1 H2 H3 H4 H6 H8 H12 D W M ( secs mins hours day week month )
Granularity = "D"

# -- numeric ------ # Hour of the "End of the Day"
# dailyAlignment: The hour of day used to align candles with hourly, daily, weekly, or monthly granularity. 
# The value specified is interpretted as an hour in the timezone set through the alignmentTimezone parameter and must be 
# an integer between 0 and 23.
DayAlign = 0 # original R code and Oanda default is 17 at "America%2FNew_York"

# -- character ---- # Time Zone in format "Continent/Zone
# alignmentTimezone: The timezone to be used for the dailyAlignment parameter. This parameter does NOT affect the 
# returned timestamp, the start or end parameters, these will always be in UTC. The timezone format used is defined by 
# the IANA Time Zone Database, a full list of the timezones supported by the REST API can be found at 
# http://developer.oanda.com/docs/timezones.txt
# "America%2FMexico_City" was the originallly provided, but could use, for example,  "Europe%2FLondon" or "Europe%2FWarsaw"
TimeAlign = "Europe%2FLondon"

################################# IMPORTANT NOTE #####################################################################
# By setting DayAlign = 0 and TimeAlign = "Europe%2FLondon" the end of bar is midnight in London. Doing this ensures
# that the bar OHLC in data downloads matches the bars seen in the Oanda FXTrade software, which for my account is 
# Europe Division, e.g. London time. The timestamps on downloads are, however, at GMT times, which means during summer
# daylight saving time the times shown on the Oanda software seem to be one hour ahead of GMT.
######################################################################################################################

Start = Sys.time() # Current system time
End = Sys.time() # Current system time
Count = 500 # Oanda default

# now cd to the working directory
setwd("~/path/to/oanda_data")
The code is liberally commented to describe reasons for my default choices. The InstrumentsList.R function called in the above script is shown next.
InstrumentsList = function( AccountToken , AccountID )
{
  httpaccount = "https://api-fxtrade.oanda.com"
  auth        = c(Authorization = paste("Bearer",AccountToken,sep=" "))
  Queryhttp   = paste(httpaccount,"/v1/instruments?accountId=",sep="")
  QueryInst   = paste(Queryhttp,AccountID,sep="")
  QueryInst1  = getURL(QueryInst,cainfo=system.file("CurlSSL","cacert.pem",package="RCurl"),httpheader=auth)
  InstJson    = fromJSON(QueryInst1, simplifyDataFrame = TRUE)
  FinalData   = data.frame(InstJson)
  colnames(FinalData) = c("Instrument","DisplayName","PipSize","MaxTradeUnits")
  FinalData$MaxTradeUnits = as.numeric(FinalData$MaxTradeUnits)
  return(FinalData)
}
This downloads a list of all the available trading instruments for the associated Oanda account. The following R script actually downloads the historical data for all the trading instruments in this list and writes the data to separate files; one file per instrument. It also keeps track of all the instruments and the date of the last complete OHLC bar in each historical record, and writes this to file as well.
# cd to the working directory
setwd("~/path/to/oanda_data")

# dataframe to keep track of updates
Instrument_update_file = data.frame( Instrument = character() , Date = as.Date( character() ) , stringsAsFactors = FALSE )

for( ii in 1 : nrow( trade_instruments ) ) {
  
  instrument = trade_instruments[ ii , 1 ]
  
  # write details of instrument to Instrument_update_file
  Instrument_update_file[ ii , 1 ] = instrument
  
  historical_prices = HisPricesCount( Granularity = "D", DayAlign , TimeAlign , AccountToken ,instrument , Count = 5000 )
  last_row_ix = nrow( historical_prices )
  
  if ( historical_prices[ last_row_ix , 7 ] == FALSE ){ # last obtained OHLC bar values are incomplete
    # and do not want to save incomplete OHLC values, so add date of previous line of complete OHLC data
    # to Instrument_update_file
    Instrument_update_file[ ii , 2 ] = as.Date( historical_prices[ last_row_ix - 1 , 1 ] )
    
    # and delete the row with these incomplete values ( note the parentheses: ":" binds more tightly than "-" )
    historical_prices = historical_prices[ 1 : ( last_row_ix - 1 ) , ]
    
  } # end of if statement
  
  # Write historical_prices to file
  write.table( historical_prices , file = paste( instrument , "raw_OHLC_daily" , sep = "_" ) , row.names = FALSE , na = "" , 
             col.names = FALSE , sep = "," )

} # end of for loop

# Write Instrument_update_file to file
write.table( Instrument_update_file , file = "Instrument_update_file" , row.names = FALSE , na = "" , col.names = TRUE , sep = "," )
This script repeatedly calls, in a loop, the actual download function, HisPricesCount.R, which does all the heavy lifting. The code for this download function is
HisPricesCount = function( Granularity, DayAlign, TimeAlign, AccountToken, Instrument, Count ){
  
  httpaccount      = "https://api-fxtrade.oanda.com"
  auth             = c(Authorization = paste("Bearer",AccountToken,sep=" "))
  QueryHistPrec    = paste(httpaccount,"/v1/candles?instrument=",sep="")
  QueryHistPrec1   = paste(QueryHistPrec,Instrument,sep="")
  qcount           = paste("count=",Count,sep="")
  qcandleFormat    = "candleFormat=midpoint"
  qgranularity     = paste("granularity=",Granularity,sep="")
  qdailyalignment  = paste("dailyAlignment=",DayAlign,sep="")
  QueryHistPrec2   = paste(QueryHistPrec1,qcandleFormat,qgranularity,qdailyalignment,qcount,sep="&")
  InstHistP        = getURL(QueryHistPrec2,cainfo=system.file("CurlSSL","cacert.pem",package="RCurl"),httpheader=auth)
  InstHistPjson    = fromJSON(InstHistP, simplifyDataFrame = TRUE)
  Prices           = data.frame(InstHistPjson[[3]])
  Prices$time      = paste(substr(Prices$time,1,10),substr(Prices$time,12,19), sep=" ")
  colnames(Prices) = c("TimeStamp","Open","High","Low","Close","TickVolume","Complete")
  Prices$TimeStamp = as.POSIXct(strptime(Prices$TimeStamp, "%Y-%m-%d %H:%M:%OS"),origin="1970-01-01",tz = "UTC")
  attributes(Prices$TimeStamp)$tzone = TimeAlign
  return(Prices)
  
}
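As a usage illustration ( the alignment values here are my assumptions, not prescriptions ), a single call for the last 500 daily EUR_USD bars, aligned to 17:00 New York time, might be
DayAlign  = 17                 # assumed daily alignment hour
TimeAlign = "America/New_York" # assumed timestamp timezone

eur_usd_daily = HisPricesCount( Granularity = "D" , DayAlign , TimeAlign ,
                                AccountToken , Instrument = "EUR_USD" , Count = 500 )
tail( eur_usd_daily )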
One of the input variables for this function is Count ( passed as 5000 by the calling script above ), which means that the function downloads the last 5000 OHLC bar records up to and including the most recent, which may still be forming and hence incomplete. The calling script ensures that any such incomplete bar is stripped from the record so that only complete bars are written to file.
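Although no update script is shown here, the point of Instrument_update_file is to enable incremental updates on later runs. A minimal sketch of reading it back ( file name and column layout exactly as written above ) might be
update_record = read.table( "Instrument_update_file" , header = TRUE , sep = "," ,
                            stringsAsFactors = FALSE )

# days elapsed since each instrument's last complete bar; a possible basis
# for deciding how much new data to request on the next run
days_stale = as.numeric( Sys.Date() - as.Date( update_record$Date ) )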

All in all this is a vast improvement over my previous data collection regime, and kudos to IF.FranciscoME for making the base code available on his GitHub.

Saturday, 30 June 2012

Machine Learning Course Completed

I'm pleased to say that I have now completed Andrew Ng's machine learning course, which is offered through Coursera. This post is not intended to be a review of the course, which in my opinion is extremely good and very useful, but more of a reflection of my thoughts and what I think will be useful for me personally.

Firstly, I was pleasantly surprised that the software/programming language of instruction was Octave, which regular readers of this blog will know is my main software of choice. Apart from learning the concepts of ML, I also picked up some handy tips for Octave programming, and more importantly for me I now have a set of working Octave ML functions that I can use immediately in my system development.

In my previous post I mentioned that my first attempt at using ML will be to use a Neural Net to classify market types. As background to this, readers might be interested in a pdf file of the video lectures, available from here, which was put together and posted on the course discussion forum by another student - I think this is very good and all credit to said student, José Soares Augusto.

Due to the honour code ( or honor code for American readers ) of the course I will be unable to post the code that I wrote for the programming assignments. However, I do feel that I can post the code shown in the code box below, as the copyright notice allows it. A few slight changes I made are noted in the copyright notice. This is a minimisation function that was used in the training of the Neural Net assignment and was provided in the assignment download.
function [X, fX, i] = fmincg(f, X, options, P1, P2, P3, P4, P5)
% Minimize a continuous differentiable multivariate function. Starting point
% is given by "X" (D by 1), and the function named in the string "f", must
% return a function value and a vector of partial derivatives. The Polack-
% Ribiere flavour of conjugate gradients is used to compute search directions,
% and a line search using quadratic and cubic polynomial approximations and the
% Wolfe-Powell stopping criteria is used together with the slope ratio method
% for guessing initial step sizes. Additionally a bunch of checks are made to
% make sure that exploration is taking place and that extrapolation will not
% be unboundedly large. The "length" gives the length of the run: if it is
% positive, it gives the maximum number of line searches, if negative its
% absolute gives the maximum allowed number of function evaluations. You can
% (optionally) give "length" a second component, which will indicate the
% reduction in function value to be expected in the first line-search (defaults
% to 1.0). The function returns when either its length is up, or if no further
% progress can be made (ie, we are at a minimum, or so close that due to
% numerical problems, we cannot get any closer). If the function terminates
% within a few iterations, it could be an indication that the function value
% and derivatives are not consistent (ie, there may be a bug in the
% implementation of your "f" function). The function returns the found
% solution "X", a vector of function values "fX" indicating the progress made
% and "i" the number of iterations (line searches or function evaluations,
% depending on the sign of "length") used.
%
% Usage: [X, fX, i] = fmincg(f, X, options, P1, P2, P3, P4, P5)
%
% See also: checkgrad 
%
% Copyright (C) 2001 and 2002 by Carl Edward Rasmussen. Date 2002-02-13
%
% (C) Copyright 1999, 2000 & 2001, Carl Edward Rasmussen
% 
% Permission is granted for anyone to copy, use, or modify these
% programs and accompanying documents for purposes of research or
% education, provided this copyright notice is retained, and note is
% made of any changes that have been made.
% 
% These programs and documents are distributed without any warranty,
% express or implied.  As the programs were written for research
% purposes only, they have not been tested to the degree that would be
% advisable in any important application.  All use of these programs is
% entirely at the user's own risk.
%
% [ml-class] Changes Made:
% 1) Function name and argument specifications
% 2) Output display
%
% Dekalog Changes Made:
% Some lines have been altered, changing | to || and & to &&.
% This is to avoid "possible Matlab-style short-circuit operator" warnings
% being given when code is run under Octave. The lines where these changes
% have been made are indicated by comments at the end of each respective line. 

% Read options
if exist('options', 'var') && ~isempty(options) && isfield(options, 'MaxIter')
    length = options.MaxIter;
else
    length = 100;
end


RHO = 0.01;                            % a bunch of constants for line searches
SIG = 0.5;       % RHO and SIG are the constants in the Wolfe-Powell conditions
INT = 0.1;    % don't reevaluate within 0.1 of the limit of the current bracket
EXT = 3.0;                    % extrapolate maximum 3 times the current bracket
MAX = 20;                         % max 20 function evaluations per line search
RATIO = 100;                                      % maximum allowed slope ratio

argstr = ['feval(f, X'];                      % compose string used to call function
for i = 1:(nargin - 3)
  argstr = [argstr, ',P', int2str(i)];
end
argstr = [argstr, ')'];

if max(size(length)) == 2, red=length(2); length=length(1); else red=1; end
S=['Iteration '];

i = 0;                                            % zero the run length counter
ls_failed = 0;                             % no previous line search has failed
fX = [];
[f1 df1] = eval(argstr);                      % get function value and gradient
i = i + (length<0);                                            % count epochs?!
s = -df1;                                        % search direction is steepest
d1 = -s'*s;                                                 % this is the slope
z1 = red/(1-d1);                                  % initial step is red/(|s|+1)

while i < abs(length)                                      % while not finished
  i = i + (length>0);                                      % count iterations?!

  X0 = X; f0 = f1; df0 = df1;                   % make a copy of current values
  X = X + z1*s;                                             % begin line search
  [f2 df2] = eval(argstr);
  i = i + (length<0);                                          % count epochs?!
  d2 = df2'*s;
  f3 = f1; d3 = d1; z3 = -z1;             % initialize point 3 equal to point 1
  if length>0, M = MAX; else M = min(MAX, -length-i); end
  success = 0; limit = -1;                     % initialize quantities
  while 1
    while ((f2 > f1+z1*RHO*d1) || (d2 > -SIG*d1)) && (M > 0) % | and & changed to || and && to avoid "possible Matlab-style short-circuit operator" warning 
      limit = z1;                                         % tighten the bracket
      if f2 > f1
        z2 = z3 - (0.5*d3*z3*z3)/(d3*z3+f2-f3);                 % quadratic fit
      else
        A = 6*(f2-f3)/z3+3*(d2+d3);                                 % cubic fit
        B = 3*(f3-f2)-z3*(d3+2*d2);
        z2 = (sqrt(B*B-A*d2*z3*z3)-B)/A;       % numerical error possible - ok!
      end
      if isnan(z2) || isinf(z2)     % | changed to || to avoid "possible Matlab-style short-circuit operator" warning 
        z2 = z3/2;                  % if we had a numerical problem then bisect
      end
      z2 = max(min(z2, INT*z3),(1-INT)*z3);  % don't accept too close to limits
      z1 = z1 + z2;                                           % update the step
      X = X + z2*s;
      [f2 df2] = eval(argstr);
      M = M - 1; i = i + (length<0);                           % count epochs?!
      d2 = df2'*s;
      z3 = z3-z2;                    % z3 is now relative to the location of z2
    end
    if f2 > f1+z1*RHO*d1 || d2 > -SIG*d1                    % | changed to || to avoid "possible Matlab-style short-circuit operator" warning
      break;                                                % this is a failure
    elseif d2 > SIG*d1
      success = 1; break;                                             % success
    elseif M == 0
      break;                                                          % failure
    end
    A = 6*(f2-f3)/z3+3*(d2+d3);                      % make cubic extrapolation
    B = 3*(f3-f2)-z3*(d3+2*d2);
    z2 = -d2*z3*z3/(B+sqrt(B*B-A*d2*z3*z3));        % num. error possible - ok!
    if ~isreal(z2) || isnan(z2) || isinf(z2) || z2 < 0   % num prob or wrong sign? % | changed to || to avoid "possible Matlab-style short-circuit operator" warning
      if limit < -0.5                               % if we have no upper limit
        z2 = z1 * (EXT-1);                 % then extrapolate the maximum amount
      else
        z2 = (limit-z1)/2;                                   % otherwise bisect
      end
    elseif (limit > -0.5) && (z2+z1 > limit)       % extrapolation beyond max?   % & changed to && to avoid "possible Matlab-style short-circuit operator" warning
      z2 = (limit-z1)/2;                                               % bisect
    elseif (limit < -0.5) && (z2+z1 > z1*EXT)      % extrapolation beyond limit % & changed to && to avoid "possible Matlab-style short-circuit operator" warning
      z2 = z1*(EXT-1.0);                           % set to extrapolation limit
    elseif z2 < -z3*INT
      z2 = -z3*INT;
    elseif (limit > -0.5) && (z2 < (limit-z1)*(1.0-INT))   % too close to limit? % & changed to && to avoid "possible Matlab-style short-circuit operator" warning
      z2 = (limit-z1)*(1.0-INT);
    end
    f3 = f2; d3 = d2; z3 = -z2;                  % set point 3 equal to point 2
    z1 = z1 + z2; X = X + z2*s;                      % update current estimates
    [f2 df2] = eval(argstr);
    M = M - 1; i = i + (length<0);                             % count epochs?!
    d2 = df2'*s;
  end                                                      % end of line search

  if success                                         % if line search succeeded
    f1 = f2; fX = [fX' f1]';
    fprintf('%s %4i | Cost: %4.6e\r', S, i, f1);
    s = (df2'*df2-df1'*df2)/(df1'*df1)*s - df2;      % Polack-Ribiere direction
    tmp = df1; df1 = df2; df2 = tmp;                         % swap derivatives
    d2 = df1'*s;
    if d2 > 0                                      % new slope must be negative
      s = -df1;                              % otherwise use steepest direction
      d2 = -s'*s;    
    end
    z1 = z1 * min(RATIO, d1/(d2-realmin));          % slope ratio but max RATIO
    d1 = d2;
    ls_failed = 0;                              % this line search did not fail
  else
    X = X0; f1 = f0; df1 = df0;  % restore point from before failed line search
    if ls_failed || i > abs(length)         % line search failed twice in a row % | changed to || to avoid "possible Matlab-style short-circuit operator" warning
      break;                             % or we ran out of time, so we give up
    end
    tmp = df1; df1 = df2; df2 = tmp;                         % swap derivatives
    s = -df1;                                                    % try steepest
    d1 = -s'*s;
    z1 = 1/(1-d1);                     
    ls_failed = 1;                                    % this line search failed
  end
  if exist('OCTAVE_VERSION')
    fflush(stdout);
  end
end
fprintf('\n');
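
For anyone wanting an R analogue of the above rather than the Octave version, base R's optim() with method = "CG" and control type = 2 uses the same Polack-Ribiere flavour of conjugate gradients. A minimal sketch on the Rosenbrock test function ( my own example, nothing from the course ) is
rosenbrock    <- function( x ) ( 1 - x[1] )^2 + 100 * ( x[2] - x[1]^2 )^2
rosenbrock_gr <- function( x ) c( -2 * ( 1 - x[1] ) - 400 * x[1] * ( x[2] - x[1]^2 ) ,
                                  200 * ( x[2] - x[1]^2 ) )

fit <- optim( par = c( -1.2 , 1 ) , fn = rosenbrock , gr = rosenbrock_gr ,
              method = "CG" , control = list( type = 2 , maxit = 500 ) )
fit$par # should approach c( 1 , 1 )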

Finally, the last set of videos talked about "Artificial Data Synthesis," otherwise known as creating your own data for training purposes. This is basically what I had planned to do anyway ( see previous post ), but it is nice to learn that it is standard, accepted practice in the ML world. The first such way of creating data, in the context of Photo OCR, is to render characters from various font libraries against random backgrounds. I think this very much mirrors my planned approach of training on repeated sets of my "ideal time series" construct. However, another approach which could be used is "data distortion," where existing examples are warped to create new ones. This is an approach for which my idea of creating synthetic data using an FFT might be useful, or alternatively a correlation and cointegration approach as shown in R code in this Quantitative Finance thread.
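To make the FFT idea concrete, here is a minimal sketch ( my own illustration, nothing to do with the course materials ) of a phase-randomised surrogate: keep the amplitude spectrum of a series but scramble the phases, giving new data with the same spectral character.
set.seed( 42 )
x <- cumsum( rnorm( 256 ) ) # stand-in for a detrended price series
n <- length( x )
X <- fft( x )

half   <- 2 : floor( n / 2 )                   # positive frequencies, excluding DC
phases <- runif( length( half ) , 0 , 2 * pi ) # random phases
X[ half ]         <- Mod( X[ half ] ) * exp( 1i * phases )
X[ n + 2 - half ] <- Conj( X[ half ] )         # mirror so the result stays real-valued

surrogate <- Re( fft( X , inverse = TRUE ) / n )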

All in all, I'm quite excited by the possibilities of my new found knowledge, and I fully expect that in time, after development and testing, any Neural Net I develop will in fact replace my current Naive Bayesian classifier.

Wednesday, 6 June 2012

Creation of a Simple Benchmark Suite

For some time now I have been toying with the idea of creating a simple benchmark suite to compare my own back test system performance with that of some public domain trading systems. I decided to select a few examples from the Trading Blox Users' Guide, specifically:
  • exponential moving average crossovers of periods 10-20 and 20-50
  • triple moving average crossover system with periods 10-20-50
  • Bollinger band breakouts of periods 20 and 50 with 1 & 2 standard deviations for exits and entries
  • Donchian channel breakouts with periods 20-10 and 50-25
This is a total of 7 systems, and in the Rcpp code below these form a sort of "committee" that votes to be either long or short 1 contract, or to stay neutral.
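As a toy R illustration of this vote before getting to the Rcpp implementation ( the position values here are hypothetical, but the sign convention is exactly that used in the code below ):
positions <- c( 1 , 1 , -1 , 0 , 1 , 0 , -1 ) # hypothetical votes of the 7 systems: +1 long, -1 short, 0 neutral
committee <- sign( sum( positions ) )         # +1 = long 1 contract, -1 = short 1 contract, 0 = neutral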

# This function takes as inputs vectors of opening and closing prices
# and creates a basic benchmark output suite of system equity curves 
# for the following basic trend following systems
# exponential moving average crossovers of 10-20 and 20-50
# triple moving average crossovers of 10-20-50
# bollinger band breakouts of 20 and 50 with 1 & 2 standard deviations
# donchian channel breakouts of 20-10 and 50-25
# The libraries required to compile this function are
# "Rcpp," "inline" and "compiler." The function is compiled by the command
# > source("basic_benchmark_equity.r") in the console/terminal.

library(Rcpp) # load the required library
library(inline) # load the required library
library(compiler) # load the required library

src <- '
// both headers are already pulled in by the inline plugin, so these
// re-includes expand to nothing; <algorithm> declares the
// std::max_element / std::min_element functions used below
#include <Rcpp.h>
#include <algorithm>

Rcpp::NumericVector open(a) ;
Rcpp::NumericVector close(b) ;
Rcpp::NumericVector market_mode(c) ;
Rcpp::NumericVector kalman(d) ;
Rcpp::NumericVector tick_size(e) ;
Rcpp::NumericVector tick_value(f) ;
int n = open.size() ;
Rcpp::NumericVector sma_10(n) ;
Rcpp::NumericVector sma_20(n) ;
Rcpp::NumericVector sma_50(n) ;
Rcpp::NumericVector std_20(n) ;
Rcpp::NumericVector std_50(n) ;

// create equity output vectors
Rcpp::NumericVector market_mode_long_eq(n) ;
Rcpp::NumericVector market_mode_short_eq(n) ;
Rcpp::NumericVector market_mode_composite_eq(n) ;
Rcpp::NumericVector sma_10_20_eq(n) ; 
Rcpp::NumericVector sma_20_50_eq(n) ; 
Rcpp::NumericVector tma_eq(n) ; 
Rcpp::NumericVector bbo_20_eq(n) ;
Rcpp::NumericVector bbo_50_eq(n) ;
Rcpp::NumericVector donc_20_eq(n) ;
Rcpp::NumericVector donc_50_eq(n) ;
Rcpp::NumericVector composite_eq(n) ; 

// position vectors for benchmark systems
Rcpp::NumericVector sma_10_20_pv(1) ;
sma_10_20_pv[0] = 0.0 ; // initialise to zero, no position

Rcpp::NumericVector sma_20_50_pv(1) ;
sma_20_50_pv[0] = 0.0 ; // initialise to zero, no position

Rcpp::NumericVector tma_pv(1) ;
tma_pv[0] = 0.0 ; // initialise to zero, no position

Rcpp::NumericVector bbo_20_pv(1) ;
bbo_20_pv[0] = 0.0 ; // initialise to zero, no position

Rcpp::NumericVector bbo_50_pv(1) ;
bbo_50_pv[0] = 0.0 ; // initialise to zero, no position

Rcpp::NumericVector donc_20_pv(1) ;
donc_20_pv[0] = 0.0 ; // initialise to zero, no position

Rcpp::NumericVector donc_50_pv(1) ;
donc_50_pv[0] = 0.0 ; // initialise to zero, no position

Rcpp::NumericVector comp_pv(1) ;
comp_pv[0] = 0.0 ; // initialise to zero, no position

// fill the equity curve vectors with zeros for "burn in" period
// and create the initial values for all indicators

for ( int ii = 0 ; ii < 50 ; ii++ ) {
    
    if ( ii >= 40 ) {
    sma_10[49] += close[ii] ; }

    if ( ii >= 30 ) {
    sma_20[49] += close[ii] ; }

    sma_50[49] += close[ii] ;

    std_20[ii] = 0.0 ;
    std_50[ii] = 0.0 ;

    market_mode_long_eq[ii] = 0.0 ;
    market_mode_short_eq[ii] = 0.0 ;
    market_mode_composite_eq[ii] = 0.0 ;
    sma_10_20_eq[ii] = 0.0 ;
    sma_20_50_eq[ii] = 0.0 ; 
    bbo_20_eq[ii] = 0.0 ;
    bbo_50_eq[ii] = 0.0 ;
    tma_eq[ii] = 0.0 ;
    donc_20_eq[ii] = 0.0 ;
    donc_50_eq[ii] = 0.0 ; 
    composite_eq[ii] = 0.0 ; } // end of initialising loop

    sma_10[49] = sma_10[49] / 10.0 ;
    sma_20[49] = sma_20[49] / 20.0 ;
    sma_50[49] = sma_50[49] / 50.0 ;

// the main calculation loop
for ( int ii = 50 ; ii < n-2 ; ii++ ) {

    // calculate the smas with a rolling update: drop the oldest close, add the newest
    sma_10[ii] = sma_10[ii-1] - ( close[ii-10] / 10.0 ) + ( close[ii] / 10.0 ) ;
    sma_20[ii] = sma_20[ii-1] - ( close[ii-20] / 20.0 ) + ( close[ii] / 20.0 ) ;
    sma_50[ii] = sma_50[ii-1] - ( close[ii-50] / 50.0 ) + ( close[ii] / 50.0 ) ;

      // calculate the standard deviations
      for ( int jj = 0 ; jj < 50 ; jj++ ) {
    
      if ( jj < 20 ) {
      std_20[ii] += ( close[ii-jj] - sma_20[ii] ) * ( close[ii-jj] - sma_20[ii] )  ; } // end of jj if

      std_50[ii] += ( close[ii-jj] - sma_50[ii] ) * ( close[ii-jj] - sma_50[ii] ) ; } // end of standard deviation loop

    std_20[ii] = sqrt( std_20[ii] / 20.0 ) ;
    std_50[ii] = sqrt( std_50[ii] / 50.0 ) ;

    //-------------------------------------------------------------------------------------------------------------------

    // calculate the equity values of the market modes
    // market_mode uwr and unr long signals
    if ( market_mode[ii] == 1 || market_mode[ii] == 2 ) {
    market_mode_long_eq[ii] = market_mode_long_eq[ii-1] + tick_value[0] * ( (open[ii+2]-open[ii+1])/tick_size[0] ) ;
    market_mode_short_eq[ii] = market_mode_short_eq[ii-1] ;
    market_mode_composite_eq[ii] = market_mode_composite_eq[ii-1] + tick_value[0] * ( (open[ii+2]-open[ii+1])/tick_size[0] ) ; }

    // market_mode dwr and dnr short signals
    if ( market_mode[ii] == 3 || market_mode[ii] == 4 ) {
    market_mode_long_eq[ii] = market_mode_long_eq[ii-1] ;
    market_mode_short_eq[ii] = market_mode_short_eq[ii-1] + tick_value[0] * ( (open[ii+1]-open[ii+2])/tick_size[0] ) ;
    market_mode_composite_eq[ii] = market_mode_composite_eq[ii-1] + tick_value[0] * ( (open[ii+1]-open[ii+2])/tick_size[0] ) ; }

    // calculate the equity values of the market modes
    // market_mode cyc long signals
    if ( market_mode[ii] == 0 && kalman[ii] > kalman[ii-1] ) {
    market_mode_long_eq[ii] = market_mode_long_eq[ii-1] + tick_value[0] * ( (open[ii+2]-open[ii+1])/tick_size[0] ) ;
    market_mode_short_eq[ii] = market_mode_short_eq[ii-1] ;
    market_mode_composite_eq[ii] = market_mode_composite_eq[ii-1] + tick_value[0] * ( (open[ii+2]-open[ii+1])/tick_size[0] ) ; }

    // market_mode cyc short signals
    if ( market_mode[ii] == 0 && kalman[ii] < kalman[ii-1] ) {
    market_mode_long_eq[ii] = market_mode_long_eq[ii-1] ;
    market_mode_short_eq[ii] = market_mode_short_eq[ii-1] + tick_value[0] * ( (open[ii+1]-open[ii+2])/tick_size[0] ) ;
    market_mode_composite_eq[ii] = market_mode_composite_eq[ii-1] + tick_value[0] * ( (open[ii+1]-open[ii+2])/tick_size[0] ) ; }

    //----------------------------------------------------------------------------------------------------------------------------

    // calculate the equity values and positions of each benchmark system
    // sma_10_20_eq
    if ( sma_10[ii] > sma_20[ii] ) { 
    sma_10_20_eq[ii] = sma_10_20_eq[ii-1] + tick_value[0] * ( (open[ii+2]-open[ii+1])/tick_size[0] ) ; 
    sma_10_20_pv[0] = 1.0 ; } // long

    // sma_10_20_eq
    if ( sma_10[ii] < sma_20[ii] ) { 
    sma_10_20_eq[ii] = sma_10_20_eq[ii-1] + tick_value[0] * ( (open[ii+1]-open[ii+2])/tick_size[0] ) ; 
    sma_10_20_pv[0] = -1.0 ; } // short

    // sma_10_20_eq
    if ( sma_10[ii] == sma_20[ii] && sma_10[ii-1] > sma_20[ii-1] ) { 
    sma_10_20_eq[ii] = sma_10_20_eq[ii-1] + tick_value[0] * ( (open[ii+2]-open[ii+1])/tick_size[0] ) ; 
    sma_10_20_pv[0] = 1.0 ; } // long

    // sma_10_20_eq
    if ( sma_10[ii] == sma_20[ii] && sma_10[ii-1] < sma_20[ii-1] ) { 
    sma_10_20_eq[ii] = sma_10_20_eq[ii-1] + tick_value[0] * ( (open[ii+1]-open[ii+2])/tick_size[0] ) ; 
    sma_10_20_pv[0] = -1.0 ; } // short

    //-----------------------------------------------------------------------------------------------------------

    // sma_20_50_eq
    if ( sma_20[ii] > sma_50[ii] ) { 
    sma_20_50_eq[ii] = sma_20_50_eq[ii-1] + tick_value[0] * ( (open[ii+2]-open[ii+1])/tick_size[0] ) ; 
    sma_20_50_pv[0] = 1.0 ; } // long

    // sma_20_50_eq
    if ( sma_20[ii] < sma_50[ii] ) { 
    sma_20_50_eq[ii] = sma_20_50_eq[ii-1] + tick_value[0] * ( (open[ii+1]-open[ii+2])/tick_size[0] ) ; 
    sma_20_50_pv[0] = -1.0 ; } // short

    // sma_20_50_eq
    if ( sma_20[ii] == sma_50[ii] && sma_20[ii-1] > sma_50[ii-1] ) { 
    sma_20_50_eq[ii] = sma_20_50_eq[ii-1] + tick_value[0] * ( (open[ii+2]-open[ii+1])/tick_size[0] ) ; 
    sma_20_50_pv[0] = 1.0 ; } // long

    // sma_20_50_eq
    if ( sma_20[ii] == sma_50[ii] && sma_20[ii-1] < sma_50[ii-1] ) { 
    sma_20_50_eq[ii] = sma_20_50_eq[ii-1] + tick_value[0] * ( (open[ii+1]-open[ii+2])/tick_size[0] ) ; 
    sma_20_50_pv[0] = -1.0 ; } // short

    //-----------------------------------------------------------------------------------------------------------

    // tma_eq
    if ( tma_pv[0] == 0.0 ) {

      // default position
      tma_eq[ii] = tma_eq[ii-1] ;

      // unless one of the two following conditions is true

      if ( sma_10[ii] > sma_20[ii] && sma_20[ii] > sma_50[ii] ) { 
      tma_eq[ii] = tma_eq[ii-1] + tick_value[0] * ( (open[ii+2]-open[ii+1])/tick_size[0] ) ; 
      tma_pv[0] = 1.0 ; } // long

      if ( sma_10[ii] < sma_20[ii] && sma_20[ii] < sma_50[ii] ) { 
      tma_eq[ii] = tma_eq[ii-1] + tick_value[0] * ( (open[ii+1]-open[ii+2])/tick_size[0] ) ; 
      tma_pv[0] = -1.0 ; } // short

    } // end of tma_pv == 0.0 loop

    if ( tma_pv[0] == 1.0 ) {

      // default long position
      tma_eq[ii] = tma_eq[ii-1] + tick_value[0] * ( (open[ii+2]-open[ii+1])/tick_size[0] ) ; // long

      // unless one of the two following conditions is true

      if ( sma_10[ii] < sma_20[ii] && sma_10[ii] > sma_50[ii] ) { 
      tma_eq[ii] = tma_eq[ii-1] ; 
      tma_pv[0] = 0.0 ; } // exit long, go neutral

      if ( sma_10[ii] < sma_20[ii] && sma_20[ii] < sma_50[ii] ) { 
      tma_eq[ii] = tma_eq[ii-1] + tick_value[0] * ( (open[ii+1]-open[ii+2])/tick_size[0] ) ; 
      tma_pv[0] = -1.0 ; } // short
    
    } // end of tma_pv == 1.0 loop

    if ( tma_pv[0] == -1.0 ) {

      // default short position
      tma_eq[ii] = tma_eq[ii-1] + tick_value[0] * ( (open[ii+1]-open[ii+2])/tick_size[0] ) ; // short

      // unless one of the two following conditions is true

      if ( sma_10[ii] > sma_20[ii] && sma_10[ii] < sma_50[ii] ) { 
      tma_eq[ii] = tma_eq[ii-1] ; 
      tma_pv[0] = 0.0 ; } // exit short, go neutral

      if ( sma_10[ii] > sma_20[ii] && sma_20[ii] > sma_50[ii] ) { 
      tma_eq[ii] = tma_eq[ii-1] + tick_value[0] * ( (open[ii+2]-open[ii+1])/tick_size[0] ) ; 
      tma_pv[0] = 1.0 ; } // long

    } // end of tma_pv == -1.0 loop

    //------------------------------------------------------------------------------------------------------------

    // bbo_20_eq
    if ( bbo_20_pv[0] == 0.0 ) {

      // default position
      bbo_20_eq[ii] = bbo_20_eq[ii-1] ;

      // unless one of the two following conditions is true

      if ( close[ii] > sma_20[ii] + 2.0 * std_20[ii] ) { 
      bbo_20_eq[ii] = bbo_20_eq[ii-1] + tick_value[0] * ( (open[ii+2]-open[ii+1])/tick_size[0] ) ; 
      bbo_20_pv[0] = 1.0 ; } // long

      if ( close[ii] < sma_20[ii] - 2.0 * std_20[ii] ) { 
      bbo_20_eq[ii] = bbo_20_eq[ii-1] + tick_value[0] * ( (open[ii+1]-open[ii+2])/tick_size[0] ) ; 
      bbo_20_pv[0] = -1.0 ; } // short

    } // end of bbo_20_pv == 0.0 loop

    if ( bbo_20_pv[0] == 1.0 ) {

      // default long position
      bbo_20_eq[ii] = bbo_20_eq[ii-1] + tick_value[0] * ( (open[ii+2]-open[ii+1])/tick_size[0] ) ; // long

      // unless one of the two following conditions is true

      if ( close[ii] < sma_20[ii] + std_20[ii] && close[ii] > sma_20[ii] - 2.0 * std_20[ii] ) { 
      bbo_20_eq[ii] = bbo_20_eq[ii-1] ; 
      bbo_20_pv[0] = 0.0 ; } // exit long, go neutral

      if ( close[ii] < sma_20[ii] - 2.0 * std_20[ii] ) { 
      bbo_20_eq[ii] = bbo_20_eq[ii-1] + tick_value[0] * ( (open[ii+1]-open[ii+2])/tick_size[0] ) ; 
      bbo_20_pv[0] = -1.0 ; } // short
    
    } // end of bbo_20_pv == 1.0 loop

    if ( bbo_20_pv[0] == -1.0 ) {

      // default short position
      bbo_20_eq[ii] = bbo_20_eq[ii-1] + tick_value[0] * ( (open[ii+1]-open[ii+2])/tick_size[0] ) ; // short

      // unless one of the two following conditions is true

      if ( close[ii] > sma_20[ii] - std_20[ii] && close[ii] < sma_20[ii] + 2.0 * std_20[ii] ) { 
      bbo_20_eq[ii] = bbo_20_eq[ii-1] ; 
      bbo_20_pv[0] = 0.0 ; } // exit short, go neutral

      if ( close[ii] > sma_20[ii] + 2.0 * std_20[ii] ) { 
      bbo_20_eq[ii] = bbo_20_eq[ii-1] + tick_value[0] * ( (open[ii+2]-open[ii+1])/tick_size[0] ) ; 
      bbo_20_pv[0] = 1.0 ; } // long

    } // end of bbo_20_pv == -1.0 loop

    //-------------------------------------------------------------------------------------------------

    // bbo_50_eq
    if ( bbo_50_pv[0] == 0.0 ) {

      // default position
      bbo_50_eq[ii] = bbo_50_eq[ii-1] ;

      // unless one of the two following conditions is true

      if ( close[ii] > sma_50[ii] + 2.0 * std_50[ii] ) { 
      bbo_50_eq[ii] = bbo_50_eq[ii-1] + tick_value[0] * ( (open[ii+2]-open[ii+1])/tick_size[0] ) ; 
      bbo_50_pv[0] = 1.0 ; } // long

      if ( close[ii] < sma_50[ii] - 2.0 * std_50[ii] ) { 
      bbo_50_eq[ii] = bbo_50_eq[ii-1] + tick_value[0] * ( (open[ii+1]-open[ii+2])/tick_size[0] ) ; 
      bbo_50_pv[0] = -1.0 ; } // short

    } // end of bbo_50_pv == 0.0 loop

    if ( bbo_50_pv[0] == 1.0 ) {

      // default long position
      bbo_50_eq[ii] = bbo_50_eq[ii-1] + tick_value[0] * ( (open[ii+2]-open[ii+1])/tick_size[0] ) ; // long

      // unless one of the two following conditions is true

      if ( close[ii] < sma_50[ii] + std_50[ii] && close[ii] > sma_50[ii] - 2.0 * std_50[ii] ) { 
      bbo_50_eq[ii] = bbo_50_eq[ii-1] ; 
      bbo_50_pv[0] = 0.0 ; } // exit long, go neutral

      if ( close[ii] < sma_50[ii] - 2.0 * std_50[ii] ) { 
      bbo_50_eq[ii] = bbo_50_eq[ii-1] + tick_value[0] * ( (open[ii+1]-open[ii+2])/tick_size[0] ) ; 
      bbo_50_pv[0] = -1.0 ; } // short
    
    } // end of bbo_50_pv == 1.0 loop

    if ( bbo_50_pv[0] == -1.0 ) {

      // default short position
      bbo_50_eq[ii] = bbo_50_eq[ii-1] + tick_value[0] * ( (open[ii+1]-open[ii+2])/tick_size[0] ) ; // short

      // unless one of the two following conditions is true

      if ( close[ii] > sma_50[ii] - std_50[ii] && close[ii] < sma_50[ii] + 2.0 * std_50[ii] ) { 
      bbo_50_eq[ii] = bbo_50_eq[ii-1] ; 
      bbo_50_pv[0] = 0.0 ; } // exit short, go neutral

      if ( close[ii] > sma_50[ii] + 2.0 * std_50[ii] ) { 
      bbo_50_eq[ii] = bbo_50_eq[ii-1] + tick_value[0] * ( (open[ii+2]-open[ii+1])/tick_size[0] ) ; 
      bbo_50_pv[0] = 1.0 ; } // long

    } // end of bbo_50_pv == -1.0 loop

    //-----------------------------------------------------------------------------------------------------

    // donc_20_eq
    if ( donc_20_pv[0] == 0.0 ) {

      // default position
      donc_20_eq[ii] = donc_20_eq[ii-1] ;

      // unless one of the two following conditions is true

      if ( close[ii] > *std::max_element( &close[ii-20], &close[ii] ) ) { 
      donc_20_eq[ii] = donc_20_eq[ii-1] + tick_value[0] * ( (open[ii+2]-open[ii+1])/tick_size[0] ) ; 
      donc_20_pv[0] = 1.0 ; } // long

      if ( close[ii] < *std::min_element( &close[ii-20], &close[ii] ) ) { 
      donc_20_eq[ii] = donc_20_eq[ii-1] + tick_value[0] * ( (open[ii+1]-open[ii+2])/tick_size[0] ) ; 
      donc_20_pv[0] = -1.0 ; } // short

    } // end of donc_20_pv == 0.0 loop

    if ( donc_20_pv[0] == 1.0 ) {

      // default long position
      donc_20_eq[ii] = donc_20_eq[ii-1] + tick_value[0] * ( (open[ii+2]-open[ii+1])/tick_size[0] ) ; // long

      // unless one of the two following conditions is true

      if ( close[ii] < *std::min_element( &close[ii-10], &close[ii] ) && close[ii] > *std::min_element( &close[ii-20], &close[ii] ) ) { 
      donc_20_eq[ii] = donc_20_eq[ii-1] ; 
      donc_20_pv[0] = 0.0 ; } // exit long, go neutral

      if ( close[ii] < *std::min_element( &close[ii-20], &close[ii] ) ) { 
      donc_20_eq[ii] = donc_20_eq[ii-1] + tick_value[0] * ( (open[ii+1]-open[ii+2])/tick_size[0] ) ; 
      donc_20_pv[0] = -1.0 ; } // short
    
    } // end of donc_20_pv == 1.0 loop

    if ( donc_20_pv[0] == -1.0 ) {

      // default short position
      donc_20_eq[ii] = donc_20_eq[ii-1] + tick_value[0] * ( (open[ii+1]-open[ii+2])/tick_size[0] ) ; // short

      // unless one of the two following conditions is true

      if ( close[ii] > *std::max_element( &close[ii-10], &close[ii] ) && close[ii] < *std::max_element( &close[ii-20], &close[ii] ) ) { 
      donc_20_eq[ii] = donc_20_eq[ii-1] ; 
      donc_20_pv[0] = 0.0 ; } // exit short, go neutral

      if ( close[ii] > *std::max_element( &close[ii-20], &close[ii] ) ) { 
      donc_20_eq[ii] = donc_20_eq[ii-1] + tick_value[0] * ( (open[ii+2]-open[ii+1])/tick_size[0] ) ; 
      donc_20_pv[0] = 1.0 ; } // long

    } // end of donc_20_pv == -1.0 loop

    //-------------------------------------------------------------------------------------------------

    // donc_50_eq
    if ( donc_50_pv[0] == 0.0 ) {

      // default position
      donc_50_eq[ii] = donc_50_eq[ii-1] ;

      // unless one of the two following conditions is true

      if ( close[ii] > *std::max_element( &close[ii-50], &close[ii] ) ) { 
      donc_50_eq[ii] = donc_50_eq[ii-1] + tick_value[0] * ( (open[ii+2]-open[ii+1])/tick_size[0] ) ; 
      donc_50_pv[0] = 1.0 ; } // long

      if ( close[ii] < *std::min_element( &close[ii-50], &close[ii] ) ) { 
      donc_50_eq[ii] = donc_50_eq[ii-1] + tick_value[0] * ( (open[ii+1]-open[ii+2])/tick_size[0] ) ; 
      donc_50_pv[0] = -1.0 ; } // short

    } // end of donc_50_pv == 0.0 loop

    if ( donc_50_pv[0] == 1.0 ) {

      // default long position
      donc_50_eq[ii] = donc_50_eq[ii-1] + tick_value[0] * ( (open[ii+2]-open[ii+1])/tick_size[0] ) ; // long

      // unless one of the two following conditions is true

      if ( close[ii] < *std::min_element( &close[ii-25], &close[ii] ) && close[ii] > *std::min_element( &close[ii-50], &close[ii] ) ) { 
      donc_50_eq[ii] = donc_50_eq[ii-1] ; 
      donc_50_pv[0] = 0.0 ; } // exit long, go neutral

      if ( close[ii] < *std::min_element( &close[ii-50], &close[ii] ) ) { 
      donc_50_eq[ii] = donc_50_eq[ii-1] + tick_value[0] * ( (open[ii+1]-open[ii+2])/tick_size[0] ) ; 
      donc_50_pv[0] = -1.0 ; } // short
    
    } // end of donc_50_pv == 1.0 loop

    if ( donc_50_pv[0] == -1.0 ) {

      // default short position
      donc_50_eq[ii] = donc_50_eq[ii-1] + tick_value[0] * ( (open[ii+1]-open[ii+2])/tick_size[0] ) ; // short

      // unless one of the two following conditions is true

      if ( close[ii] > *std::max_element( &close[ii-25], &close[ii] ) && close[ii] < *std::max_element( &close[ii-50], &close[ii] ) ) { 
      donc_50_eq[ii] = donc_50_eq[ii-1] ; 
      donc_50_pv[0] = 0.0 ; } // exit short, go neutral

      if ( close[ii] > *std::max_element( &close[ii-50], &close[ii] ) ) { 
      donc_50_eq[ii] = donc_50_eq[ii-1] + tick_value[0] * ( (open[ii+2]-open[ii+1])/tick_size[0] ) ; 
      donc_50_pv[0] = 1.0 ; } // long

    } // end of donc_50_pv == -1.0 loop

    //-------------------------------------------------------------------------------------------------

    // composite_eq
    comp_pv[0] = sma_10_20_pv[0] + sma_20_50_pv[0] + tma_pv[0] + bbo_20_pv[0] + bbo_50_pv[0] + donc_20_pv[0] + donc_50_pv[0] ;
    
    if ( comp_pv[0] > 0 ) {
    composite_eq[ii] = composite_eq[ii-1] + tick_value[0] * ( (open[ii+2]-open[ii+1])/tick_size[0] ) ; } // long

    if ( comp_pv[0] < 0 ) {
    composite_eq[ii] = composite_eq[ii-1] + tick_value[0] * ( (open[ii+1]-open[ii+2])/tick_size[0] ) ; } // short 

    if ( comp_pv[0] == 0 ) {
    composite_eq[ii] = composite_eq[ii-1] ; } // neutral 

} // end of main for loop

// Now fill in the last two spaces in the equity vectors
market_mode_long_eq[n-1] = market_mode_long_eq[n-3] ;
market_mode_long_eq[n-2] = market_mode_long_eq[n-3] ;

market_mode_short_eq[n-1] = market_mode_short_eq[n-3] ;
market_mode_short_eq[n-2] = market_mode_short_eq[n-3] ;

market_mode_composite_eq[n-1] = market_mode_composite_eq[n-3] ;
market_mode_composite_eq[n-2] = market_mode_composite_eq[n-3] ;

sma_10_20_eq[n-1] = sma_10_20_eq[n-3] ;
sma_10_20_eq[n-2] = sma_10_20_eq[n-3] ;

sma_20_50_eq[n-1] = sma_20_50_eq[n-3] ;
sma_20_50_eq[n-2] = sma_20_50_eq[n-3] ;

tma_eq[n-1] = tma_eq[n-3] ;
tma_eq[n-2] = tma_eq[n-3] ;

bbo_20_eq[n-1] = bbo_20_eq[n-3] ;
bbo_20_eq[n-2] = bbo_20_eq[n-3] ;

bbo_50_eq[n-1] = bbo_50_eq[n-3] ;
bbo_50_eq[n-2] = bbo_50_eq[n-3] ;

donc_20_eq[n-1] = donc_20_eq[n-3] ;
donc_20_eq[n-2] = donc_20_eq[n-3] ;

donc_50_eq[n-1] = donc_50_eq[n-3] ;
donc_50_eq[n-2] = donc_50_eq[n-3] ;

composite_eq[n-1] = composite_eq[n-3] ;
composite_eq[n-2] = composite_eq[n-3] ;

return List::create(
  _["market_mode_long_eq"] = market_mode_long_eq ,
  _["market_mode_short_eq"] = market_mode_short_eq ,
  _["market_mode_composite_eq"] = market_mode_composite_eq ,
  _["sma_10_20_eq"] = sma_10_20_eq ,
  _["sma_20_50_eq"] = sma_20_50_eq , 
  _["tma_eq"] = tma_eq ,
  _["bbo_20_eq"] = bbo_20_eq ,
  _["bbo_50_eq"] = bbo_50_eq ,
  _["donc_20_eq"] = donc_20_eq ,
  _["donc_50_eq"] = donc_50_eq ,
  _["composite_eq"] = composite_eq ) ; '

basic_benchmark_equity <- cxxfunction(signature(a = "numeric", b = "numeric", c = "numeric",
                                d = "numeric", e = "numeric", f = "numeric"), body=src, 
                                plugin = "Rcpp")

This Rcpp function is then called as follows, using RStudio,

library(xts)  # load the required library

# load the "indicator" file 
data <- read.csv( file = "usdyenind" , header = FALSE , sep = "," )
tick_size <- 0.01
tick_value <- 12.50

# extract other vectors of interest
open <- data[,2]
close <- data[,5]
market_mode <- data[,228]
kalman <- data[,283]

results <- basic_benchmark_equity(open,close,market_mode,kalman,tick_size,tick_value)

# coerce the above results list object to a data frame object
results_df <- data.frame( results )
df_max <- max(results_df) # for scaling of results plot
df_min <- min(results_df) # for scaling of results plot

# and now create an xts object for plotting
results_xts <- xts(results_df,as.Date(data[,'V1']))

# a nice plot of the results_xts object
par(col="#0000FF")
plot(results_xts[,'market_mode_long_eq'],main="USDYEN Pair",ylab="$ Equity Value",ylim=c(df_min,df_max),type="l")
par(new=TRUE,col="#B0171F")
plot(results_xts[,'market_mode_short_eq'],main="",ylim=c(df_min,df_max),type="l")
par(new=TRUE,col="#00FF00")
plot(results_xts[,'market_mode_composite_eq'],main="",ylim=c(df_min,df_max),type="l")
par(new=TRUE,col="#808080")
plot(results_xts[,'sma_10_20_eq'],main="",ylim=c(df_min,df_max),type="l")
par(new=TRUE)
plot(results_xts[,'sma_20_50_eq'],main="",ylim=c(df_min,df_max),type="l")
par(new=TRUE)
plot(results_xts[,'tma_eq'],main="",ylim=c(df_min,df_max),type="l")
par(new=TRUE)
plot(results_xts[,'bbo_20_eq'],main="",ylim=c(df_min,df_max),type="l")
par(new=TRUE)
plot(results_xts[,'bbo_50_eq'],main="",ylim=c(df_min,df_max),type="l")
par(new=TRUE)
plot(results_xts[,'donc_20_eq'],main="",ylim=c(df_min,df_max),type="l")
par(new=TRUE)
plot(results_xts[,'donc_50_eq'],main="",ylim=c(df_min,df_max),type="l")
par(new=TRUE,col="black")
plot(results_xts[,'composite_eq'],main="",ylim=c(df_min,df_max),type="l")


to output .png files which I have strung together in this video
Non-embedded view here.
The light grey equity curves are the individual curves for the benchmark systems, and the black curve is the "committee" 1 contract equity curve. Also shown, for my Naive Bayesian Classifier, are the long and short 1 contract equity curves ( blue and red respectively ), along with a green combined equity curve for these, following the simple rules
  • be long 1 contract if the market type is uwr, unr or cyclic with my Kalman filter pointing upwards
  • be short 1 contract if the market type is dwr, dnr or cyclic with my Kalman filter pointing downwards
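Purely as an illustration of these two rules, a vectorised R sketch using the market_mode coding from the Rcpp function above ( 1 = uwr, 2 = unr, 3 = dwr, 4 = dnr, 0 = cyclic ) and the market_mode and kalman vectors extracted in the calling script might be
# +1 = long 1 contract, -1 = short 1 contract
kalman_slope <- c( 0 , diff( kalman ) ) # slope proxy; first value is arbitrary
position <- ifelse( market_mode %in% c( 1 , 2 ) ,  1 ,
            ifelse( market_mode %in% c( 3 , 4 ) , -1 ,
                    sign( kalman_slope ) ) ) # cyclic bars follow the Kalman slope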
Again, this is just a toy example of the use of my Bayesian Classifier. After I have completed Andrew Ng's machine learning course, as mentioned in my previous post, I will have a go at coding a neural net, which may end up replacing the Bayesian Classifier.