Wednesday 16 April 2014

Effect Size of Cauchy-Schwarz Matching Algorithm

In my last post I talked about using the Cauchy-Schwarz inequality to match similar periods of price history to one another. This post describes a more rigorous test of that idea.

I decided to use the effect size as the test of choice, for which there are nice introductions here and here. A basic description of how I implemented the test is as follows:-
  1. Randomly pick a section of price history, which will be used as the price history for the selection algorithm to match
  2. Take the 5 consecutive bars immediately following the above section of price history and store them as the "target"
  3. Create a control group of random matches to the above "target" by randomly selecting 10 separate 5-bar pieces of price history, calculating the Cauchy-Schwarz values of these 10 against the target, and recording the average of these values. Repeat this step N times to create a distribution of average target-to-random-price Cauchy-Schwarz values. By virtue of the Central Limit Theorem this distribution can be expected to be approximately normal
  4. Using the matching algorithm (as described in the previous post) get the closest 10 matches in the price history to the random selection from step 1
  5. Get the 5 consecutive bars immediately following the 10 matches from step 4, calculate their Cauchy-Schwarz values vis-à-vis the "target", and record the average of these 10 values. This average is the "experimental" value
  6. Using the mean and standard deviation of the control group distribution from step 3, calculate the effect size of the experimental value and record this effect size value (see the formula sketch immediately after this list)
  7. Repeat all the above steps M times to form an effect size value distribution
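In symbols, the effect size recorded at step 6 on each of the M repetitions is a standardised difference of means:

$$ d \;=\; \frac{ \bar{x}_{matched} - \mu_{null} }{ \sigma_{null} } $$

where $\bar{x}_{matched}$ is the average Cauchy-Schwarz value of the algorithm's matches from step 5, and $\mu_{null}$ and $\sigma_{null}$ are the mean and standard deviation of the control distribution from step 3. The subscript names are mine, for exposition only; this is exactly what the effect_size(kk) lines in the code below compute.
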
The basic premise being tested here is that patterns repeat, to some degree, and that they have some predictive value for the immediately following price bars. The test statistic being used is the Cauchy-Schwarz value itself, whereby a high value indicates a close similarity in price pattern, and hence predictability. The actual effect size test is the standardised difference between means, as per the formula above. The code to implement this test is given in the code box below, and is basically an extension of the code in my previous post.
clear all

% load price file of interest
filename = input( 'Enter filename for prices, e.g. es or esmatrix: ' , 's' ) ;
data = load( "-ascii" , filename ) ;

% get tick size
switch filename

case { "cc" }
tick = 1 ;

case { "gc" "lb" "pl" "sm" "sp" }
tick = 0.1 ;

case { "ausyen" "bo" "cl" "ct" "dx" "euryen" "gbpyen" "sb" "usdyen" }
tick = 0.01 ;

case { "c" "ng" }
tick = 0.001 ;

case { "auscad" "aususd" "euraus" "eurcad" "eurchf" "eurgbp" "eurusd" "gbpchf" "gbpusd" "ho" "rb" "usdcad" "usdchf" }
tick = 0.0001 ;

case { "c" "o" "s" "es" "nd" "w" }
tick = 0.25 ;

case { "fc" "lc" "lh" "pb" }
tick = 0.025 ;

case { "ed" }
tick = 0.0025 ;

case { "si" }
tick = 0.5 ;

case { "hg" "kc" "oj" "pa" }
tick = 0.05 ;

case { "ty" "us" }
tick = 0.015625 ;

case { "ccmatrix" }
tick = 1 ;

case { "gcmatrix" "lbmatrix" "plmatrix" "smmatrix" "spmatrix" }
tick = 0.1 ;

case { "ausyenmatrix" "bomatrix" "clmatrix" "ctmatrix" "dxmatrix" "euryenmatrix" "gbpyenmatrix" "sbmatrix" "usdyenmatrix" }
tick = 0.01 ;

case { "cmatrix" "ngmatrix" }
tick = 0.001 ;

case { "auscadmatrix" "aususdmatrix" "eurausmatrix" "eurcadmatrix" "eurchfmatrix" "eurgbpmatrix" "eurusdmatrix" "gbpchfmatrix" "gbpusdmatrix" "homatrix" "rbmatrix" "usdcadmatrix" "usdchfmatrix" }
tick = 0.0001 ;

case { "cmatrix" "omatrix" "smatrix" "esmatrix" "ndmatrix" "wmatrix" }
tick = 0.25 ;

case { "fcmatrix" "lcmatrix" "lhmatrix" "pbmatrix" }
tick = 0.025 ;

case { "edmatrix" }
tick = 0.0025 ;

case { "simatrix" }
tick = 0.5 ;

case { "hgmatrix" "kcmatrix" "ojmatrix" "pamatrix" }
tick = 0.05 ;

case { "tymatrix" "usmatrix" }
tick = 0.015625 ;

endswitch

open = data( : , 4 ) ;
high = data( : , 5 ) ;
low = data( : , 6 ) ;
close = data( : , 7 ) ;
price = vwap( open, high, low, close, tick ) ;

clear -exclusive price tick

% first, get the lookback parameters on real prices
[ sine, sinelead, period ] = sinewave_indicator( price ) ;
[ max_price, min_price, channel_price ] = adaptive_lookback_max_min( price, period, tick ) ;
smooth_price = smooth_2_5( price ) ;
[ max_smooth_price, min_smooth_price, smooth_channel_price ] = adaptive_lookback_max_min( smooth_price, period, tick ) ;

cauchy_schwarz_values = zeros( size(channel_price,1) , 1 ) ;
cauchy_schwarz_values_smooth = zeros( size(channel_price,1) , 1 ) ;

% set up all recording vectors
N = 10 ; % must be >= 10

% record these values
matches_values = zeros( N, 1 ) ;
matches_smooth_values = zeros( N, 1 ) ;
distcorr_values = zeros( N, 1 ) ;
distcorr_values_smooth = zeros( N, 1 ) ;

% vectors to record averages
random_matches_values_averages = zeros( 750, 1 ) ;
random_matches_smooth_values_averages = zeros( 750, 1 ) ;
random_distcorr_averages = zeros( 750, 1 ) ;
random_distcorr_smooth_averages = zeros( 750, 1 ) ;

% effect size vectors
effect_size = zeros( 750, 1 ) ;
effect_size_smooth = zeros( 750, 1 ) ;
effect_size_distcorr = zeros( 750, 1 ) ;
effect_size_distcorr_smooth = zeros( 750, 1 ) ;

for kk = 1 : 750

% first, get a random pick from the price history and all its associated values
sample_index = randperm( (size(price,1)-55), 1 ) .+ 50 ;
lookback = period( sample_index ) ;
sample_to_match = channel_price( sample_index-lookback : sample_index )' ;
sample_to_match_smooth = smooth_channel_price( sample_index-lookback : sample_index )' ;
projection_to_match = ( ( price( (sample_index+1):(sample_index+5) ) .- min_price(sample_index) ) ./ ( max_price(sample_index)-min_price(sample_index) ) )' ;
projection_to_match_smooth = ( ( price( (sample_index+1):(sample_index+5) ) .- min_smooth_price(sample_index) ) ./ ( max_smooth_price(sample_index)-min_smooth_price(sample_index) ) )' ;

% for this pick, calculate cauchy_schwarz_values
for ii = 50 : size( price, 1 )
cauchy_schwarz_values(ii) = abs( sample_to_match * channel_price( ii-lookback : ii ) ) / ( norm(sample_to_match) * norm( channel_price( ii-lookback : ii , 1 ) ) ) ;
cauchy_schwarz_values_smooth(ii) = abs( sample_to_match_smooth * smooth_channel_price( ii-lookback : ii ) ) / ( norm(sample_to_match_smooth) * norm( smooth_channel_price( ii-lookback : ii , 1 ) ) ) ;
end

% now set the values for sample_to_match +/- 2 to zero to avoid matching with itself
cauchy_schwarz_values( sample_index-2 : sample_index+2 ) = 0.0 ;
cauchy_schwarz_values_smooth( sample_index-2 : sample_index+2 ) = 0.0 ;

% set the last six values to zero to allow for projections
cauchy_schwarz_values( end-5 : end ) = 0.0 ;
cauchy_schwarz_values_smooth( end-5 : end ) = 0.0 ;

% get the top N matches
for ii = 1 : N

[ max_val, ix ] = max( cauchy_schwarz_values ) ;
norm_price_proj_match = ( ( price( ((ix)+1):((ix)+5) ) .- min_price(ix) ) ./ ( max_price(ix)-min_price(ix) ) ) ;
matches_values(ii) = abs( projection_to_match * norm_price_proj_match ) / ( norm(projection_to_match) * norm( norm_price_proj_match ) ) ;
cauchy_schwarz_values( ix-2 : ix+2 ) = 0.0 ;

[ max_val, ix ] = max( cauchy_schwarz_values_smooth ) ;
norm_price_smooth_proj_match = ( ( price( ((ix)+1):((ix)+5) ) .- min_smooth_price(ix) ) ./ ( max_smooth_price(ix)-min_smooth_price(ix) ) ) ;
matches_smooth_values(ii) = abs( projection_to_match_smooth * norm_price_smooth_proj_match ) / ( norm(projection_to_match_smooth) * norm( norm_price_smooth_proj_match ) ) ;
cauchy_schwarz_values_smooth( ix-2 : ix+2 ) = 0.0 ;

distcorr_values(ii) = distcorr( projection_to_match', norm_price_proj_match ) ;
distcorr_values_smooth(ii) = distcorr( projection_to_match_smooth', norm_price_smooth_proj_match ) ;

end % end of top N matches loop

% get and record averages for the top N matches
matches_values_average = mean( matches_values ) ;
matches_smooth_values_average = mean( matches_smooth_values ) ;
distcorr_average = mean( distcorr_values ) ;
distcorr_smooth_average = mean( distcorr_values_smooth ) ;

% now create a null distribution of random price projections
% randomly chosen from prices

for jj = 1 : 750

random_index = randperm( (size(price,1)-55), 10 ) .+ 50 ;
for ii = 1 : 10

norm_price_proj_match = ( ( price( (random_index(ii)+1):(random_index(ii)+5) ) .- min_price(random_index(ii)) ) ./ ( max_price(random_index(ii))-min_price(random_index(ii)) ) ) ;
matches_values(ii) = abs( projection_to_match * norm_price_proj_match ) / ( norm(projection_to_match) * norm( norm_price_proj_match ) ) ;

norm_price_smooth_proj_match = ( ( price( (random_index(ii)+1):(random_index(ii)+5) ) .- min_smooth_price(random_index(ii)) ) ./ ( max_smooth_price(random_index(ii))-min_smooth_price(random_index(ii)) ) ) ;
matches_smooth_values(ii) = abs( projection_to_match_smooth * norm_price_smooth_proj_match ) / ( norm(projection_to_match_smooth) * norm( norm_price_smooth_proj_match ) ) ;

distcorr_values(ii) = distcorr( projection_to_match', norm_price_proj_match ) ;
distcorr_values_smooth(ii) = distcorr( projection_to_match_smooth', norm_price_smooth_proj_match ) ;

end % end of random index ii loop

random_matches_values_averages(jj) = mean( matches_values ) ;
random_matches_smooth_values_averages(jj) = mean( matches_smooth_values ) ;
random_distcorr_averages(jj) = mean( distcorr_values ) ;
random_distcorr_smooth_averages(jj) = mean( distcorr_values_smooth ) ;

end % end jj loop

effect_size(kk) = ( matches_values_average - mean( random_matches_values_averages ) ) / std( random_matches_values_averages ) ;
effect_size_smooth(kk) = ( matches_smooth_values_average - mean( random_matches_smooth_values_averages ) ) / std( random_matches_smooth_values_averages ) ;
effect_size_distcorr(kk) = ( distcorr_average - mean( random_distcorr_averages ) ) / std( random_distcorr_averages ) ;
effect_size_distcorr_smooth(kk) = ( distcorr_smooth_average - mean( random_distcorr_smooth_averages ) ) / std( random_distcorr_smooth_averages ) ;

end % end kk loop

all_effect_sizes = [ effect_size, effect_size_smooth, effect_size_distcorr, effect_size_distcorr_smooth ] ;
dlmwrite( 'all_effect_sizes', all_effect_sizes )
Results
Running the code on the EURUSD forex pair and plotting histograms gives this:
where figures 1 and 2 are for the Cauchy-Schwarz values and figures 3 and 4 are the Distance Correlation values, included for comparative purposes, which I won't discuss in this post.

On seeing this for the first time I was somewhat surprised, as I had expected the effect size distribution(s) to be approximately normal because all the test calculations are based on averages. However, it was a pleasant surprise due to the peak in values at the right-hand side, showing a possibly substantial effect size. To make things clearer, here are the percentiles of the four histograms above:
   0.00000  -5.08931  -4.79836  -3.05912  -3.65668
   0.01000  -3.61724  -3.20229  -2.46932  -2.45201
   0.02000  -3.39841  -2.81969  -2.21764  -2.20515
   0.03000  -3.00404  -2.49009  -1.89562  -2.05380
   0.04000  -2.66393  -2.35174  -1.80412  -1.91032
   0.05000  -2.52514  -2.03670  -1.68800  -1.71335
   0.06000  -2.22298  -1.91877  -1.59624  -1.61089
   0.07000  -2.07188  -1.88256  -1.52058  -1.48763
   0.08000  -1.93247  -1.79727  -1.45786  -1.42828
   0.09000  -1.71065  -1.66522  -1.36500  -1.35917
   0.10000  -1.59803  -1.58943  -1.31570  -1.31809
   0.11000  -1.44325  -1.53087  -1.24996  -1.28199
   0.12000  -1.38234  -1.44477  -1.20741  -1.21903
   0.13000  -1.22440  -1.32961  -1.17397  -1.17619
   0.14000  -1.14728  -1.29863  -1.12755  -1.10768
   0.15000  -1.05431  -1.19564  -1.09108  -1.08591
   0.16000  -0.93505  -1.10204  -1.06018  -1.04149
   0.17000  -0.88272  -1.05314  -1.00478  -1.00248
   0.18000  -0.79723  -1.01394  -0.96389  -0.97786
   0.19000  -0.66914  -0.98012  -0.92679  -0.96108
   0.20000  -0.58700  -0.88085  -0.89990  -0.91932
   0.21000  -0.52548  -0.84929  -0.86971  -0.87901
   0.22000  -0.44446  -0.82412  -0.83585  -0.84796
   0.23000  -0.40282  -0.76732  -0.80526  -0.82919
   0.24000  -0.36407  -0.68691  -0.75698  -0.80794
   0.25000  -0.32960  -0.65915  -0.73488  -0.77562
   0.26000  -0.21295  -0.61977  -0.64435  -0.73739
   0.27000  -0.13202  -0.57937  -0.60995  -0.70502
   0.28000  -0.07516  -0.50076  -0.54194  -0.67219
   0.29000  -0.00845  -0.43592  -0.51490  -0.61872
   0.30000   0.04592  -0.35829  -0.49879  -0.59214
   0.31000   0.08091  -0.29488  -0.47284  -0.56236
   0.32000   0.11649  -0.24116  -0.44727  -0.52599
   0.33000   0.20059  -0.20343  -0.38769  -0.48137
   0.34000   0.29594  -0.17594  -0.32956  -0.46426
   0.35000   0.33832  -0.12867  -0.31033  -0.44284
   0.36000   0.38473  -0.10445  -0.28196  -0.41119
   0.37000   0.42759  -0.07363  -0.25178  -0.37141
   0.38000   0.45809  -0.03128  -0.21921  -0.33732
   0.39000   0.51545   0.00103  -0.19434  -0.30017
   0.40000   0.56191   0.05818  -0.16896  -0.26556
   0.41000   0.60728   0.09308  -0.15057  -0.23521
   0.42000   0.63342   0.13244  -0.13961  -0.21845
   0.43000   0.67951   0.17094  -0.11061  -0.20428
   0.44000   0.69882   0.22192  -0.05734  -0.19437
   0.45000   0.75193   0.25773  -0.03497  -0.16183
   0.46000   0.79911   0.30891  -0.00695  -0.13580
   0.47000   0.84183   0.35623   0.01927  -0.11969
   0.48000   0.91024   0.38352   0.05030  -0.10521
   0.49000   0.94791   0.42460   0.06230  -0.07570
   0.50000   1.01034   0.48288   0.08379  -0.05241
   0.51000   1.04269   0.54956   0.11360  -0.03448
   0.52000   1.07527   0.62407   0.13003  -0.00864
   0.53000   1.10908   0.65434   0.16910   0.01793
   0.54000   1.12665   0.69819   0.19257   0.03546
   0.55000   1.13850   0.75071   0.20893   0.05331
   0.56000   1.17187   0.78859   0.24099   0.08191
   0.57000   1.19397   0.82243   0.25359   0.10432
   0.58000   1.22162   0.87152   0.26988   0.13012
   0.59000   1.24032   0.91341   0.29813   0.16376
   0.60000   1.26567   0.96977   0.32279   0.20620
   0.61000   1.29286   1.00221   0.36456   0.23991
   0.62000   1.32750   1.03669   0.37966   0.28647
   0.63000   1.35170   1.07326   0.43526   0.31652
   0.64000   1.38017   1.12882   0.45922   0.35653
   0.65000   1.39101   1.15719   0.47552   0.37813
   0.66000   1.41716   1.17241   0.49585   0.41064
   0.67000   1.44582   1.21725   0.50760   0.42996
   0.68000   1.46310   1.26081   0.56082   0.44876
   0.69000   1.47664   1.27710   0.58793   0.49889
   0.70000   1.49066   1.31164   0.60148   0.54122
   0.71000   1.49891   1.34165   0.64747   0.57689
   0.72000   1.50470   1.36688   0.67315   0.59469
   0.73000   1.51436   1.38746   0.70662   0.63938
   0.74000   1.52604   1.41351   0.75330   0.66263
   0.75000   1.54430   1.43842   0.78925   0.67884
   0.76000   1.55633   1.46536   0.81250   0.69540
   0.77000   1.56282   1.48012   0.84801   0.72899
   0.78000   1.57245   1.49574   0.86657   0.73934
   0.79000   1.58277   1.51564   0.90696   0.76147
   0.80000   1.59149   1.53226   0.93265   0.81038
   0.81000   1.59883   1.54450   0.97456   0.85287
   0.82000   1.60587   1.55777   1.00809   0.90534
   0.83000   1.61216   1.56334   1.02570   0.96566
   0.84000   1.61803   1.57583   1.05052   1.02102
   0.85000   1.62568   1.58589   1.07218   1.03485
   0.86000   1.63091   1.59593   1.11747   1.09383
   0.87000   1.64307   1.60745   1.14659   1.16075
   0.88000   1.65033   1.61638   1.17268   1.21484
   0.89000   1.65691   1.62442   1.21196   1.24922
   0.90000   1.66307   1.63321   1.25644   1.30013
   0.91000   1.67429   1.64781   1.30644   1.33641
   0.92000   1.68702   1.66001   1.34919   1.37382
   0.93000   1.69829   1.67226   1.39081   1.41904
   0.94000   1.70893   1.68142   1.47874   1.48799
   0.95000   1.72625   1.70083   1.62107   1.58719
   0.96000   1.73656   1.71328   1.82299   1.63232
   0.97000   1.77279   1.74188   1.99231   1.72630
   0.98000   1.89750   1.79882   2.19662   1.94227
   0.99000   2.34395   2.06873   2.34937   2.24499
   1.00000   3.73384   4.27923   4.11659   2.74557
where the first column contains the percentiles, and the 2nd, 3rd, 4th and 5th columns correspond to figures 1, 2, 3 and 4 above and contain the effect size values. Looking at the 2nd column (the figure 1 values) it can be seen that if Cohen's "scale" is applied, over 50% of the effect size values can be described as "large," with approximately a further 15% being of "medium" effect.
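
For anyone who wants to reproduce this tally from the saved file, here is a minimal sketch, assuming Cohen's conventional thresholds of 0.2 (small), 0.5 (medium) and 0.8 (large):

% reload the effect sizes saved by dlmwrite above
all_effect_sizes = dlmread( 'all_effect_sizes' ) ;
d = all_effect_sizes( : , 1 ) ; % 1st effect size column, i.e. figure 1
printf( 'large (d >= 0.8): %.1f%%\n' , 100 * mean( d >= 0.8 ) ) ;
printf( 'medium (0.5 <= d < 0.8): %.1f%%\n' , 100 * mean( d >= 0.5 & d < 0.8 ) ) ;
printf( 'small (0.2 <= d < 0.5): %.1f%%\n' , 100 * mean( d >= 0.2 & d < 0.5 ) ) ;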

All in all a successful test, which encourages me to adopt the Cauchy-Schwarz inequality, but before I do there are one or two more tweaks I would like to test. This will be the subject of my next post.

Sunday 6 April 2014

The Cauchy-Schwarz Inequality

In my previous post I said I was looking into my code for the dominant cycle, mostly with a view to either improving my code or perhaps replacing/augmenting it with some other method of calculating the cycle period. To this end I have recently enrolled in a discrete-time signals and systems course offered by edX. One of the lectures was about the Cauchy-Schwarz inequality, which is what this post is about.
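
For reference, the inequality states that for any two vectors x and y of an inner product space

$$ \lvert \langle x , y \rangle \rvert \;\le\; \lVert x \rVert \, \lVert y \rVert \, , $$

with equality if and only if x and y are linearly dependent. The ratio $\lvert \langle x , y \rangle \rvert / ( \lVert x \rVert \, \lVert y \rVert )$ therefore always lies between 0 and 1, which is what makes it usable as a similarity score between two sections of price history.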

The basic idea is to use the inequality to select sections of price history that are most similar to one another, and to use these as training cases for neural net training. My initial Octave code is given in the code box below:-
clear all

% load price file of interest
filename = 'eurusdmatrix' ; %input( 'Enter filename for prices, e.g. es or esmatrix: ' , 's' ) ;
data = load( "-ascii" , filename ) ;

% get tick size
switch filename

case { "cc" }
tick = 1 ;

case { "gc" "lb" "pl" "sm" "sp" }
tick = 0.1 ;

case { "ausyen" "bo" "cl" "ct" "dx" "euryen" "gbpyen" "sb" "usdyen" }
tick = 0.01 ;

case { "c" "ng" }
tick = 0.001 ;

case { "auscad" "aususd" "euraus" "eurcad" "eurchf" "eurgbp" "eurusd" "gbpchf" "gbpusd" "ho" "rb" "usdcad" "usdchf" }
tick = 0.0001 ;

case { "c" "o" "s" "es" "nd" "w" }
tick = 0.25 ;

case { "fc" "lc" "lh" "pb" }
tick = 0.025 ;

case { "ed" }
tick = 0.0025 ;

case { "si" }
tick = 0.5 ;

case { "hg" "kc" "oj" "pa" }
tick = 0.05 ;

case { "ty" "us" }
tick = 0.015625 ;

case { "ccmatrix" }
tick = 1 ;

case { "gcmatrix" "lbmatrix" "plmatrix" "smmatrix" "spmatrix" }
tick = 0.1 ;

case { "ausyenmatrix" "bomatrix" "clmatrix" "ctmatrix" "dxmatrix" "euryenmatrix" "gbpyenmatrix" "sbmatrix" "usdyenmatrix" }
tick = 0.01 ;

case { "cmatrix" "ngmatrix" }
tick = 0.001 ;

case { "auscadmatrix" "aususdmatrix" "eurausmatrix" "eurcadmatrix" "eurchfmatrix" "eurgbpmatrix" "eurusdmatrix" "gbpchfmatrix" "gbpusdmatrix" "homatrix" "rbmatrix" "usdcadmatrix" "usdchfmatrix" }
tick = 0.0001 ;

case { "cmatrix" "omatrix" "smatrix" "esmatrix" "ndmatrix" "wmatrix" }
tick = 0.25 ;

case { "fcmatrix" "lcmatrix" "lhmatrix" "pbmatrix" }
tick = 0.025 ;

case { "edmatrix" }
tick = 0.0025 ;

case { "simatrix" }
tick = 0.5 ;

case { "hgmatrix" "kcmatrix" "ojmatrix" "pamatrix" }
tick = 0.05 ;

case { "tymatrix" "usmatrix" }
tick = 0.015625 ;

endswitch

open = data( : , 4 ) ;
high = data( : , 5 ) ;
low = data( : , 6 ) ;
close = data( : , 7 ) ;
period = data( : , 12 ) ;
price = vwap( open, high, low, close, tick ) ;
[ max_adj, min_adj, channel_price ] = adaptive_lookback_max_min( price, period, tick ) ;
smooth_price = smooth_2_5( price ) ;
[ max_adj, min_adj, smooth_channel_price ] = adaptive_lookback_max_min( smooth_price, period, tick ) ;

clear -exclusive channel_price smooth_channel_price period price

% randomly choose vwap prices to match
sample_index = randperm( size(channel_price,1), 1 ) % n.b. a pick too near the start of the series would make sample_index-lookback < 1
lookback = period( sample_index )
sample_to_match = channel_price( sample_index-lookback : sample_index )' ;
sample_to_match_smooth = smooth_channel_price( sample_index-lookback : sample_index )' ;

cauchy_schwarz_values = zeros( size(channel_price,1) , 1 ) ;
cauchy_schwarz_values_smooth = zeros( size(channel_price,1) , 1 ) ;

for ii = 50 : size(channel_price,1)

% match_size = size( channel_price( ii-lookback : ii ) )

cauchy_schwarz_values(ii) = abs( sample_to_match * channel_price( ii-lookback : ii ) ) / ( norm(sample_to_match) * norm( channel_price( ii-lookback : ii , 1 ) ) ) ;
cauchy_schwarz_values_smooth(ii) = abs( sample_to_match_smooth * smooth_channel_price( ii-lookback : ii ) ) / ( norm(sample_to_match_smooth) * norm( smooth_channel_price( ii-lookback : ii , 1 ) ) ) ;

end

% now set the values for sample_to_match +/- 2 to zero to avoid matching with itself
cauchy_schwarz_values( sample_index-2 : sample_index+2 ) = 0.0 ;
cauchy_schwarz_values_smooth( sample_index-2 : sample_index+2 ) = 0.0 ;

N = 10 ; % must be >= 10

% get index values of the top N matches
matches = zeros( N, 1 ) ;
matches_smooth = zeros( N, 1 ) ;

% record these values
matches_values = zeros( N, 1 ) ;
matches_smooth_values = zeros( N, 1 ) ;

for ii = 1: N

[ max_val, ix ] = max( cauchy_schwarz_values ) ;
matches(ii) = ix ;
matches_values(ii) = cauchy_schwarz_values(ix) ;
cauchy_schwarz_values( ix-2 : ix+2 ) = 0.0 ;

[ max_val, ix ] = max( cauchy_schwarz_values_smooth ) ;
matches_smooth(ii) = ix ;
matches_smooth_values(ii) = cauchy_schwarz_values_smooth(ix) ;
cauchy_schwarz_values_smooth( ix-2 : ix+2 ) = 0.0 ;

end

% Plot for visual inspection
clf ;

% the matched index values
figure(1) ;
subplot(2,1,1) ; plot( cauchy_schwarz_values, 'c' ) ;
subplot(2,1,2) ; plot( cauchy_schwarz_values_smooth, 'c' ) ;
set( gcf() ,  'color' , [0 0 0] )

% the top N matched price sequences
figure(2) ;
for ii = 1 : 10
subplot( 5, 2, ii ) ; plot( sample_to_match, 'c', channel_price( matches(ii)-lookback : matches(ii) ), 'r', channel_price( matches_smooth(ii)-lookback : matches_smooth(ii) ), 'y' ) ;
end
set( gcf() ,  'color' , [0 0 0] )

figure(3)
% sample (left y-axis) plotted against each of the top ten raw price matches (right y-axis)
x_axis = 1 : length( price( sample_index-lookback : sample_index ) ) ;
for ii = 1 : 10
subplot( 5, 2, ii ) ; plotyy( x_axis , price( sample_index-lookback : sample_index ) , x_axis , price( matches(ii)-lookback : matches(ii) ) ) ;
end

figure(4)
% as above, but for the top ten smoothed price matches
for ii = 1 : 10
subplot( 5, 2, ii ) ; plotyy( x_axis , price( sample_index-lookback : sample_index ) , x_axis , price( matches_smooth(ii)-lookback : matches_smooth(ii) ) ) ;
end

% Print results to terminal
results = zeros( N , 2 ) ;

for ii = 1 : N
results(ii,1) = distcorr( sample_to_match', channel_price( matches(ii)-lookback : matches(ii) ) ) ;
results(ii,2) = distcorr( sample_to_match', channel_price( matches_smooth(ii)-lookback : matches_smooth(ii) ) ) ;
end

results = [ matches_values matches_smooth_values results ]
After some basic "housekeeping" code to load the price file of interest and normalise the prices, a random section of the price history is selected and then, in a loop, the top N matches in the history are found using, as the matching metric, the normalised inner product |x.y| / ( ||x|| ||y|| ), which the Cauchy-Schwarz inequality bounds by 1. A value of 0 means the price series being compared are orthogonal, and hence as dissimilar to each other as possible, whilst a value of 1 means the opposite. There are two types of matching: raw price matched with raw price, and smoothed price matched with smoothed price.
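
To make the metric concrete: stripped of all the indexing, each comparison in the loop above computes the following quantity. The function name cs_similarity is my own, for illustration, and is not part of the post's code base.

% cs_similarity.m
% Cauchy-Schwarz similarity of two equal-length vectors:
% 0 = orthogonal (maximally dissimilar), 1 = linearly dependent (identical shape up to scale)
function s = cs_similarity( x , y )
s = abs( x(:)' * y(:) ) / ( norm( x(:) ) * norm( y(:) ) ) ;
end

For example, cs_similarity( sample_to_match', channel_price( ii-lookback : ii ) ) reproduces cauchy_schwarz_values(ii) from the listing above.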

First off, although the above code randomly selects a section of price history to match, for illustrative purposes in this post I deliberately hand-picked a section. Below is the section
where the section ends at the point where the vertical cursor crosses the price and begins at the high just below the horizontal cursor, for a lookback period of 16 bars. For context, here is a zoomed-out view.
I chose this section because it represents a "difficult" set of prices, i.e. moving sideways at the end of a retracement, perhaps reacting to a previous low acting as resistance, and sitting within a Fibonacci retracement zone.

The first set of code outputs is this chart
which shows the Cauchy-Schwarz values for the whole range of the price series, with the upper pane being the values for the raw price matching and the lower pane being those for the smoothed price matching. Note that in the code the values are set to zero after the max function has selected each best match, so the spikes down to zero show the points in time from which the top N (in this case 10) matches were taken.
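
In isolation, that select-and-zero-out pattern is just a greedy top-N extraction. A sketch, with the same +/- 2 bar exclusion window as the listing above:

vals = cauchy_schwarz_values ; % work on a copy
top_matches = zeros( N , 1 ) ;
for ii = 1 : N
[ best_val , ix ] = max( vals ) ;
top_matches(ii) = ix ; % record the index of the ii-th best match
vals( max( ix-2 , 1 ) : min( ix+2 , length(vals) ) ) = 0.0 ; % zero out its neighbourhood
end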

The next chart output shows the normalised prices that the matching is done against, with cyan being the original sample (the same in all subplots), red being the raw price matches and yellow being the smoothed price matches.
The closest match is in the top left subplot, then reading horizontally and downwards to the 10th best match in the bottom right subplot.

The next plot shows the price matches un-normalised, for the raw price matching, with the original sample being blue,
and next for the smoothed matching,
and finally, side by side for easy visual comparison.
N.b. For all the smoothed plots above, although the matching is done on smoothed prices, it is the unsmoothed, raw prices of these matches that are plotted.

After plotting all the above, the code prints to terminal some details thus:

lookback =  16
results =

   0.95859   0.98856   0.89367   0.86361
   0.95733   0.98753   0.93175   0.86839
   0.95589   0.98697   0.87398   0.67945
   0.95533   0.98538   0.85346   0.83079
   0.95428   0.98293   0.91212   0.77225
   0.94390   0.98292   0.79350   0.66563
   0.93908   0.98150   0.71753   0.77458
   0.93894   0.97992   0.86839   0.72492
   0.93345   0.97969   0.74456   0.79060
   0.93286   0.97940   0.86361   0.61103

which, column-wise, are the Cauchy-Schwarz values for the raw price matching and the smoothed price matching, followed by the Distance Correlation values for the raw price matching and the smoothed price matching respectively.

The code used to calculate the Distance correlation is given below.
% Copyright (c) 2013, Shen Liu
% All rights reserved.

% Redistribution and use in source and binary forms, with or without
% modification, are permitted provided that the following conditions are
% met:

%    * Redistributions of source code must retain the above copyright
%      notice, this list of conditions and the following disclaimer.
%    * Redistributions in binary form must reproduce the above copyright
%      notice, this list of conditions and the following disclaimer in
%      the documentation and/or other materials provided with the distribution

% THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
% AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
% IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
% ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
% LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
% CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
% SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
% INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
% CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
% ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
% POSSIBILITY OF SUCH DAMAGE.

function dcor = distcorr(x,y)

% This function calculates the distance correlation between x and y.
% Reference: http://en.wikipedia.org/wiki/Distance_correlation 
% Date: 18 Jan, 2013
% Author: Shen Liu (shen.liu@hotmail.com.au)

% Check if the sizes of the inputs match
if size(x,1) ~= size(y,1)
    error('Inputs must have the same number of rows')
end

% Delete rows containing unobserved values
N = any([isnan(x) isnan(y)],2) ;
x(N,:) = [] ;
y(N,:) = [] ;

% Calculate doubly centered distance matrices for x and y
a = squareform( pdist( x ) ) ; % the original MATLAB code calls pdist2( x, x ); in Octave, squareform( pdist( x ) ) gives the same n x n distance matrix
mcol = mean(a) ;
mrow = mean(a,2) ;
ajbar = ones(size(mrow))*mcol ;
akbar = mrow*ones(size(mcol)) ;
abar = mean(mean(a))*ones(size(a)) ;
A = a - ajbar - akbar + abar ;

b = squareform( pdist( y ) ) ;
mcol = mean(b) ;
mrow = mean(b,2) ;
bjbar = ones(size(mrow))*mcol ;
bkbar = mrow*ones(size(mcol)) ;
bbar = mean(mean(b))*ones(size(b)) ;
B = b - bjbar - bkbar + bbar ;

% Calculate squared sample distance covariance and variances
dcov = sum(sum(A.*B))/(size(mrow,1)^2) ;

dvarx = sum(sum(A.*A))/(size(mrow,1)^2) ;
dvary = sum(sum(B.*B))/(size(mrow,1)^2) ;

% Calculate the distance correlation
dcor = sqrt(dcov/sqrt(dvarx*dvary)) ;
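
As a quick sanity check of the function, a toy usage sketch (the numbers below are illustrative only and not from the post):

x = ( 1 : 20 )' ;
y = x .^ 2 .+ randn( 20 , 1 ) ; % dependent, nonlinear relation
dcor_dep = distcorr( x , y ) % should be close to 1
dcor_indep = distcorr( x , randn( 20 , 1 ) ) % typically much smaller
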
These results show promise, and I intend to apply a more rigorous test to them as the subject of a future post.