tag:blogger.com,1999:blog-55600529437724193202024-03-18T20:45:59.484+01:00Dekalog Blog"Trading is statistics and time series analysis." This blog details my progress in developing a systematic trading system for use on the futures and forex markets, with discussion of the various indicators and other inputs used in the creation of the system. Also discussed are some of the issues/problems encountered during this development process. Within the blog posts there are links to other web pages that are/have been useful to me.Unknownnoreply@blogger.comBlogger237125tag:blogger.com,1999:blog-5560052943772419320.post-38081653010493890802024-02-28T12:08:00.000+01:002024-02-28T12:08:31.136+01:00Indicator(s) Derived from PositionBook Data<p>Since my <a href="https://dekalogblog.blogspot.com/2023/12/judging-quality-of-indicators.html" target="_blank">last post</a> I have been trying to create new indicators from PositionBook data but unfortunately I have had no luck in doing so. I have tried differences, ratios, cumulative sums, logs and <a href="https://en.wikipedia.org/wiki/Control_chart" target="_blank">control charts</a> to no avail, and I have decided to discontinue this line of investigation because it doesn't seem to hold much promise. The only other direct uses I can think of for this data are:</p><ul style="text-align: left;"><li>modifying existing indicators that use volume such as <a href="https://en.wikipedia.org/wiki/Accumulation/distribution_index" target="_blank">Accumulation/distribution index</a>, <a href="https://en.wikipedia.org/wiki/On-balance_volume" target="_blank">On balance volume</a> and <a href="https://en.wikipedia.org/wiki/Money_flow_index" target="_blank">Money flow index</a>. The idea would be to use the PositionBook data relationships instead of the price bar relationships to calculate the intermediate steps of CLV and Money Ratio.</li><li>create a sort of PositionBook profile chart, similar to a <a href="https://dekalogblog.blogspot.com/2021/08/another-iterative-improvement-of-my.html" target="_blank">volume profile</a> chart. However, I suspect that this would be redundant as the high/low PositionBook nodes would be almost identical to the high/low volume nodes.</li><li>directly use the PositionBook data as input to a <a href="https://en.wikipedia.org/wiki/Machine_learning" target="_blank">machine learning</a> model, either as a standalone "indicator" or in a <a href="https://en.wikipedia.org/wiki/Meta-learning_(computer_science)" target="_blank">Meta learning</a> paradigm as outlined in <a href="https://www.amazon.com/Advances-Financial-Machine-Learning-Marcos/dp/1119482089" target="_blank">Advances in Financial Machine Learning</a>.</li></ul>I am not yet sure which of the above I will look at next, but whichever it is will be the subject of a future post. <br />Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-5560052943772419320.post-59824368856824780712023-12-21T14:44:00.000+01:002023-12-21T14:44:29.624+01:00Judging the Quality of Indicators.<p>In my <a href="https://dekalogblog.blogspot.com/2023/11/update-to-positionbook-chart-revised.html" target="_blank">previous post</a> I said I was trying to develop new indicators from the results of my new PositionBook optimisation routine. In doing so, I need to have a methodology for judging the quality of the indicator(s).
In the past I created a <a href="https://github.com/Dekalog/Data-Snooping-Tests" target="_blank">Data-Snooping-Tests GitHub repository</a> which contains some statistical significance tests that, of course, can be used on these new indicators. Additionally, for many years I have had a link to <a href="http://www.timothymasters.info/tssb.html" target="_blank">tssb</a> on this blog, from where a free testing program, VarScreen, and its associated manual are available. Timothy Masters also has a book, <a href="https://www.amazon.com/Testing-Tuning-Market-Trading-Systems/dp/148424172X" target="_blank">Testing and Tuning Market Trading Systems</a>, wherein there is C++ code for an Entropy test, an <a href="https://octave.org/" target="_blank">Octave</a>-compiled .oct version of which is shown in the following code box.</p><pre style="border-style: solid; border-width: 2px; height: 150px; overflow: auto; width: 500px;"><code>#include <octave/oct.h>
#include "octave dcolvector.h"
#include "cmath"
#include "algorithm"
DEFUN_DLD ( entropy, args, nargout,
"-*- texinfo -*-\n\
@deftypefn {Function File} {entropy_value =} entropy (@var{input_vector,nbins})\n\
This function takes an input vector and nbins and calculates\n\
the entropy of the input_vector. This input_vector will usually\n\
be an indicator for which we want the entropy value. This value ranges\n\
from 0 to 1 and a minimum value of 0.5 is recommended. Less than 0.1 is\n\
serious and should be addressed. If nbins is not supplied, a default value\n\
of 20 is used. If the input_vector length is < 50, an error will be thrown.\n\
@end deftypefn" )
{
octave_value_list retval_list ;
int nargin = args.length () ;
int nbins , k ;
double entropy , factor , p , sum ;
// check the input arguments
if ( nargin < 1 || args(0).length () < 50 )
{
error ("Invalid 1st argument length. Input is a vector of length >= 50.") ;
return retval_list ;
}
if ( nargin == 1 )
{
nbins = 20 ;
}
if ( nargin == 2 )
{
nbins = args(1).int_value() ;
}
// end of input checking
ColumnVector input = args(0).column_vector_value () ;
ColumnVector count( nbins , 0.0 ) ; // bin counts, initialised to zero
double max_val = *std::max_element( &input(0) , &input(0) + args(0).length () ) ; // end pointer is one past the last element
double min_val = *std::min_element( &input(0) , &input(0) + args(0).length () ) ;
factor = ( nbins - 1.e-10 ) / ( max_val - min_val + 1.e-60 ) ;
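// bin each input value into one of nbins equal width bins and count the occupancy of each bin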
for ( octave_idx_type ii ( 0 ) ; ii < args(0).length () ; ii++ ) {
k = ( int ) ( factor * ( input( ii ) - min_val ) ) ;
++count( k ) ; }
sum = 0.0 ;
for ( octave_idx_type ii ( 0 ) ; ii < nbins ; ii++ ) {
if ( count( ii ) ) {
p = ( double ) count( ii ) / args(0).length () ;
sum += p * log ( p ) ; }
}
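// normalise by log( nbins ) so the returned entropy lies between 0 ( all values in one bin ) and 1 ( values uniformly spread across bins )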
entropy = -sum / log ( (double) nbins ) ;
retval_list( 0 ) = entropy ;
return retval_list ;
} // end of function</code></pre><p></p>This calculates the information content, <a href="https://en.wikipedia.org/wiki/Entropy_(information_theory)" target="_blank">Entropy_(information_theory)</a>, of any indicator, the value for which ranges from 0 to 1, with a value of 1 being ideal. Masters suggests that a minimum value of 0.5 is acceptable for indicators and also suggests ways in which the calculation of any indicator can be adjusted to improve its entropy value. By way of example, below is a plot of an "ideal" (blue) indicator, which has values uniformly spread across its range<div class="separator" style="clear: both; text-align: center;"></div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjDzzA-iP0JALAlBOPQct0K8QGKPdVG7qbNUiQvcejJMvB9kvZiDnW9eYglXmmOsA5n056oxkgZL6S5eUYFgAlyWcjMFUDVX3dFZU90QjZos2t8jn4tfxe6LobHIewMJyv5_Gi9Cc-zajDuY1dMxQ54uAuEV-IkYSgUO4AHwv3IjF38IBo93qJgWT4Nnj8S/s1531/ideal.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="830" data-original-width="1531" height="173" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjDzzA-iP0JALAlBOPQct0K8QGKPdVG7qbNUiQvcejJMvB9kvZiDnW9eYglXmmOsA5n056oxkgZL6S5eUYFgAlyWcjMFUDVX3dFZU90QjZos2t8jn4tfxe6LobHIewMJyv5_Gi9Cc-zajDuY1dMxQ54uAuEV-IkYSgUO4AHwv3IjF38IBo93qJgWT4Nnj8S/s320/ideal.png" width="320" /></a></div>with an entropy value of 0.9998. This second plot shows a "good" indicator, which has an <br /><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjKyVMgFK5uXDdWABMOLPU3nbCKNoCVdxbhdxrAVCXoPtAb20M2Lnhzkbiu26XznCwMg1kMf1atCcCgLHkgUvYOnF0hPSZgZyMSWFGhOibGn7N_JuMuATkG1c7dljUv6FHl1-bX71cx6sDCh8OnWDbMu0FRm-v1XQTmn8ouXO8L4phCLxEimI4aZgONSeVc/s1531/good.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="830" data-original-width="1531" height="173" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjKyVMgFK5uXDdWABMOLPU3nbCKNoCVdxbhdxrAVCXoPtAb20M2Lnhzkbiu26XznCwMg1kMf1atCcCgLHkgUvYOnF0hPSZgZyMSWFGhOibGn7N_JuMuATkG1c7dljUv6FHl1-bX71cx6sDCh8OnWDbMu0FRm-v1XQTmn8ouXO8L4phCLxEimI4aZgONSeVc/s320/good.png" width="320" /></a></div><p>entropy value of 0.7781 and is in fact just random, normally distributed values with a mean of 0 and variance 1. In both plots, the red indicators fail to meet the recommended minimum value, both having entropy values of 0.2314.</p><p>It is visually intuitive that in both plots the blue indicators convey more information than the red ones. In creating my new PositionBook indicators I intend to construct them in such a way as to maximise their entropy before I progress to some of the above mentioned tests. 
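<p>By way of a usage sketch, assuming the above code has been saved as entropy.cc and compiled from the Octave prompt with mkoctfile, the entropy of candidate indicator vectors can then be checked as shown below; the two random vectors are purely illustrative stand-ins for real indicators.</p><pre style="border-style: solid; border-width: 2px; height: 110px; overflow: auto; width: 500px;"><code>## compile once: mkoctfile entropy.cc
uniform_indicator = rand( 1000 , 1 ) ;  ## uniformly spread values
normal_indicator = randn( 1000 , 1 ) ;  ## normally distributed values
entropy( uniform_indicator , 20 )       ## close to 1, like the blue "ideal" indicator above
entropy( normal_indicator )             ## nbins defaults to 20; circa 0.78, like the "good" indicator above</code></pre>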
<br /></p>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-5560052943772419320.post-11562353239613415622023-11-22T14:46:00.001+01:002023-12-21T13:24:48.078+01:00Update to PositionBook Chart - Revised Optimisation Method<p>Just over a year ago I previewed a new chart type which I called a "PositionBook Chart" and gave examples in <a href="https://dekalogblog.blogspot.com/2022/11/a-new-positionbook-chart-type.html" target="_blank">this post</a> and <a href="https://dekalogblog.blogspot.com/2022/11/positionbook-chart-example-trade.html" target="_blank">this one.</a> These first examples were based on an optimisation routine over 6 variables using <a href="https://octave.org/" target="_blank">Octave's</a> <a href="https://octave.sourceforge.io/octave/function/fminunc.html" target="_blank">fminunc</a> function, an unconstrained minimisation routine. However, I was not 100% convinced that the model I was using for the <a href="https://en.wikipedia.org/wiki/Mathematical_optimization" target="_blank">loss/cost function</a> was realistic, and so since the above posts I have been further testing different models to see if I could come up with a more satisfactory model and optimisation routine. The comparison between the original model and the better, newer model I have selected is indicated in the following animated GIF, which shows the last few days' action in the GBPUSD forex pair. <br /></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh_RItedJgBZgDgLijXx6ty2ELR6Gx0jhhgb9xkchZlvdFoUas4jmuZTEtfxz_ZVdbip5MEGnHMbe3xEmruGQvyeMApgDXiwQi_m-Ykm8_fG_WHx1xV3JEzHlpTqcwcr6ZVCvCyRH3lLv_NHHa2RvENnGe7X3FBOAEPseQATnlOnzM53NNhNwfXhnfupzgR/s1920/anim.gif" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1080" data-original-width="1920" height="180" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh_RItedJgBZgDgLijXx6ty2ELR6Gx0jhhgb9xkchZlvdFoUas4jmuZTEtfxz_ZVdbip5MEGnHMbe3xEmruGQvyeMApgDXiwQi_m-Ykm8_fG_WHx1xV3JEzHlpTqcwcr6ZVCvCyRH3lLv_NHHa2RvENnGe7X3FBOAEPseQATnlOnzM53NNhNwfXhnfupzgR/s320/anim.gif" width="320" /></a></div>The old model is figure(200), with the darker blue "blob" of positions accumulated at the lower, beginning portion of the chart, while the newer model, figure(900), shows accumulation throughout the uptrend. The reasons I prefer this newer model are:<p></p><ul style="text-align: left;"><li>4 of the 6 variables mentioned above (longs above and below price bar range, and shorts above and below price bar range) are theoretically linked to each other to preserve their mutual relationships and jointly minimised over a single input to the loss/cost function, which has a bounded upper and lower limit. This means I can use Octave's <a href="https://octave.sourceforge.io/octave/function/fminbnd.html" target="_blank">fminbnd</a> function instead of fminunc. The minimisation objective is the minimum absolute change in positions outside the price bar range, which has real-world relevance as compared to the <a href="https://en.wikipedia.org/wiki/Mean_squared_error" target="_blank">mean squared error</a> of the fminunc cost function.</li><li>because fminunc is "unconstrained", occasionally it would converge to unrealistic solutions with respect to position changes outside the price bar range.
This does not happen with the new routine.</li><li>once the results of fminbnd are obtained, it is possible to mathematically calculate the position changes within the price bar range exactly, without needing to resort to any optimisation routine. This gives a zero error for the change which is arguably the most important.</li><li>the results from the new routine seem to be more stable in that indicators I am trying to create from them are noticeably less erratic and confusing than those created from fminunc results.</li><li>finally, fminbnd over 1 variable is much quicker to converge than fminunc over 6 variables. <br /></li></ul>The second last mentioned point, derived indicators, will be the subject of my next post. <br />Unknownnoreply@blogger.com3tag:blogger.com,1999:blog-5560052943772419320.post-40624065002195035612023-08-20T17:13:00.004+02:002023-08-21T14:37:31.161+02:00Currency Strength Revisited<p>Recently I responded to a Quantitative Finance forum question <a href="https://quant.stackexchange.com/questions/76406/separating-forex-instruments" target="_blank">here</a>, where I invited the questioner to peruse certain posts on this blog. Apparently the posts do not provide enough information to fully answer the question (my bad) and therefore this post provides what I think will suffice as a full and complete reply, although perhaps not scientifically rigorous.</p><p>The original question asked was "Is it possible to separate or decouple the two currencies in a trading pair?" and I believe what I have previously described as a "currency strength indicator" does precisely this (blog search term ---> https://dekalogblog.blogspot.com/search?q=currency+strength+indicator). This post outlines the rationale behind my approach.</p><p>Take, for example, the GBPUSD forex pair, and further give it a current (imaginary) value of 1.2500. What does this mean? Of course it means 1 GBP will currently buy you 1.25 USD, or alternatively 1 USD will buy you 1/1.25 = 0.8 GBP. Now rather than write GBPUSD let's express GBPUSD as a ratio thus:- GBP/USD, which expresses the idea of "how many USD are there in a GBP?" in the same way that 9/3 shows how many 3s there are in 9. Now let's imagine at some time period later there is a new pair value, a lower case "gbp/usd" where we can write the relationship</p><p><span> </span><span> </span><span> </span><span> </span><span> </span>(1) ( GBP / USD ) * ( G / U ) = gbp / usd</p><p>to show the change over the time period in question. The ( G / U ) term is a multiplicative term to show the change in value from old GBP/USD 1.2500 to say new value gbp/usd of 1.2600, </p><p>e.g. <span> </span><span> </span><span> </span><span> </span>( G / U ) == ( gbp / usd ) / ( GBP / USD ) == 1.26 / 1.25 == 1.008</p><p>from which it is clear that the forex pair has increased by 0.8% in value over this time period. Now, if we imagine that over this time period the underlying, real value of USD has remained unchanged this is equivalent to setting the value U in ( G / U ) to exactly 1, thereby implying that the 0.8% increase in the forex pair value is entirely attributable to a 0.8% increase in the underlying, real value of GBP, i.e. G == 1.008. 
Alternatively, we can assume that the value of GBP remains unchanged,</p><p> e.g.<span> </span><span> </span><span> </span><span> </span>G == 1, which means that U == 1 / 1.008 == 0.9921</p><p>which implies that a ( 1 - 0.9921 ) == 0.79% <i>decrease</i> in USD value is responsible for the 0.8% <i>increase</i> in the pair quote.</p><p>Of course, given only equation (1) it is impossible to solve for G and U as either can be arbitrarily set to any number greater than zero and then be compensated for by setting the other number such that the constant ( G / U ) will match the required constant to account for the change in the pair value.</p><p>However, now let's introduce two other forex pairs (2) and (3) and thus we have:-</p><p><span> </span><span> </span><span> </span><span> </span><span> </span>(1) ( GBP / USD ) * ( G / U ) = gbp / usd <br /></p><p><span> </span><span> </span><span> </span><span> </span><span> </span>(2) ( EUR / USD ) * ( E / U ) = eur / usd</p><p><span> </span><span> </span><span> </span><span> </span><span> </span>(3) ( EUR / GBP ) * ( E / G ) = eur / gbp</p><p>We now have three equations and three unknowns, namely G, E and U, and so this system of equations could be laboriously, mathematically solved by substitution. </p><p>However, in my currency strength indicator I have taken a different approach. Instead of solving mathematically I have written an error function which takes as arguments a list of G, E, U, ... etc. for all currency multipliers relevant to all the forex quotes I have access to, approximately 47 various crosses which themselves are inputs to the error function, and this function is supplied to <a href="https://octave.sourceforge.io/octave/function/fminunc.html" target="_blank">Octave's fminunc function</a> to simultaneously solve for all G, E, U, ... etc. given all forex market quotes. The initial starting values for all G, E, U, ... etc. are 1, implying no change in values across the market. These starting values consistently converge to the same final values for G, E, U, ... etc for each separate period's optimisation iterations.</p><p>Having got all G, E, U, ... etc. what can be done? Well, taking G for example, we can write</p><p><span> </span><span> </span><span> </span><span> </span><span> (4) </span>GBP * G = gbp</p><p>for the underlying, real change in the value of GBP. Dividing each side of (4) by GBP and taking logs we get</p><p><span> </span><span> </span><span> </span><span> </span><span> </span>(5) log( G ) = log( gbp / GBP )</p><p>i.e. 
the log of the fminunc returned value for the multiplicative constant G is the equivalent of the log return of GBP independent of all other currencies, or as the original forum question asked, the (change in) value of GBP separated or decoupled from the pair in which it is quoted.</p><p>Of course, having the individual log returns of separated or decoupled currencies, there are many things that can be done with them, such as:-</p><ul style="text-align: left;"><li>create indices for each currency</li><li>apply technical analysis to these separate indices</li><li>intermarket currency analysis</li><li>input to machine learning (ML) models</li><li>possibly create new and unique currency indicators</li></ul><p>Examples of the creation of "alternative price charts" and indices are shown below<br /></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjPL7YlSst12DOlAFS7iNoO8eYlecJZ1grb5pOwe3ZTmlRjb4E-Ah6cjcBmCyVLJJMpIZLpcjR5qCRCHy4xUhwVsxKmrkwAu-bjnqEdl7kOzocAGqRi89RQy3ki5LOGX7IWVYWGqI5D1cevBRcTwhziRdqw4hePXYfaJT-B6kVqgYheMbDCJMt7sjY6p8aM/s1920/gbpusd_prices.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1080" data-original-width="1920" height="180" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjPL7YlSst12DOlAFS7iNoO8eYlecJZ1grb5pOwe3ZTmlRjb4E-Ah6cjcBmCyVLJJMpIZLpcjR5qCRCHy4xUhwVsxKmrkwAu-bjnqEdl7kOzocAGqRi89RQy3ki5LOGX7IWVYWGqI5D1cevBRcTwhziRdqw4hePXYfaJT-B6kVqgYheMbDCJMt7sjY6p8aM/s320/gbpusd_prices.png" width="320" /></a></div><p>where the black line shows the actual 10 minute closing prices of GBPUSD over the
last week (13th to 18th August) with the corresponding GBP price (blue line) being the "alternative" GBPUSD chart if U is held at 1 in the ( G / U ) term and G allowed to be its derived, optimised value, and the USD price (red line) being the alternative chart if G is held at 1 and U allowed to be its derived, optimised value.</p><p>This second chart shows a more "traditional" index like chart</p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhCDFZsF2EU4Xoah-SQQauUpHiraq_T9-E8DXrC5fM1-mPOmPLL3FxkcUmHHuBjARO6px7fNnGdjXSNpiUcQ9QTgWB1PRuTHAfZ644J6iososx8Fjxm5qhJxSAehUconwor1OPpx_NZS3lx1TVZq_zWIW_g7BCH1fSLHVuLL7nav9DGkP8UEKT9_IIpviLa/s1920/gbpusd_index.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1080" data-original-width="1920" height="180" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhCDFZsF2EU4Xoah-SQQauUpHiraq_T9-E8DXrC5fM1-mPOmPLL3FxkcUmHHuBjARO6px7fNnGdjXSNpiUcQ9QTgWB1PRuTHAfZ644J6iososx8Fjxm5qhJxSAehUconwor1OPpx_NZS3lx1TVZq_zWIW_g7BCH1fSLHVuLL7nav9DGkP8UEKT9_IIpviLa/s320/gbpusd_index.png" width="320" /></a></div>where the starting values are 1 and both the G and U values take their derived values. As can be seen, over the week there was upwards momentum in both the GBP and USD, with the greater momentum being in the GBP resulting in a higher GBPUSD quote at the end of the week. If, in the second chart the blue GBP line had been flat at a value of 1 all week, the upwards momentum in USD would have resulted in a lower week ending quoted value of GBPUSD, as seen in the red USD line in the first chart. Having access to these real, decoupled returns allows one to see through the given, quoted forex prices in the manner of viewing the market as though through X-ray vision. <br /><p></p><p>I hope readers find this post enlightening, and if you find some other uses for this idea, I would be interested in hearing how you use it. <br /> </p>Unknownnoreply@blogger.com1tag:blogger.com,1999:blog-5560052943772419320.post-61372059899731210552023-05-30T13:07:00.000+02:002023-05-30T13:07:31.929+02:00Quick Update on Kalman Filter and Sensor Fusion<p>Managed to code it up and get it working, but at the end of the day I couldn't see any value added over just averaging the output of the indicators I was trying to fuse together via Kalman filtering. As a result, I'm giving up on this for now and looking at other things.</p><p>More in due course. <br /></p>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-5560052943772419320.post-69052734173709682512023-02-28T19:26:00.001+01:002023-02-28T19:28:52.934+01:00Kalman Filter and Sensor Fusion.<p>In the Spring of 2012 and again in the Spring of 2019 I posted a series of posts about the <a href="https://en.wikipedia.org/wiki/Kalman_filter" target="_blank">Kalman Filter</a>, which readers can access via the blog archive on the right. In both cases I eventually gave up those particular lines of investigation because of disappointing results. This post is the first in a new series about using the Kalman Filter for <a href="https://en.wikipedia.org/wiki/Sensor_fusion" target="_blank">sensor fusion</a>, which I had known of before, but due to the paucity of clear information about this online I had never really investigated. 
However, my recent discovery of <a href="https://github.com/simondlevy/SensorFusion" target="_blank">this Github</a> and its associated <a href="https://simondlevy.academic.wlu.edu/kalman-tutorial/" target="_blank">online tutorial</a> has inspired me to a third attempt at using Kalman Filters. What I am going to attempt to do is use the idea of sensor fusion to fuse the output of several functions I have coded in the past, which each extract the dominant cycle from a time series, to hopefully obtain a better representation of the "true underlying cycle."</p><p>The first step in this process is to determine the measurement noise covariance or, in Kalman Filter terms, the "R" covariance matrix. To do this, I have used the average of two of the outputs from the above mentioned functions to create a new cycle and similarly used two extracted trends (price minus these cycles) averaged to get a new trend. The new cycle and new trend are simply added to each other to create a new price series which is almost identical to the original price series. The screenshot below shows a typical cycle extract,</p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEii6hB-IAkK1Qu-scFDKI7oPcVBbs3nnl8IiDrZ2yS4kcegFJKRvjL9voLDEAtWUvzsWIEULlliNEbtSH0SCLbBbO6k35GsVCmdUxfs7XQRHbNFfdVDTyyEq5BW4Df93dS-Ewr4RnfejUl-DA5zcv6K3f7filf30dmom9IBW2TnpzAJIaJBXGtgzQ4vjQ/s1920/new_cycle.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1080" data-original-width="1920" height="180" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEii6hB-IAkK1Qu-scFDKI7oPcVBbs3nnl8IiDrZ2yS4kcegFJKRvjL9voLDEAtWUvzsWIEULlliNEbtSH0SCLbBbO6k35GsVCmdUxfs7XQRHbNFfdVDTyyEq5BW4Df93dS-Ewr4RnfejUl-DA5zcv6K3f7filf30dmom9IBW2TnpzAJIaJBXGtgzQ4vjQ/s320/new_cycle.png" width="320" /></a></div>where the red cycle is the average of the other two extracted cycles, and this following screenshot shows the new trend in red plus the new price alongside the old price (blue and black respectively).<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh-Nqi0x2O2fxPuRzmYxUJ-_WU4KZV4SVkrF9PXkjtthW-nXqBp6ojaZVof87-5N8LI2jjm8haM2-2iJyufy0k-mcx6-zuiKJRnHlhDF1yJyeAEiSZUmEDsjyVKDxoZG6A4s5aOT-1z8ZEaSfov8vcKcoys5_JF_naJVyNIzzg4m4elc6rQqh3UFEztfQ/s1920/new_price.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1080" data-original-width="1920" height="180" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh-Nqi0x2O2fxPuRzmYxUJ-_WU4KZV4SVkrF9PXkjtthW-nXqBp6ojaZVof87-5N8LI2jjm8haM2-2iJyufy0k-mcx6-zuiKJRnHlhDF1yJyeAEiSZUmEDsjyVKDxoZG6A4s5aOT-1z8ZEaSfov8vcKcoys5_JF_naJVyNIzzg4m4elc6rQqh3UFEztfQ/s320/new_price.png" width="320" /></a></div> Having created a time series thus with known trend and cycle, it is a simple matter to run my cycle extractor functions on this new price, compare the outputs with the known cyclic component of price and calculate the variance of the errors to get the R covariance matrices for 14 different currency crosses.<p></p><p>More in due course.<br /></p><p> <br /></p>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-5560052943772419320.post-42308037805643407512022-11-18T18:56:00.002+01:002022-11-18T18:56:33.161+01:00PositionBook Chart Example Trade<p>As a quick follow up to my <a href="https://dekalogblog.blogspot.com/2022/11/a-new-positionbook-chart-type.html" target="_blank">previous post</a> I 
thought I'd show an example of how one could possibly use my new PositionBook chart as a trade set-up. Below is the USD_CHF forex pair for the last two days</p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi19qLQ8Nqh9Yxux80MTcTcUQufY_1JLU6RzXBqDTUqQsHVoA2u82Zabf9dSa5lILUzBUyW0HdAPEXyghRgcebh1ez2WplM5NXRntxzo0eEEXJaBcAu8wVwAUa7dXSe4ZJGtCRTdyw2grv1sWgJAN0g0kN8Ls0lyoo0nfn_UOJA3LRXvnE4N7Un_D8Jqw/s1920/pb.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1080" data-original-width="1920" height="225" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi19qLQ8Nqh9Yxux80MTcTcUQufY_1JLU6RzXBqDTUqQsHVoA2u82Zabf9dSa5lILUzBUyW0HdAPEXyghRgcebh1ez2WplM5NXRntxzo0eEEXJaBcAu8wVwAUa7dXSe4ZJGtCRTdyw2grv1sWgJAN0g0kN8Ls0lyoo0nfn_UOJA3LRXvnE4N7Un_D8Jqw/w400-h225/pb.png" width="400" /></a></div>showing the nice run-up yesterday and then the narrow range of Friday's Asian session.<p></p><p>The tentative set-up idea is to look for such a narrow range and use the colour of the PositionBook chart in this range (blue for a long) to catch or anticipate a breakout. The take profit target would be the resistance suggested by the horizontal yellow bar in the open orders chart (overhead sell orders) more or less at Thursday's high.<br /></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg8eqfRcAM8bTJa6ClXUqDIHc2GhXrq-MDePQDIvgi2UQdxpMtxytMUcnCYVdoCgaP-6PCNmPO0DbBmMdchFTE6852fcqjMO5spyj7lFbcfwiAbEHQa0L1e-ErGQU1sO9UDGFubp-ZSI4r6exbMqxdyL0W4l-plmTq7u6kjRp6mll4D_6TzoYZF2gqetA/s1920/oo.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1080" data-original-width="1920" height="225" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg8eqfRcAM8bTJa6ClXUqDIHc2GhXrq-MDePQDIvgi2UQdxpMtxytMUcnCYVdoCgaP-6PCNmPO0DbBmMdchFTE6852fcqjMO5spyj7lFbcfwiAbEHQa0L1e-ErGQU1sO9UDGFubp-ZSI4r6exbMqxdyL0W4l-plmTq7u6kjRp6mll4D_6TzoYZF2gqetA/w400-h225/oo.png" width="400" /></a></div>I decided to take a <i>really </i>small punt on this idea but took a small loss of 0.0046 GBP<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgRDGv9HDkGUr3ZvzovewhiwqZdg2mi9Vno5mWfvW-UjRei8QC8zqiDBMZ7IofOREwuucoP6NLQMd7tjT7uulzcdSvz7i5b8WuK1S2eG-YXkXk9lqFAS8euVWtmVSr93PCwNAsLqleVA-kaW8c6StfgNGO-oricLX0_CSt2dGhkDUpb3Vz-ExW0V1wptA/s1372/oanda.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="572" data-original-width="1372" height="166" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgRDGv9HDkGUr3ZvzovewhiwqZdg2mi9Vno5mWfvW-UjRei8QC8zqiDBMZ7IofOREwuucoP6NLQMd7tjT7uulzcdSvz7i5b8WuK1S2eG-YXkXk9lqFAS8euVWtmVSr93PCwNAsLqleVA-kaW8c6StfgNGO-oricLX0_CSt2dGhkDUpb3Vz-ExW0V1wptA/w400-h166/oanda.png" width="400" /></a></div>as indicated in the above Oanda trade app. I entered too soon and perhaps should have waited for confirmation (I can see a doji bar on the 5 minute chart just after my stop out) or had the conviction to re-enter the trade after this doji bar. The initial trade idea seems to have been sound as the profit target was eventually hit. 
This could have been a nice 4/5/6 <a href="https://www.vantharp.com/trading/wp-content/uploads/2018/06/A_Short_Lesson_on_R_and_R-multiple.pdf" target="_blank">R-multiple</a> profitable trade.😞 <br /><br /><p></p>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-5560052943772419320.post-17195683055300652742022-11-11T21:53:00.005+01:002022-11-13T11:11:56.554+01:00A New PositionBook Chart Type<p>It has been almost 6 months since I last posted, due to working on a house renovation. However, I have still been thinking about/working on stuff, particularly on <a href="https://www1.oanda.com/lang/en/forex-trading/analysis/open-position-ratios" target="_blank">analysis of open position ratios. </a>I had tried using this data as features for <a href="https://en.wikipedia.org/wiki/Machine_learning" target="_blank">machine learning</a>, but my thinking has evolved somewhat and I have reduced my ambition/expectation for this type of data.</p><p>Before I get into this I'd like to mention <a href="https://www.trader-dale.com/" target="_blank">Trader Dale</a> (I have no affiliation with him) as I have recently been following his <a href="https://dekalogblog.blogspot.com/2021/08/another-iterative-improvement-of-my.html" target="_blank">volume profile</a> set-ups, a screenshot of one being shown below. <br /></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh0b9dfMe1wjPWMPgdVr3z7mFZBR3Gv-2R3kiERm6iekhs6Jt8FqPykHpak5KMMmPH3VhIZoej0XCDreDqAXqCMlZpHlp5JoHBMnCmpHvpITRxIXF-REnxFNP-MZLXy-evocDc7BkfQsuJlnyDK92cUqEMI9SJPZ-mtSjaRCYqKeGvLOovnl4p733Z-cg/s1920/eur_gbp_td.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1080" data-original-width="1920" height="179" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh0b9dfMe1wjPWMPgdVr3z7mFZBR3Gv-2R3kiERm6iekhs6Jt8FqPykHpak5KMMmPH3VhIZoej0XCDreDqAXqCMlZpHlp5JoHBMnCmpHvpITRxIXF-REnxFNP-MZLXy-evocDc7BkfQsuJlnyDK92cUqEMI9SJPZ-mtSjaRCYqKeGvLOovnl4p733Z-cg/w400-h179/eur_gbp_td.png" width="400" /></a></div>This shows recent Wednesday action in the EUR_GBP pair on a 30 minute chart. The flexible volume profile set-up Trader Dale describes is called a Volume Accumulation Set-up which occurs immediately prior to a big break (in this case up). The whole premise of this particular set-up is that the volume accumulation area will be future support, off of which price will bounce, as shown by the "hand drawn" lines. Below is shown my version of the above chart<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiQK4QBsf50KBQfNm1IrffSELR8rVyN3k05TsyfApW7j0qptANOKW7G62LZV-_25Bdwg0rWrnyr0VQGyoj3BWZsk5YJNP21VauVYih-0jAxq8rAopO872LJKE9JLuGMll2WGm295tuYAMg8CQfLRhYDslMnW78SCzestTNDMvjhl84kHDoC50LEZV-uNg/s1920/eur_gbp_1.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1080" data-original-width="1920" height="225" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiQK4QBsf50KBQfNm1IrffSELR8rVyN3k05TsyfApW7j0qptANOKW7G62LZV-_25Bdwg0rWrnyr0VQGyoj3BWZsk5YJNP21VauVYih-0jAxq8rAopO872LJKE9JLuGMll2WGm295tuYAMg8CQfLRhYDslMnW78SCzestTNDMvjhl84kHDoC50LEZV-uNg/w400-h225/eur_gbp_1.png" width="400" /></a></div>with a bit of extra price action included. 
The horizontal yellow lines show the support area.<p></p><p>Now here is the same data, but in what I'm calling a PositionBook chart, which uses Oanda's Position Level data downloaded via their API.<br /></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjjoMsR0ZpIDmkdlebFvRldU4iMVQdisS4afgzzXI3HS3OuGta0s0kILVL-C7ezRjjDkKvyQTHVnvxZe_E2GHj6pDkPSukDJvwE5yC4T153igXMn318TnBHKWc3XdpKxERJRtXe4uqs-gzFsA9y2qBAzQf0Cr14tFRYReM44QlGmzwjaDNWxr6D7bP8pw/s1920/eur_gbp_2.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1080" data-original-width="1920" height="225" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjjoMsR0ZpIDmkdlebFvRldU4iMVQdisS4afgzzXI3HS3OuGta0s0kILVL-C7ezRjjDkKvyQTHVnvxZe_E2GHj6pDkPSukDJvwE5yC4T153igXMn318TnBHKWc3XdpKxERJRtXe4uqs-gzFsA9y2qBAzQf0Cr14tFRYReM44QlGmzwjaDNWxr6D7bP8pw/w400-h225/eur_gbp_2.png" width="400" /></a></div>The blue (red) horizontal lines show the levels at which traders are net long (short) in terms of positions actually entered/held. The brighter the colours the greater the difference between the longs/shorts. It is obvious that the volume accumulation set-up area is showing a net accumulation of long positions and this is an indication of the direction of the anticipated breakout long before it happens. The Trader Dale set-up presumes an accumulation of longs <i>because</i> <i>of </i>the resultant breakout direction and doesn't seem to provide an opportunity to participate in the breakout itself!<p></p><p>The next chart shows the action of the following day and a bit where the price does indeed come back down to the "support" area but doesn't result in an immediate bounce off the support level. 
The following order level chart perhaps shows why there was no bounce - the relative absence of open orders at that level.</p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjSD3hong6FZNZU8mKv_wDm5MbW_W8QDHMz8E9r9P1fSyP60KJWgofi0lMMbuy5E19QanznjVQ2LcLyJddpbSSUqnfDwlaJFv4fL5FpcEISAQYaTl07S4SsZbfJ33UlIdirtJbsgpLwI1B34E8r38SiT96FDH6VREtRHh4nMWyudu5ym79_P0uYBUa8jw/s1920/eur_gbp_orders.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1080" data-original-width="1920" height="225" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjSD3hong6FZNZU8mKv_wDm5MbW_W8QDHMz8E9r9P1fSyP60KJWgofi0lMMbuy5E19QanznjVQ2LcLyJddpbSSUqnfDwlaJFv4fL5FpcEISAQYaTl07S4SsZbfJ33UlIdirtJbsgpLwI1B34E8r38SiT96FDH6VREtRHh4nMWyudu5ym79_P0uYBUa8jw/w400-h225/eur_gbp_orders.png" width="400" /></a></div>The equivalent PositionBook chart, including a bit more price action,<br /><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjUNaHFW-7Bv3PJskH3MmhUTv4AD3VbXtd8kGfqyj52MvMMG0F6-eEkg7CUrQV0-WnFc2nc6qrwE3Z3dwuAMS-YIE0Qw8l5VKVG_j9gPCIBGSO2kOT0UAX1nwtzsleHoFs25SEABzhq71hMHeJb2hl3n0N8OSD55kKVj7WgZalusMMq-RJObz9BpSq7Ow/s1920/eur_gbp_final.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1080" data-original-width="1920" height="225" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjUNaHFW-7Bv3PJskH3MmhUTv4AD3VbXtd8kGfqyj52MvMMG0F6-eEkg7CUrQV0-WnFc2nc6qrwE3Z3dwuAMS-YIE0Qw8l5VKVG_j9gPCIBGSO2kOT0UAX1nwtzsleHoFs25SEABzhq71hMHeJb2hl3n0N8OSD55kKVj7WgZalusMMq-RJObz9BpSq7Ow/w400-h225/eur_gbp_final.png" width="400" /></a></div>shows that after price fails to bounce off the support level it does recover back into it and then even more long positions are accumulated (the darker blue shade) at the support level during the London open, again allowing one to position oneself for the ensuing rise during the London morning session, followed by another long accumulation during the New York opening session for a following leg up into the London close (the last vertical red line).<p></p><p>The purpose of this post is not to criticise the Trader Dale set-up but rather to highlight the potential value-add of these new PositionBook charts. They seem to hold promise for indicating price direction and I intend to continue investigating/improving them in the coming weeks.</p><p>More in due course.</p>Unknownnoreply@blogger.com1tag:blogger.com,1999:blog-5560052943772419320.post-25363286667575016842022-04-08T16:37:00.002+02:002022-04-08T16:37:44.318+02:00Simple Machine Learning Models on OrderBook/PositionBook Features<p>This post is about using OrderBook/PositionBook features as input to simple <a href="https://en.wikipedia.org/wiki/Machine_learning" target="_blank">machine learning</a> models after <a href="https://dekalogblog.blogspot.com/2022/03/orderbook-and-positionbook-features.html" target="_blank">previous</a> investigation into the relevance of such features. </p><p>Due to the amount of training data available I decided to look only at a linear model and small <a href="https://en.wikipedia.org/wiki/Neural_network" target="_blank">neural networks</a> (NN) with a single hidden layer with up to 6 hidden neurons. 
This choice was motivated by an academic paper I read online about linear models which stated that, as a lower bound, one should have at least 10 training examples for each parameter to be estimated. Other online reading about order flow imbalance (OFI) suggested there is a linear relationship between OFI and price movement. Use of limited size NNs would allow a small amount of non linearity in the relationship. For this investigation I used the <a href="https://github.com/sods/netlab" target="_blank">Netlab</a> toolbox and <a href="https://www.gnu.org/software/octave/index" target="_blank">Octave</a>. A plot of the <a href="https://en.wikipedia.org/wiki/Learning_curve_(machine_learning)" target="_blank">learning curves</a> of the classification models tested is shown below. The targets were binary 1/0 for price increases/decreases.<br /></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiU0QlFhCHW_PCK_OPml1r3gV16FbZrTPosWVOrQx5rUXm7QmUyJl0510G2jQB6w5e3lZoV-hfBKG6a_GMHxTNmyodiizbCCzaRodv_CrMcbnJ1g7dFlUvkUMgMeO5J_tGMJkZDwhzZ3JgGuqQxyWSlzsdp_VOQjdYVHSZd1Q7lnokfRGbr8Nl4br4U8w/s1542/netlab_training.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="840" data-original-width="1542" height="174" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiU0QlFhCHW_PCK_OPml1r3gV16FbZrTPosWVOrQx5rUXm7QmUyJl0510G2jQB6w5e3lZoV-hfBKG6a_GMHxTNmyodiizbCCzaRodv_CrMcbnJ1g7dFlUvkUMgMeO5J_tGMJkZDwhzZ3JgGuqQxyWSlzsdp_VOQjdYVHSZd1Q7lnokfRGbr8Nl4br4U8w/s320/netlab_training.png" width="320" /></a></div><p></p><p>The blue lines show the average training error (y axis) and the red lines show the same average error metric on the held out <a href="https://en.wikipedia.org/wiki/Cross-validation_(statistics)" target="_blank">cross validation</a> data set for each tested model. The thickness of the lines represents the number of neurons in the single hidden layer of the NNs (the thicker the lines, the higher the number of hidden neurons). The horizontal green line shows the error of a <a href="https://en.wikipedia.org/wiki/Generalized_linear_model" target="_blank">generalized linear model (GLM)</a> trained using <a href="https://en.wikipedia.org/wiki/Iteratively_reweighted_least_squares" target="_blank">iteratively reweighted least squares</a>. It can be seen that NN models with 1 and 2 hidden neurons slightly outperform the GLM, with the 2 neuron model having the edge over the 1 neuron model. NN models with 3 or more hidden neurons over fit and underperform the GLM. The NN models were trained using Netlab's functions for <a href="https://en.wikipedia.org/wiki/Bayesian_interpretation_of_kernel_regularization" target="_blank">Bayesian regularization</a> over the parameters.</p><p>Looking at these results it would seem that a 2 neuron NN would be the best choice; however the error differences between the 1 and 2 neuron NNs and GLM are small enough to anticipate that the final classifications (with a basic greater/less than a 0.5 logistic threshold value for long/short) would perhaps be almost identical. </p><p>Investigations into this will be the subject of my next post. </p><p>The code box below gives the working Octave code for the above.</p><pre style="border-style: solid; border-width: 2px; height: 150px; overflow: auto; width: 500px;"><code>## load data
##training_data = dlmread( 'raw_netlab_training_features' ) ;
##cv_data = dlmread( 'raw_netlab_cv_features' ) ;
training_data = dlmread( 'netlab_training_features_svd' ) ;
cv_data = dlmread( 'netlab_cv_features_svd' ) ;
training_targets = dlmread( 'netlab_training_targets' ) ;
cv_targets = dlmread( 'netlab_cv_targets' ) ;
kk_loop_record = zeros( 30 , 7 ) ;
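## columns 1 to 6 hold the final cross validation errors of the 1 to 6 hidden neuron NNs , column 7 the GLM error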
for kk = 1 : 30
## first train a glm model as a base comparison
input_dim = size( training_data , 2 ) ; ## Number of inputs.
net_lin = glm( input_dim , 1 , 'logistic' ) ; ## Create a generalized linear model structure.
options = foptions ; ## Sets default parameters for optimisation routines, for compatibility with MATLAB's foptions()
options(1) = 1 ; ## change default value
## OPTIONS(1) is set to 1 to display error values during training. If
## OPTIONS(1) is set to 0, then only warning messages are displayed. If
## OPTIONS(1) is -1, then nothing is displayed.
options(14) = 5 ; ## change default value
## OPTIONS(14) is the maximum number of iterations for the IRLS
## algorithm; default 100.
net_lin = glmtrain( net_lin , options , training_data , training_targets ) ;
## test on cv_data
glm_out = glmfwd( net_lin , cv_data ) ;
## cross-entropy loss
glm_out_loss = -mean( cv_targets .* log( glm_out ) .+ ( 1 .- cv_targets ) .* log( 1 .- glm_out ) ) ;
kk_loop_record( kk , 7 ) = glm_out_loss ;
## now train an mlp
## Set up vector of options for the optimiser.
nouter = 30 ; ## Number of outer loops.
ninner = 2 ; ## Number of inner loops.
options = foptions ; ## Default options vector.
options( 1 ) = 1 ; ## This provides display of error values.
options( 2 ) = 1.0e-5 ; ## Absolute precision for weights.
options( 3 ) = 1.0e-5 ; ## Precision for objective function.
options( 14 ) = 100 ; ## Number of training cycles in inner loop.
training_learning_curve = zeros( nouter , 6 ) ;
cv_learning_curve = zeros( nouter , 6 ) ;
for jj = 1 : 6
## Set up network parameters.
nin = size( training_data , 2 ) ; ## Number of inputs.
nhidden = jj ; ## Number of hidden units.
nout = 1 ; ## Number of outputs.
alpha = 0.01 ; ## Initial prior hyperparameter.
aw1 = 0.01 ;
ab1 = 0.01 ;
aw2 = 0.01 ;
ab2 = 0.01 ;
## Create and initialize network weight vector.
prior = mlpprior(nin , nhidden , nout , aw1 , ab1 , aw2 , ab2 ) ;
net = mlp( nin , nhidden , nout , 'logistic' , prior ) ;
## Train using scaled conjugate gradients, re-estimating alpha and beta.
for ii = 1 : nouter
## train net
net = netopt( net , options , training_data , training_targets , 'scg' ) ;
train_out = mlpfwd( net , training_data ) ;
## get train error
## mse
##training_learning_curve( ii ) = mean( ( training_targets .- train_out ).^2 ) ;
## cross entropy loss
training_learning_curve( ii , jj ) = -mean( training_targets .* log( train_out ) .+ ( 1 .- training_targets ) .* log( 1 .- train_out ) ) ;
cv_out = mlpfwd( net , cv_data ) ;
## get cv error
## mse
##cv_learning_curve( ii ) = mean( ( cv_targets .- cv_out ).^2 ) ;
## cross entropy loss
cv_learning_curve( ii , jj ) = -mean( cv_targets .* log( cv_out ) .+ ( 1 .- cv_targets ) .* log( 1 .- cv_out ) ) ;
## now update hyperparameters based on evidence
[ net , gamma ] = evidence( net , training_data , training_targets , ninner ) ;
## fprintf( 1 , '\nRe-estimation cycle ##d:\n' , ii ) ;
## disp( [ ' alpha = ' , num2str( net.alpha' ) ] ) ;
## fprintf( 1 , ' gamma = %8.5f\n\n' , gamma ) ;
## disp(' ')
## disp('Press any key to continue.')
##pause;
endfor ## ii loop
endfor ## jj loop
kk_loop_record( kk , 1 : 6 ) = cv_learning_curve( end , : ) ;
endfor ## kk loop
plot( training_learning_curve(:,1) , 'b' , 'linewidth' , 1 , cv_learning_curve(:,1) , 'r' , 'linewidth' , 1 , ...
training_learning_curve(:,2) , 'b' , 'linewidth' , 2 , cv_learning_curve(:,2) , 'r' , 'linewidth' , 2 , ...
training_learning_curve(:,3) , 'b' , 'linewidth' , 3 , cv_learning_curve(:,3) , 'r' , 'linewidth' , 3 , ...
training_learning_curve(:,4) , 'b' , 'linewidth' , 4 , cv_learning_curve(:,4) , 'r' , 'linewidth' , 4 , ...
training_learning_curve(:,5) , 'b' , 'linewidth' , 5 , cv_learning_curve(:,5) , 'r' , 'linewidth' , 5 , ...
training_learning_curve(:,6) , 'b' , 'linewidth' , 6 , cv_learning_curve(:,6) , 'r' , 'linewidth' , 6 , ...
ones( size( training_learning_curve , 1 ) , 1 ).*glm_out_loss , 'g' , 'linewidth', 2 ) ;
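## blue lines = training error , red lines = cross validation error , line thickness = number of hidden neurons , green line = GLM benchmark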
## >> mean(kk_loop_record)
## ans =
##
## 0.6928 0.6927 0.7261 0.7509 0.7821 0.8112 0.6990
## >> std(kk_loop_record)
## ans =
##
## 8.5241e-06 7.2869e-06 1.2999e-02 1.5285e-02 2.5769e-02 2.6844e-02 2.2584e-16</code></pre><p></p>Unknownnoreply@blogger.com2tag:blogger.com,1999:blog-5560052943772419320.post-85924749247032977382022-03-25T18:52:00.000+01:002022-03-25T18:52:08.806+01:00OrderBook and PositionBook Features<div><p>In my previous post I talked about how I planned to use <a href="https://en.wikipedia.org/wiki/Constrained_optimization" target="_blank">constrained optimization</a> to create features from <a href="https://www.oanda.com/eu-en/" target="_blank">Oanda's</a> OrderBook and PositionBook data, which can be downloaded via their <a href="https://developer.oanda.com/" target="_blank">API</a>. In addition to this I have also created a set of features based on the idea of Order Flow Imbalance (OFI), a nice exposition of which is given in <a href="https://towardsdatascience.com/price-impact-of-order-book-imbalance-in-cryptocurrency-markets-bf39695246f6?gi=d5c9eb06bcee" target="_blank">this blog post</a> along with a numerical example of how to calculate OFI. Of course Oanda's OrderBook/PositionBook data is not exactly the same as a conventional <a href="https://en.wikipedia.org/wiki/Order_book" target="_blank">limit order book,</a> but I thought they are similar enough to investigate using OFI on them. The result of these investigations is shown in the animated GIF below.<br /></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhNQIsH6NmqkT6cOdQDbsXvMxxMNdCCDyEbYE03CQ072wsp4s7Nactp8GfjtrJghj0phR-2C5T334ga7oJObCdgvaWErzdjq6FEJLbbJgxWSZ_dJY9JNBGN6ZllEr_vt4eLkULDr-cymXGbOIdJQNOdmsBywyJccYyLPxd3TEkI6qKD0V-IEmxtfFIa_Q/s866/animation.gif" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="546" data-original-width="866" height="202" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhNQIsH6NmqkT6cOdQDbsXvMxxMNdCCDyEbYE03CQ072wsp4s7Nactp8GfjtrJghj0phR-2C5T334ga7oJObCdgvaWErzdjq6FEJLbbJgxWSZ_dJY9JNBGN6ZllEr_vt4eLkULDr-cymXGbOIdJQNOdmsBywyJccYyLPxd3TEkI6qKD0V-IEmxtfFIa_Q/s320/animation.gif" width="320" /></a></div><p>This shows the output from using the <a href="https://www.r-project.org/" target="_blank">R</a> <a href="https://cran.r-project.org/web/packages/Boruta/Boruta.pdf" target="_blank">Boruta</a> package to check for the feature relevance of OFI levels to a depth of 20 of both the OrderBook and PositionBook to classify the sign of the log return of price over the periods detailed below following an OrderBook/PositionBook update (the granularity at which the OrderBook/PositionBook data can be updated is 20 minutes):</p><ul style="text-align: left;"><li>20 minutes</li><li>40 minutes</li><li>60 minutes</li><li>the 20 minutes starting 20 minutes in the future</li><li>the 20 minutes starting 40 minutes in the future</li></ul>for both the OrderBook and PositionBook, giving a total of 10 separate images/results in the above GIF.</div><div> </div><div>Observant readers may notice that in the GIF there are 42 features being checked, but only an OFI depth of 20. 
The reason for this is that the data contain information about buy/sell orders and long/short positions both above and below the current price, so what I did was calculate OFI for: <br /></div><div><ul style="text-align: left;"><li>buy orders above price vs sell orders below price</li><li>sell orders above price vs buy orders below price</li><li>long positions above price vs short positions below price</li><li>short positions above price vs long positions below price </li></ul>As can be seen, almost all features are deemed to be relevant with the exception of 3 OFI levels rejected (red candles) and 2 deemed tentative (yellow candles).</div><div><br /></div><div>It is my intention to use these features in a <a href="https://en.wikipedia.org/wiki/Machine_learning" target="_blank">machine learning</a> model to classify the probability of future market direction over the time frames mentioned above. </div><div><br /></div><div>More in due course.<br /></div><div><p></p><p></p><div class="separator" style="clear: both; text-align: center;"> </div><p></p></div>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-5560052943772419320.post-91170131985556448162022-02-15T15:59:00.004+01:002022-02-15T16:00:33.833+01:00A Possible, New Positionbook Indicator?<p>In my <a href="https://dekalogblog.blogspot.com/2022/01/matrix-profile-and-weakly-labelled-data.html" target="_blank">previous post</a> I ended by saying that I would post about some sort of "sentiment indicator" if, and only if, I had something positive to say about my progress on this work. This post is the first on this subject.</p><p>The indicator I'm working on is based on the <a href="https://www1.oanda.com/lang/en/forex-trading/analysis/open-position-ratios" target="_blank">open position ratios</a> data that is available via the <a href="https://developer.oanda.com/" target="_blank">Oanda API</a>. For the uninitiated, this data gives the percentage of traders holding long and short positions, and at what price levels, in 14 selected forex pairs and also gold and silver. The data is updated every 20 minutes. I have long felt that there must be some value hidden in this data but the problem is how to extract it.</p><p>What I've done is take the percentage values from the (usually) hundreds of separate price levels and sum and normalise them over three defined ranges - levels above/below the high/low of each 20 minute period and the level(s) that span the price range of this period. This is done separately for long and short positions to give a total of 6 percentage figures that sum to 100%. Conceptually, this can be thought of as attaching to the open and close of a 20 minute OHLC bar the 6 percentage position values that were in force at the open and close respectively. The problem is to try and infer the actual, net changes in positions that have taken place over the time period this 20 minute bar was forming. 
In this way I am trying, if you like, to create a sort of <a href="https://en.wikipedia.org/wiki/Skin_in_the_game_(phrase)" target="_blank">"skin in the game"</a> indicator as opposed to an indicator derived from order book data, which could be said to be based on traders' current (changeable) intentions as expressed by their open orders and which are subject to shenanigans such as <a href="https://en.wikipedia.org/wiki/Spoofing_(finance)" target="_blank">spoofing</a>.</p><p>The methodology I've decided on to realise the above is <a href="https://en.wikipedia.org/wiki/Constrained_optimization" target="_blank">constrained optimization</a> using <a href="https://www.gnu.org/software/octave/index" target="_blank">Octave's</a> <a href="https://octave.sourceforge.io/optim/function/fmincon.html" target="_blank">fmincon</a> function. The objective function is simply:</p><p><span> </span>denom = X' * old_pb_net_pos ;
<br /><span> </span>J = mean( ( new_pb_net_pos .- ( ( X .* old_pb_net_pos ) ./ denom ) ).^2 ) ;</p><p>for a multiplicative position value change model where:</p><ul style="text-align: left;"><li>X is a vector of constants that are to be optimised</li><li>old_pb_net_pos is a vector of the 6 percentage values at the open</li><li>new_pb_net_pos is a vector of the 6 percentage values at the close</li></ul><p>This is a constrained model because percentage position values at price levels outside the bar range cannot actually increase as a result of trades that take place within the bar range, so the X values for these levels are necessarily constrained to a maximum value of 1 (implying no real, absolute change at these levels). Similarly, all X values must be greater than zero (a zero value would imply a mass exit of all positions at this level, which never actually happens).</p><p>The net result of the above is an optimised X vector consisting of multiplicative constants that are multiplied with old_pb_net_pos to achieve new_pb_net_pos according to the logic exemplified in the above objective function. It is these optimised X values from which the underlying, real changes in positions will be inferred and features created. More on this in my next post. <br /></p><p> <br /></p>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-5560052943772419320.post-36212207785741820852022-01-04T22:30:00.000+01:002022-01-04T22:30:22.041+01:00Matrix Profile and Weakly Labelled Data - 2nd and Final Update<p>It has been over three months since my <a href="https://dekalogblog.blogspot.com/2021/09/matrix-profile-and-weakly-labelled-data.html" target="_blank">last post</a>, which was intended to be the first in a series of posts on the subject of the title of this post. However, it turned out that the results of my work were underwhelming and so I decided to stop flogging a dead horse and move onto other things. I still have some ideas for using <a href="https://matrixprofile.org/" target="_blank">Matrix Profile</a>, but not for the above. These ideas may be the subject of a future blog post.</p><p>I subsequently looked at plotting order levels using the data that is available via the <a href="https://developer.oanda.com/" target="_blank">Oanda API</a> and I have come up with <a href="https://www.gnu.org/software/octave/index" target="_blank">Octave</a> code to render plots such as this:</p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEiDcAvrfVoBLxf9MFu4_Xtk2LF6mVilDxhd3Kwyhq7vnmbdjkIZ13OgQlHVTn47nHPmr0Q7fMUnY66mM_rYxoX3EgxVUU6sWW8CO-QvQtHYtFFjqpkqyPYAl8jQuuHfyh1XW7fWT4yPs_NkwZTeDN0K6Hw5JO1Qc5c5QKMf6Nk83hXuaLF10k16YzlP-A=s1905" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="953" data-original-width="1905" height="160" src="https://blogger.googleusercontent.com/img/a/AVvXsEiDcAvrfVoBLxf9MFu4_Xtk2LF6mVilDxhd3Kwyhq7vnmbdjkIZ13OgQlHVTn47nHPmr0Q7fMUnY66mM_rYxoX3EgxVUU6sWW8CO-QvQtHYtFFjqpkqyPYAl8jQuuHfyh1XW7fWT4yPs_NkwZTeDN0K6Hw5JO1Qc5c5QKMf6Nk83hXuaLF10k16YzlP-A=s320" width="320" /></a></div>where the brighter yellow stripes show ranges where there is an accumulation of sell/buy orders above/below price. These can be interpreted as support/resistance areas. It is normally my practice to post my Octave code, but the code for this plot is quite idiosyncratic and depends very much on the way I have chosen to store the underlying data downloaded from Oanda. 
As such, I don't think it would be helpful to readers and so I am not posting the code. That said, if there is actually a demand I am more than happy to make it available in a future blog post.<p></p><p>Having done this, it seemed natural to extend it to <a href="https://www1.oanda.com/lang/en/forex-trading/analysis/open-position-ratios" target="_blank">Open Position Ratios</a> which are also available via the Oanda API. Plotting these levels renders plots that are similar to the plot shown above, but show levels where open long/short positions instead of open orders are accumulated. Although such plots are visually informative, I prefer something more objective, and so for the last few weeks I have been working on using the open position ratios data to construct some sort of sentiment indicator that hopefully could give a heads up to future price movement direction. This is still very much a work in progress which I shall post about if there are noteworthy results.</p><p>More in due course. <br /></p>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-5560052943772419320.post-47073239621871428122021-09-17T20:04:00.000+02:002021-09-17T20:04:44.066+02:00Matrix Profile and Weakly Labelled Data - Update 1<p>This is the first post in a short series detailing my recent work following on from my <a href="https://dekalogblog.blogspot.com/2021/09/back-in-may-of-this-year-i-posted-about.html" target="_blank">previous post.</a> This post will be about some problems I have had and how I partially solved them.</p><p>The main problem was simply the speed at which the code (available from the <a href="https://sites.google.com/view/weaklylabeled" target="_blank">companion website</a>) seems to run. The first stage <a href="https://matrixprofile.org/" target="_blank">Matrix Profile</a> code runs in a few seconds, the second, individual evaluation stage in no more than a few minutes, but the third stage, greedy search, which uses <a href="https://en.wikipedia.org/wiki/Golden-section_search" target="_blank">Golden Section Search</a> over the pattern candidates, can take many, many hours. My approach to this was simply to optimise the code to the best of my ability. My optimisations, all in the compute_f_meas.m function, are shown in the following code boxes. This while loop</p><pre style="border-style: solid; border-width: 2px; height: 150px; overflow: auto; width: 500px;"><code>i = 1;
while true
if i >= length(anno_st)
break;
endif
first_part = anno_st(1:i);
second_part = anno_st(i+1:end);
bad_st = abs(second_part - anno_st(i)) < sub_len;
second_part = second_part(~bad_st);
anno_st = [first_part; second_part;];
i = i + 1;
endwhile</code></pre>is replaced by this .oct compiled version of the same while loop<pre style="border-style: solid; border-width: 2px; height: 150px; overflow: auto; width: 500px;"><code>#include <octave/oct.h>
#include <octave/dcolvector.h>
DEFUN_DLD ( stds_f_meas_while_loop_replace, args, nargout,
"-*- texinfo -*-\n\
@deftypefn {Function File} {} stds_f_meas_while_loop_replace (@var{input_vector,sublen})\n\
This function takes an input vector and a scalar sublen\n\
length. The function sets to zero those elements in the\n\
input vector that are closer to the preceding value than\
sublen. This function replaces a time consuming .m while loop\n\
in the stds compute_f_meas.m function.\n\
@end deftypefn" )
{
octave_value_list retval_list ;
int nargin = args.length () ;
// check the input arguments
if ( nargin != 2 ) // there must be a vector and a scalar sublen
{
error ("Invalid arguments. Inputs are a column vector and a scalar value sublen.") ;
return retval_list ;
}
if ( args(0).length () < 2 )
{
error ("Invalid 1st argument length. Input is a column vector of length > 1.") ;
return retval_list ;
}
if ( args(1).length () > 1 )
{
error ("Invalid 2nd argument length. Input is a scalar value for sublen.") ;
return retval_list ;
}
// end of input checking
ColumnVector input = args(0).column_vector_value () ;
double sublen = args(1).double_value () ;
double last_iter ;
// initialise last_iter value
last_iter = input( 0 ) ;
for ( octave_idx_type ii ( 1 ) ; ii < args(0).length () ; ii++ )
{
if ( input( ii ) - last_iter >= sublen )
{
last_iter = input( ii ) ;
}
else
{
input( ii ) = 0.0 ;
}
} // end for loop
retval_list( 0 ) = input ;
return retval_list ;
} // end of function</code></pre>and called thus<pre style="border-style: solid; border-width: 2px; height: 40px; overflow: auto; width: 500px;"><code>anno_st = stds_f_meas_while_loop_replace( anno_st , sub_len ) ;
anno_st( anno_st == 0 ) = [] ;</code></pre>This for loop<pre style="border-style: solid; border-width: 2px; height: 150px; overflow: auto; width: 500px;"><code>is_tp = false(length(anno_st), 1);
for i = 1:length(anno_st)
if anno_ed(i) > length(label)
anno_ed(i) = length(label);
end
if sum(label(anno_st(i):anno_ed(i))) > 0.8*sub_len
is_tp(i) = true;
end
end
tp_pre = sum(is_tp);</code></pre>is replaced by use of <a href="https://octave.sourceforge.io/octave/function/cellslices.html" target="_blank">cellslices.m</a> and <a href="https://octave.sourceforge.io/octave/function/cellfun.html" target="_blank">cellfun.m</a> thus<pre style="border-style: solid; border-width: 2px; height: 80px; overflow: auto; width: 500px;"><code>label_length = length( label ) ;
anno_ed( anno_ed > label_length ) = label_length ;
cell_slices = cellslices( label , anno_st , anno_ed ) ;
cell_sums = cellfun( @sum , cell_slices ) ;
tp_pre = sum( cell_sums > 0.8 * sub_len ) ;</code></pre>and a further for loop<pre style="border-style: solid; border-width: 2px; height: 110px; overflow: auto; width: 500px;"><code>is_tp = false(length(pos_st), 1);
for i = 1:length(pos_st)
if sum(anno(pos_st(i):pos_ed(i))) > 0.8*sub_len
is_tp(i) = true;
end
end
tp_rec = sum(is_tp);</code></pre>is replaced by<pre style="border-style: solid; border-width: 2px; height: 50px; overflow: auto; width: 500px;"><code>cell_slices = cellslices( anno , pos_st , pos_ed ) ;
cell_sums = cellfun( @sum , cell_slices ) ;
tp_rec = sum( cell_sums > 0.8 * sub_len ) ;</code></pre><p>Although the above measurably improves running times, overall the code of the third stage is still sluggish. I have found that the best way to deal with this, on the advice of the original paper's author, is to limit the number of patterns to search for, the "pat_max" variable, to the minimum possible to achieve a satisfactory result. What I mean by this is that if pat_max = 5 and the result returned also has 5 identified patterns, incrementally increase pat_max until such time that the number of identified patterns is less than pat_max. This does, by necessity, mean running the whole routine a few times, but it is still quicker this way than drastically over estimating pat_max, i.e. choosing a value of say 50 to finally identify maybe only 5/6 patterns.</p><p>More in due course.<br /></p>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-5560052943772419320.post-59880663072488307412021-09-04T18:28:00.002+02:002021-09-04T18:30:10.418+02:00"Matrix profile: Using Weakly Labeled Time Series to Predict Outcomes" Paper<p>Back in May of this year <a href="https://dekalogblog.blogspot.com/2021/05/update-on-recent-matrix-profile-work.html" target="_blank">I posted</a> about how I had intended to use <a href="https://matrixprofile.org/" target="_blank">Matrix Profile (MP)</a> to somehow <a href="https://en.wikipedia.org/wiki/Cluster_analysis" target="_blank">cluster</a> the "initial balance" of <a href="https://en.wikipedia.org/wiki/Market_profile" target="_blank">Market Profile</a> charts with a view to getting a heads up on immediately following price action. Since then, my thinking has evolved due to my learning about the paper <a href="https://www.vldb.org/pvldb/vol10/p1802-yeh.pdf" target="_blank">"Matrix profile: Using Weakly Labeled Time Series to Predict Outcomes"</a> and its <a href="https://sites.google.com/view/weaklylabeled" target="_blank">companion website.</a> This very much seems to accomplish the same end I had envisaged with my clustering of initial balances, so I am going to try and use this approach instead.</p><p>As a preliminary, I have decided to "weakly label" my time series data using the simple code loop shown below.</p><pre style="border-style: solid; border-width: 2px; height: 150px; overflow: auto; width: 500px;"><code>for ii = 1 : numel( ix )
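## Descriptive comments (my annotations; assumptions noted): ix( ii ) is assumed
## to index the final bar of the opening balance period and column 1 of train_data
## to hold the price series used for labelling. The next 19 bars are then inspected:
## a +1 label needs a positive session return, a favourable to adverse excursion
## ratio of at least 3:1 and the low ( min_ix ) to occur before the high ( max_ix ).
## The -1 label is the mirror image, and the chosen label is written onto the 12
## bars up to and including ix( ii ).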
y_values = train_data( ix( ii ) + 1 : ix( ii ) + 19 , 1 ) ;
london_session_ret = y_values( end ) - y_values( 1 ) ;
[ max_y , max_ix ] = max( y_values ) ;
max_long_ex = max_y - y_values( 1 ) ;
[ min_y , min_ix ] = min( y_values ) ;
max_short_ex = min_y - y_values( 1 ) ;
if ( london_session_ret > 0 && ( max_long_ex / ( -1 * max_short_ex ) ) >= 3 && max_ix > min_ix )
labels( ix( ii ) - 11 : ix( ii ) , 1 ) = 1 ;
elseif ( london_session_ret < 0 && ( max_short_ex / max_long_ex ) <= -3 && max_ix < min_ix )
labels( ix( ii ) - 11 : ix( ii ) , 1 ) = -1 ;
endif
endfor</code></pre>What this essentially does (for the long side) is ensure that price is higher at the end of y_values than at the beginning <i>and</i> there is a reward/risk opportunity of at least 3:1 for at least 1 trade during the period covered by the time range of y_values (either the London a.m. session or the combined New York a.m./London p.m. session) following a 7a.m. to 8.50a.m. (local time) formation of an opening Market profile/initial balance <i>and</i> the maximum adverse excursion occurs before the maximum favourable excursion. A typical chart on the long side looks like this.<br /><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiYVXqU0xrw9i0w_XkcKiHr5HoX0wv39A_IbYrf4KZyaRh9ItxGLz6hURs72lcW4JDNoJAkdV9kM8ZCEtVMd6Mb5xkoPQLy8Wi6VSHU3-2dCOWqrpzt7D4XJZqZ9Ot6tJpewuJAOeh8owOr/s1576/long.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="871" data-original-width="1576" height="177" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiYVXqU0xrw9i0w_XkcKiHr5HoX0wv39A_IbYrf4KZyaRh9ItxGLz6hURs72lcW4JDNoJAkdV9kM8ZCEtVMd6Mb5xkoPQLy8Wi6VSHU3-2dCOWqrpzt7D4XJZqZ9Ot6tJpewuJAOeh8owOr/s320/long.png" width="320" /></a></div>This would have the "weak" label for a long trade, and the label would be applied to the Market Profile data that immediately precedes this price action. On the other side, a short labelled chart typically looks like this.<br /><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgoutUIU6GGqOC4c05Ms-5HuP3OEYFoyIs8Q73_CgR5uyC0Wn0w9-n6cW67PX4UWIMqgRMCj_p4sMJrSYRvKRROoyMXjo_GHQp5lieDslIg51tzHk_gmt_GMVoeRuxBOlWZGIJoPl2wluuR/s1571/short.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="850" data-original-width="1571" height="173" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgoutUIU6GGqOC4c05Ms-5HuP3OEYFoyIs8Q73_CgR5uyC0Wn0w9-n6cW67PX4UWIMqgRMCj_p4sMJrSYRvKRROoyMXjo_GHQp5lieDslIg51tzHk_gmt_GMVoeRuxBOlWZGIJoPl2wluuR/s320/short.png" width="320" /></a></div>As can be seen, trading "against the label" offers few opportunities for profitable entries/exits. My hope is that a "dictionary" of long/short biased Market Profile patterns can be discovered using the ideas/code in the links above. 
For completeness, the following chart is typical of price action which does not meet the looped code bias for either long or short.<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgreEc_BB7GU4vvpP7f7D5hQsMKSqNBLR2-aIudKNcABg1Xubq6pWmGBuL33TlO9IqaeHCj9x9oXNGxKd7L4RniboE-COgcmUYoyL4WHgrdjDSFWQ-VkgCSjkNiFTDzfWQn_l0H95nAcvxx/s1565/flat.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="859" data-original-width="1565" height="176" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgreEc_BB7GU4vvpP7f7D5hQsMKSqNBLR2-aIudKNcABg1Xubq6pWmGBuL33TlO9IqaeHCj9x9oXNGxKd7L4RniboE-COgcmUYoyL4WHgrdjDSFWQ-VkgCSjkNiFTDzfWQn_l0H95nAcvxx/s320/flat.png" width="320" /></a></div><p>It is easy to envisage trading this type of price action by fading moves that go outside the "value area" of a Market Profile chart.</p><p>More in due course.<br /><br /><br /></p><p></p>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-5560052943772419320.post-57004961310252385662021-08-27T23:45:00.000+02:002021-08-27T23:45:07.402+02:00Another Iterative Improvement of my Volume/Market Profile Charts<p>Below is a screenshot of this new chart version, of today's (Friday's) price action at a 10 minute bar scale:</p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgS6WxLmai3mYGm5R-vcJE2BdF7_NiPGVGe0Rw4BpRqyXsPVHMlwezz-cukUjmD_Pv_DdJxtwh-fpKo2XphM2BL09tyZmTv1nvHjm_oDYfBbH_7_A74aPYw1YiDljm2Y_yI7TerVFAvzjxF/s1920/new_chart.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1034" data-original-width="1920" height="172" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgS6WxLmai3mYGm5R-vcJE2BdF7_NiPGVGe0Rw4BpRqyXsPVHMlwezz-cukUjmD_Pv_DdJxtwh-fpKo2XphM2BL09tyZmTv1nvHjm_oDYfBbH_7_A74aPYw1YiDljm2Y_yI7TerVFAvzjxF/s320/new_chart.png" width="320" /></a></div>Just by looking at the chart it might not be obvious to readers what has changed, so the changes are detailed below.<p></p><p>The first change is in how the volume profile (the horizontal histogram on the left) is calculated. The "old" version of the chart calculates the profile by assuming the "model" that tick volume for each 10 minute bar is normally distributed across the high/low range of the bar, and then the profile histogram is the accumulation of these individual, 10 minute, normally distributed "mini profiles." A more complete description of this is given in my <a href="https://dekalogblog.blogspot.com/2020/05/market-profile-chart-in-octave.html" target="_blank">Market Profile Chart in Octave</a> blog post, with code.</p><p>The new approach is more data centric rather than model based. Every 10 minutes, instead of downloading the 10 minute OHLC and tick volume, the last 10 minutes worth of 5 second OHLC and tick volume is downloaded. The whole tick volume of each 5 second period is assigned to a price level equivalent to the <a href="https://en.wikipedia.org/wiki/Typical_price" target="_blank">Typical price</a> (rounded to the nearest pip) of said 5 second period, and the volume profile is then the accumulation of these volume ticks per price level. I think this is a much more accurate reflection of the price levels at which tick volume actually occurred compared to the old, model based charts. 
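To make the above accumulation concrete, the following is a minimal, illustrative sketch (toy data and hypothetical variable names, not my actual chart code) of building such a profile from 5 second OHLC bars and their tick volumes:<pre style="border-style: solid; border-width: 2px; height: 150px; overflow: auto; width: 500px;"><code>## toy 5 second data: [ open high low close ] rows and their tick volumes
pip = 0.0001 ;
ohlc_5s = [ 1.1000 1.1003 1.0999 1.1002 ;
            1.1002 1.1005 1.1001 1.1004 ;
            1.1004 1.1004 1.1000 1.1001 ] ;
tick_vol = [ 7 ; 5 ; 4 ] ;
## typical price of each 5 second bar, expressed in whole pips
typical_pips = round( ( ohlc_5s(:,2) + ohlc_5s(:,3) + ohlc_5s(:,4) ) ./ 3 ./ pip ) ;
min_pip = min( typical_pips ) ;
profile = zeros( max( typical_pips ) - min_pip + 1 , 1 ) ;
for ii = 1 : numel( typical_pips )
  ## assign the whole tick volume of this 5 second bar to a single price level
  profile( typical_pips( ii ) - min_pip + 1 ) += tick_vol( ii ) ;
endfor
price_levels = ( min_pip : max( typical_pips ) )' .* pip ; ## price of each profile bin</code></pre>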
This second screenshot is of the old chart over the exact same price data as the first, improved version of the chart.</p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjQiL4VpgMZa7Qk06DWKaT2Wi6LW8WpVMSjbfDvaHogpV4r-o4Y67eRhk45GEN5lOrYl5AAerUu7MHoxdrsMmc6LIOv5_dhf0M4Y0s56yMoqfogFcqTbBPIGv1k9gZrhGVeRuARi3xiteId/s1920/old_chart.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1034" data-original-width="1920" height="172" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjQiL4VpgMZa7Qk06DWKaT2Wi6LW8WpVMSjbfDvaHogpV4r-o4Y67eRhk45GEN5lOrYl5AAerUu7MHoxdrsMmc6LIOv5_dhf0M4Y0s56yMoqfogFcqTbBPIGv1k9gZrhGVeRuARi3xiteId/s320/old_chart.png" width="320" /></a></div>It can be seen that the two volume profile histograms of the respective charts differ from each other in terms of their overall shape and the number and price levels of peaks (Points of Control) and troughs (<a href="https://dekalogblog.blogspot.com/2021/07/market-profile-low-volume-node-chart.html" target="_blank">Low Volume Nodes</a>).<p></p><p>The second change in the new chart is in how the background heatmap is plotted. The heatmap is a different presentation of the volume profile whereby higher volume price levels are shown by the brighter yellow colours. The old chart only displays the heatmap associated with the latest calculated volume profile histogram, which is projected back in time. This is, of course, a form of lookahead bias when plotting past prices over the latest heatmap. The new chart solves this by plotting a "rolling" version of the heatmap which reflects the volume profile that was in force at the time each 10 minute OHLC candle formed. It is easy to see how the Points of Control and Low Volume Nodes price levels ebb and flow throughout the trading day.</p><p>The third change, which naturally followed on from the downloading of 5 second data, is in the plotting of the candlesticks. Rather than having a normal, open to close candlestick body, the candlesticks show the "mini volume profiles" of the tick volume within each bar, plotted via <a href="https://www.gnu.org/software/octave/index" target="_blank">Octave's</a> <a href="https://octave.sourceforge.io/octave/function/patch.html" target="_blank">patch function</a>. The white candlestick wicks indicate the usual high/low range, and the open and close levels are shown by grey and black dots respectively. This is more clearly seen in the zoomed in screenshot below.</p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhghCt3mcTLuC4WUE1nZTr0PNukufRth5qaFfi-CiUjgCC5MLeFVwh7w0fEl6WKQAGpsdMna6z2PNoAuZVgHuF2G93e-aqMXu9Ms6YROucoe2PtdihSRw-21Di-rd1c6iYdIUBSHlNhpzn9/s1920/zoomed_chart.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1034" data-original-width="1920" height="172" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhghCt3mcTLuC4WUE1nZTr0PNukufRth5qaFfi-CiUjgCC5MLeFVwh7w0fEl6WKQAGpsdMna6z2PNoAuZVgHuF2G93e-aqMXu9Ms6YROucoe2PtdihSRw-21Di-rd1c6iYdIUBSHlNhpzn9/s320/zoomed_chart.png" width="320" /></a></div>I wanted to plot these types of bars because recently I have watched some trading webcasts, which talked about "P", "b" and "D" shaped bar profiles at "areas of interest." 
The upshot of these webcasts is that, in general, a "P" bar is bullish, a "b" is bearish and a "D" is "in balance" when they intersect an "area of interest" such as Point of Control, Low Volume Node, support and resistance etc. This is supposed to be indicative of future price direction over the immediate short term. With this new version of chart, I shall be in a position to investigate these claims for myself.<br /><p></p>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-5560052943772419320.post-85354441396765700702021-07-05T21:56:00.001+02:002021-07-05T21:56:17.177+02:00Market Profile Low Volume Node Chart<p>As a diversion to my recent work with <a href="https://www.cs.ucr.edu/~eamonn/MatrixProfile.html" target="_blank">Matrix Profile</a> I have recently completed work on a new chart type in <a href="https://www.gnu.org/software/octave/index" target="_blank">Octave</a>, namely a <a href="https://en.wikipedia.org/wiki/Market_profile" target="_blank">Market Profile</a> <u>L</u>ow <u>V</u>olume <u>N</u>ode (<a href="https://www.youtube.com/results?search_query=market+profile+low+volume+node" target="_blank">LVN</a>) chart, two slightly different versions of which are shown below.</p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg4v2gNAt7b_w7H0rWj7H3jabEGsRa4051fTxxoCxEYjLtPH3yFcREXhDqBxlQ3RK-nS4ruv_H-Ww-FpvOY1J2gF9vM-4WPTbQHkz99CvYWXAO9hkHjFi26vzI2R4fM9ai9poRE92tzcF8l/s1920/tpo_lvn.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1034" data-original-width="1920" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg4v2gNAt7b_w7H0rWj7H3jabEGsRa4051fTxxoCxEYjLtPH3yFcREXhDqBxlQ3RK-nS4ruv_H-Ww-FpvOY1J2gF9vM-4WPTbQHkz99CvYWXAO9hkHjFi26vzI2R4fM9ai9poRE92tzcF8l/s320/tpo_lvn.png" width="320" /></a></div>This first one is derived from a <a href="https://dekalogblog.blogspot.com/2020/05/a-comparison-of-charts.html" target="_blank">TPO chart</a>, whilst the next<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjM3MebOnwx0aFy1U2CyLmG4hKDyGLfVjMfOEbX2qaEBg15L9dPN3dJCjF_Dy5AQwuJ0Etxlg7dI7oItamFEmF_Ke99MvWQIyV3DWwcXw1wouzmtjB5gJ23A6hEvBcaBReopXnDUesbO1xt/s1920/vp_lvn.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1034" data-original-width="1920" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjM3MebOnwx0aFy1U2CyLmG4hKDyGLfVjMfOEbX2qaEBg15L9dPN3dJCjF_Dy5AQwuJ0Etxlg7dI7oItamFEmF_Ke99MvWQIyV3DWwcXw1wouzmtjB5gJ23A6hEvBcaBReopXnDUesbO1xt/s320/vp_lvn.png" width="320" /></a></div>is derived from a <a href="https://dekalogblog.blogspot.com/2020/05/a-volume-profile-with-levels-chart.html" target="_blank">Volume profile chart</a>.<p></p><p>The horizontal lines are drawn at levels which are considered to be "lows" in the underlying, but not shown, TPO/Volume profiles. The yellow lines are "stronger lows" than the green lines, and the blue lines are extensions of the previous day's "strong lows" in force at the end of that day's trading.<br /></p><p>The point of all this, according to online guru theory, is that price is expected to be "rejected" at LVNs by either bouncing, a la support or resistance, or by price powering through the LVN level, usually on increased volume. 
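As a purely illustrative aside (this is not my actual charting logic, and the 0.5 and 0.25 thresholds below are arbitrary choices), the profile "lows" referred to above could be flagged along the following lines, with the "stronger lows" simply being those furthest below the profile's peak:<pre style="border-style: solid; border-width: 2px; height: 80px; overflow: auto; width: 500px;"><code>profile = [ 12 9 4 10 15 6 2 5 11 ]' ; ## toy volume per price level
## local minima of the profile
is_local_min = [ false ; ( profile(2:end-1) < profile(1:end-2) ) & ( profile(2:end-1) < profile(3:end) ) ; false ] ;
## "lows" well below the profile peak, and a "stronger" subset of them
lvn_ix = find( is_local_min & ( profile < 0.5 * max( profile ) ) ) ;
strong_lvn_ix = find( is_local_min & ( profile < 0.25 * max( profile ) ) ) ;</code></pre>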
The charts show the rolling development of the LVNs as the underlying profiles change throughout the day, hence lines can appear and disappear and change colour. As this is a new avenue of investigation for me I feel it is too soon to make a comment on these lines' efficacy, but it does seem uncanny how price very often seems to react to these levels.</p><p>More in due course.<br /><br /></p>Unknownnoreply@blogger.com1tag:blogger.com,1999:blog-5560052943772419320.post-58109594387437236732021-05-26T12:17:00.003+02:002021-06-16T12:02:34.407+02:00Update on Recent Matrix Profile Work<div><p>Since my previous post, on <a href="https://matrixprofile.org/" target="_blank">Matrix Profile (MP),</a> I have been doing a lot of online reading about MP and going back to various source papers and code that are available at the <a href="https://www.cs.ucr.edu/~eamonn/MatrixProfile.html" target="_blank">UCR Matrix Profile page</a>. I have been doing this because, despite my initial enthusiasm, the <a href="https://www.r-project.org/" target="_blank">R</a> <a href="https://cran.r-project.org/web/packages/tsmp/index.html" target="_blank">tsmp package</a> didn't turn out to be suitable for what I wanted to do, or perhaps more correctly I couldn't hack it to get the sort of results I wanted, hence my need to go to "first principles" and code from the UCR page.</p><p>Readers may recall that my motivation was to look for <a href="https://www.cs.ucr.edu/~eamonn/PAN_SKIMP%20%28Matrix%20Profile%20XX%29.pdf" target="_blank">time series motifs</a> that form "initial balance (IB)" set ups of <a href="https://en.wikipedia.org/wiki/Market_profile" target="_blank">Market Profile</a> charts. The rationale for this is that different IBs are precursors to specific market tendencies which may provide a clue or an edge in subsequent market action. A typical scenario from the literature on Market Profile might be "an Open Test Drive can often indicate one of the day's extremes." If this is <i>actually</i> true, one could go long/short with a high confidence stop at the identified extreme. Below is a screenshot of some typical IB profiles:</p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiaSbK9rFhPxry2OwLGBInJsTD_2HUhBZLay9Ir0SVRqTMEVA_tNdW0JmdPkhmGp2Ia6gCM8HNpOKHAM991ibptgXV6nDNjxDCrgiVU24wRVRkgEh9Drt40_aJFuq5BdnAMHhD8bBt4or0I/s820/IB.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="681" data-original-width="820" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiaSbK9rFhPxry2OwLGBInJsTD_2HUhBZLay9Ir0SVRqTMEVA_tNdW0JmdPkhmGp2Ia6gCM8HNpOKHAM991ibptgXV6nDNjxDCrgiVU24wRVRkgEh9Drt40_aJFuq5BdnAMHhD8bBt4or0I/s320/IB.png" width="320" /></a></div><p>where each letter typically represents a 30 minute period of market action. The problem is that Market Profile charts, to me at least, are inherently visual and therefore do not easily lend themselves to an algorithmic treatment, which makes it difficult to back test in a robust fashion. This is why I have been trying to use MP.</p><p>The first challenge I faced was how to preprocess price action data such as OHLC and volume such that I could use MP. In the end I resorted to using the mid-price, the high-low range and (tick) volume as proxies for market direction, market volatility and market participation. 
Because IBs occur over market opens, I felt it was important to use the volatility and participation proxies as these are important markers for the sentiment of subsequent price action. This choice necessitated using a multivariate form of MP, and I used the basic MP STAMP code that is available at <a href="https://sites.google.com/view/mstamp/home" target="_blank">Matrix Profile VI: Meaningful Multidimensional Motif Discovery</a>, with some slight tweaks for my use case.</p><p>Having the above tools in hand, what should they be used for? I decided that <a href="https://en.wikipedia.org/wiki/Cluster_analysis" target="_blank">Cluster analysis</a> is what is needed, i.e. cluster using the motifs that MP could discover. For this purpose, I used the approach outlined in section 3.9 of the paper <a href="https://escholarship.org/content/qt6rw5v40f/qt6rw5v40f.pdf?t=q9i3by" target="_blank">"The Swiss Army Knife of Time Series Data Mining."</a> The reasoning behind this choice is that if, for example, an "Open Test Drive IB" is a real thing, it should occur frequently enough that time series sub-sequences of it can be clustered or associated with an "Open Test Drive IB" motif. If all such prototype motifs can be identified and all IBs can be assigned to one of them, subsequent price action can be investigated to check the anecdotal claims, such as quoted above.</p><p>My <a href="https://www.gnu.org/software/octave/index" target="_blank">Octave</a> code implementation of the linked Swiss Army Knife routine is shown in the code box below.</p><pre style="border-style: solid; border-width: 2px; height: 150px; overflow: auto; width: 500px;"><code>data = dlmread( '/path/to/mv_data' ) ;
skip_loc = dlmread( '/path/to/skip_loc' ) ;
skip_loc_copy = find( skip_loc ) ; skip_loc_copy2 = skip_loc_copy ; skip_loc_copy3 = skip_loc_copy ;
sub_len = 9 ;
data_len = size( data , 1 ) ;
data_to_use = [ (data(:,2).+data(:,3))./2 , data(:,2).-data(:,3) , data(:,5) ] ;
must_dim = [] ;
exc_dim = [] ;
[ pro_mul , pro_idx , data_freq , data_mu , data_sig ] = multivariate_stamp( data_to_use, sub_len, must_dim, exc_dim, skip_loc ) ;
original_single_MP = pro_mul( : , 1 ) ; ## just mid price
original_single_MP2 = original_single_MP .+ pro_mul( : , 2 ) ; ## mid price and hi-lo range
original_single_MP3 = original_single_MP2 .+ pro_mul( : , 3 ) ; ## mid price, hi-lo range and volume
## Swiss Army Knife Clustering
RelMP = original_single_MP ; RelMP2 = original_single_MP2 ; RelMP3 = original_single_MP3 ;
DissMP = inf( length( RelMP ) , 1 ) ; DissMP2 = DissMP ; DissMP3 = DissMP ;
minValStore = [] ; minIdxStore = [] ; minValStore2 = [] ; minIdxStore2 = [] ; minValStore3 = [] ; minIdxStore3 = [] ;
## set up a recording matrix
all_dist_pro = zeros( size( RelMP , 1 ) , size( data_to_use , 2 ) ) ;
for ii = 1 : 500
## reset recording matrix for this ii loop
all_dist_pro( : , : ) = 0 ;
## just mid price
[ minVal , minIdx ] = min( RelMP ) ;
minValStore = [ minValStore ; minVal ] ; minIdxStore = [ minIdxStore ; minIdx ] ;
DissmissRange = data_to_use( minIdx : minIdx + sub_len - 1 , : ) ;
[ dist_pro , ~ ] = multivariate_mass (data_freq(:,1), DissmissRange(:,1), data_len, sub_len, data_mu(:,1), data_sig(:,1), data_mu(minIdx,1), data_sig(minIdx,1) ) ;
all_dist_pro( : , 1 ) = real( dist_pro ) ;
JMP = all_dist_pro( : , 1 ) ;
DissMP = min( DissMP , JMP ) ; ## dismiss all motifs discovered so far
RelMP = original_single_MP ./ DissMP ;
skip_loc_copy = unique( [ skip_loc_copy ; ( minIdx : 1 : minIdx + sub_len - 1 )' ] ) ;
RelMP( skip_loc_copy ) = 1 ;
## mid price and hi-lo range
[ minVal , minIdx ] = min( RelMP2 ) ;
minValStore2 = [ minValStore2 ; minVal ] ; minIdxStore2 = [ minIdxStore2 ; minIdx ] ;
DissmissRange = data_to_use( minIdx : minIdx + sub_len - 1 , : ) ;
[ dist_pro , ~ ] = multivariate_mass (data_freq(:,1), DissmissRange(:,1), data_len, sub_len, data_mu(:,1), data_sig(:,1), data_mu(minIdx,1), data_sig(minIdx,1) ) ;
all_dist_pro( : , 2 ) = real( dist_pro ) ;
[ dist_pro , ~ ] = multivariate_mass (data_freq(:,2), DissmissRange(:,2), data_len, sub_len, data_mu(:,2), data_sig(:,2), data_mu(minIdx,2), data_sig(minIdx,2) ) ;
all_dist_pro( : , 2 ) = all_dist_pro( : , 2 ) .+ real( dist_pro ) ;
JMP2 = all_dist_pro( : , 2 ) ;
DissMP2 = min( DissMP2 , JMP2 ) ; ## dismiss all motifs discovered so far
RelMP2 = original_single_MP2 ./ DissMP2 ;
skip_loc_copy2 = unique( [ skip_loc_copy2 ; ( minIdx : 1 : minIdx + sub_len - 1 )' ] ) ;
RelMP2( skip_loc_copy2 ) = 1 ;
## mid price, hi-lo range and volume
[ minVal , minIdx ] = min( RelMP3 ) ;
minValStore3 = [ minValStore3 ; minVal ] ; minIdxStore3 = [ minIdxStore3 ; minIdx ] ;
DissmissRange = data_to_use( minIdx : minIdx + sub_len - 1 , : ) ;
[ dist_pro , ~ ] = multivariate_mass (data_freq(:,1), DissmissRange(:,1), data_len, sub_len, data_mu(:,1), data_sig(:,1), data_mu(minIdx,1), data_sig(minIdx,1) ) ;
all_dist_pro( : , 3 ) = real( dist_pro ) ;
[ dist_pro , ~ ] = multivariate_mass (data_freq(:,2), DissmissRange(:,2), data_len, sub_len, data_mu(:,2), data_sig(:,2), data_mu(minIdx,2), data_sig(minIdx,2) ) ;
all_dist_pro( : , 3 ) = all_dist_pro( : , 3 ) .+ real( dist_pro ) ;
[ dist_pro , ~ ] = multivariate_mass (data_freq(:,3), DissmissRange(:,3), data_len, sub_len, data_mu(:,3), data_sig(:,3), data_mu(minIdx,3), data_sig(minIdx,3) ) ;
all_dist_pro( : , 3 ) = all_dist_pro( : , 3 ) .+ real( dist_pro ) ;
JMP3 = all_dist_pro( : , 3 ) ;
DissMP3 = min( DissMP3 , JMP3 ) ; ## dismiss all motifs discovered so far
RelMP3 = original_single_MP3 ./ DissMP3 ;
skip_loc_copy3 = unique( [ skip_loc_copy3 ; ( minIdx : 1 : minIdx + sub_len - 1 )' ] ) ;
RelMP3( skip_loc_copy3 ) = 1 ;
endfor ## end ii loop</code></pre><p></p><p>There are a few things to note about this code:</p><ul style="text-align: left;"><li>the use of a skip_loc vector </li><li>a sub_len value of 9 <br /></li><li>3 different calculations for DissMP and RelMP vectors</li></ul></div><p style="text-align: left;">i) The skip_loc vector is a vector of time series indices (Idx) for which the MP and possible cluster motifs should not be calculated to avoid identifying motifs from data sequences that do not occur in the underlying data due to the way I concatenated it during pre-processing, i.e. 7am to 9am, 7am to 9am, ... etc.</p><p style="text-align: left;">ii) sub_len value of 9 means 9 x 10 minute OHLC bars, to match the 30 minute A, B and C of the above IB screenshot.</p><p style="text-align: left;">iii) 3 different calculations because different combinations of the underlying data are used. </p><p>This last part probably needs more explanation. A multivariate RelMP is created by adding together individual dist_pros (distance profiles), and the cluster motif identification is achieved by finding minimums in the RelMP; however, a minimum in a multivariate RelMP is generally a different minimum to the minimums of the individual, univariate RelMPs. What my code does is use a univariate RelMP of the mid price, and 2 multivariate RelMPs of mid price plus high-low range and mid price, high-low range and volume. This gives 3 sets of minValues and minValueIdxs, one for each set of data. The idea is to run the ii loop for, e.g. 500 iterations, and to then identify possible "robust" IB cluster motifs by using the Octave <a href="https://octave.sourceforge.io/octave/function/intersect.html" target="_blank">intersect</a> function to get the minIdx that are common to all 3 sets of Idx data. </p><p>By way of example, setting the ii loop iteration to just 100 results in only one intersect Idx value on some EUR_USD forex data, the plot of which is shown below:</p><div class="separator" style="clear: both; text-align: center;"></div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEibJPy1WhpMQ0XFhkcT1MqtC1PeEYSFsKOm6xtn858sOyHywqUvfTMhva4PNed7sdYixCgar6LSjl9rPbmQ_I7Od7dJDAnws7UpSjC9C4PqMPcatr0ZBDchSEiNZ5cMedHcmrF7GCUsp1sa/s1572/candle.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="878" data-original-width="1572" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEibJPy1WhpMQ0XFhkcT1MqtC1PeEYSFsKOm6xtn858sOyHywqUvfTMhva4PNed7sdYixCgar6LSjl9rPbmQ_I7Od7dJDAnws7UpSjC9C4PqMPcatr0ZBDchSEiNZ5cMedHcmrF7GCUsp1sa/s320/candle.png" width="320" /></a></div><p>Comparing this with the IB screenshot above, I would say this represents a typical "Open Auction" process with prices rotating upwards/downwards with no real conviction either way, with a possible long breakout on the last bar or alternatively, a last upwards test before a price plunge.</p><p>My intent is to use the above methodology to get a set of candidate IB motifs upon which a clustering algorithm can be based. 
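For clarity, the intersection step just described amounts to something like the following, where minIdxStore, minIdxStore2 and minIdxStore3 are the index vectors accumulated in the loop above:<pre style="border-style: solid; border-width: 2px; height: 50px; overflow: auto; width: 500px;"><code>## candidate IB motif locations: minIdx values common to all 3 data combinations
candidate_idx = intersect( intersect( minIdxStore , minIdxStore2 ) , minIdxStore3 ) ;</code></pre>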
This clustering algorithm will be the subject of my next post.<br /></p>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-5560052943772419320.post-70287569077377125522021-03-26T21:16:00.001+01:002021-03-26T21:52:33.985+01:00Market/Volume Profile and Matrix Profile<p>A quick preview of what I am currently working on: using <a href="https://matrixprofile.org/" target="_blank">Matrix Profile</a> to search for <a href="https://www.cs.ucr.edu/~eamonn/PAN_SKIMP%20%28Matrix%20Profile%20XX%29.pdf" target="_blank">time series motifs</a>, using the <a href="https://www.r-project.org/" target="_blank">R</a> <a href="https://cran.r-project.org/web/packages/tsmp/index.html" target="_blank">tsmp package</a>. The exact motifs I'm looking for are the various "initial balance" set ups of <a href="https://en.wikipedia.org/wiki/Market_profile" target="_blank">Market Profile</a> charts. </p><p>To do so, I'm concentrating the investigation around both the London and New York opening times, with a <a href="https://www.cs.ucr.edu/~eamonn/guided-motif-KDD17-new-format-10-pages-v005.pdf" target="_blank">custom annotation vector (av)</a>. Below is a simple R function to set up this custom av, which is produced separately in <a href="https://www.gnu.org/software/octave/index" target="_blank">Octave</a> and then loaded into R.<br /></p><pre style="border-style: solid; border-width: 2px; height: 120px; overflow: auto; width: 500px;"><code>mp_adjusted_by_custom_av <- function( mp_object , custom_av ){
 ## https://stackoverflow.com/questions/66726578/custom-annotation-vector-with-tsmp-r-package
 mp_object$av <- custom_av
 class( mp_object ) <- tsmp:::update_class( class( mp_object ) , "AnnotationVector" )
 mp_adjusted_by_custom_av <- tsmp::av_apply( mp_object )
 return( mp_adjusted_by_custom_av )
}</code></pre>This animated GIF shows plots of short, exemplar adjusted market profile objects highlighting the London only, New York only and combined results of the relevant annotation vectors.<br /><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgWJ43MbXaqSFasPJ9gd632slg5zdCtO6eOeFXjFdta4s_yjuk_LGKAjsxpQvRMwonmQ7ft6Pgp1OL_E8ao9Sl27LJzI_I0fo7W8Ngl0UCMFcqBU3F1fHGgun1I5fgcDrzQpeSGSs4gLudP/s866/animation.gif" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="409" data-original-width="866" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgWJ43MbXaqSFasPJ9gd632slg5zdCtO6eOeFXjFdta4s_yjuk_LGKAjsxpQvRMwonmQ7ft6Pgp1OL_E8ao9Sl27LJzI_I0fo7W8Ngl0UCMFcqBU3F1fHGgun1I5fgcDrzQpeSGSs4gLudP/s320/animation.gif" width="320" /></a></div>This is currently a work in progress and so I shall report results in due course.<br />Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-5560052943772419320.post-78654927576922882172021-02-05T20:50:00.000+01:002021-02-05T20:50:30.439+01:00A Forex Pair Snapshot Chart<p>After yesterday's <a href="https://dekalogblog.blogspot.com/2021/02/heatmap-plot-of-forex-temporal.html" target="_blank">Heatmap Plot of Forex Temporal Clustering</a> post I thought I would consolidate all the chart types I have recently created into one easy, snapshot overview type of chart.
Below is a typical example of such a chart, this being today's 10 minute EUR_USD forex pair chart up to a few hours after the London session close (the red vertical line).</p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEizyIRj5UHNYRsZB4YX-xJEKVyQ3H85iIRXQl-wOWE-fWpbMHrP9ZqiSWcB9_-rh59wzgtofYWffInhJgqr6PuWKadWAHNMnuKbYL7LU1CxRnOi9zEbUhL0YlXPlls40zFOK85cvEIn4eFM/s1881/eur_usd.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="949" data-original-width="1881" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEizyIRj5UHNYRsZB4YX-xJEKVyQ3H85iIRXQl-wOWE-fWpbMHrP9ZqiSWcB9_-rh59wzgtofYWffInhJgqr6PuWKadWAHNMnuKbYL7LU1CxRnOi9zEbUhL0YlXPlls40zFOK85cvEIn4eFM/s320/eur_usd.png" width="320" /></a></div><br />The top left chart is a <a href="https://dekalogblog.blogspot.com/search?q=volume+profile" target="_blank">Market/Volume Profile Chart</a> with added rolling Value Area upper and lower bounds (the cyan, red and white lines) and also rolling <a href="https://en.wikipedia.org/wiki/Volume-weighted_average_price" target="_blank">Volume Weighted Average Price</a> with upper and lower standard deviation lines (magenta).<p></p><p>The bottom left chart is the turning point heatmap chart as described in yesterday's post.</p><p>The two rightmost charts are also Market/Volume Profile charts, but of my <a href="https://dekalogblog.blogspot.com/2020/07/currency-strength-candlestick-chart.html" target="_blank">Currency Strength Candlestick Charts</a> based on my <a href="https://dekalogblog.blogspot.com/search?q=currency+strength+indicator" target="_blank">Currency Strength Indicator</a>. The upper one is the base currency, i.e. EUR, and the lower is the quote currency. 
</p><p>The following charts are the same day's charts for: <br /></p><p>GBP_USD,</p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgQZzSYLn0AMYU3jT_oig14HSThscVsOtGMsPiEyViN40p82YfwP2Zs9lpxXF3Sz3TZ3Aaedk6KwrQDK3fPnlIU49042WvkfIndKlFHyBb4-FW3RQCfOvrriTy73gdX2L6Ad75U2c1DpYgs/s1883/gbp_usd.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="959" data-original-width="1883" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgQZzSYLn0AMYU3jT_oig14HSThscVsOtGMsPiEyViN40p82YfwP2Zs9lpxXF3Sz3TZ3Aaedk6KwrQDK3fPnlIU49042WvkfIndKlFHyBb4-FW3RQCfOvrriTy73gdX2L6Ad75U2c1DpYgs/s320/gbp_usd.png" width="320" /></a></div>USD_CHF<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEirLZTd2E66XeXCtI7TOieeE-8qQrS-M0nCKr0ceP2Hr018PFwd3S_xln4C1ssthx9wmZpHhKL0RySIoqWE82go58BDhT58cUPzJL30sVbhnvX927BmLMzMdGQNySNFfHGaxoOJA9l3jgmM/s1876/usd_chf.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="958" data-original-width="1876" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEirLZTd2E66XeXCtI7TOieeE-8qQrS-M0nCKr0ceP2Hr018PFwd3S_xln4C1ssthx9wmZpHhKL0RySIoqWE82go58BDhT58cUPzJL30sVbhnvX927BmLMzMdGQNySNFfHGaxoOJA9l3jgmM/s320/usd_chf.png" width="320" /></a></div>and finally USD_JPY<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg431YvfEseSpQ85a3r9ycOxqnUS2txNby1pcrYjssmVBO9X35F938rcc5ALhJ1pApkHPcfVSH02EQkMldLpSzbhzJBHxbrt5JyuMxPFTXWooNRnOdkrPZv_u6SA_s3vBwL38o_DDUkaEtr/s1886/usd_jpy.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="963" data-original-width="1886" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg431YvfEseSpQ85a3r9ycOxqnUS2txNby1pcrYjssmVBO9X35F938rcc5ALhJ1pApkHPcfVSH02EQkMldLpSzbhzJBHxbrt5JyuMxPFTXWooNRnOdkrPZv_u6SA_s3vBwL38o_DDUkaEtr/s320/usd_jpy.png" width="320" /></a></div>The regularity of the turning points is easily seen in the lower lefthand charts although, of course, this is to be expected as they all share the USD as a common currency. 
However, there are also subtle differences to be seen in the "shadows" of the lighter areas.<p></p><p>For the immediate future my self-assigned task will be to observe the forex pairs, in real time, through the prism of the above style of chart and do some mental paper trading, and perhaps some really small size, discretionary live trading, in addition to my normal routine of research and development.<br /><br /><br /></p>Unknownnoreply@blogger.com1tag:blogger.com,1999:blog-5560052943772419320.post-6747499193221187172021-02-04T23:58:00.000+01:002021-02-04T23:58:01.637+01:00Heatmap Plot of Forex Temporal Clustering of Turning Points<p>Following up on my <a href="https://dekalogblog.blogspot.com/2021/01/temporal-clustering-times-on-forex.html" target="_blank">previous post</a>, below is the chart of the temporal turning points that I have come up with.</p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjf5Hl1UKFzwVIvkNkG-jLxv1DVT5pl_0sH8_eY-jKwb7GOib-gXfDGEe2Xm7WkOdZkBygeAaorELSAckUI2BUHAU1nElVIaVMraAgz22Pq1_2I0xyyzog-eRjz1qqcD3oUnZs0WIwl3zpO/s1889/tp_heatmap.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="968" data-original-width="1889" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjf5Hl1UKFzwVIvkNkG-jLxv1DVT5pl_0sH8_eY-jKwb7GOib-gXfDGEe2Xm7WkOdZkBygeAaorELSAckUI2BUHAU1nElVIaVMraAgz22Pq1_2I0xyyzog-eRjz1qqcD3oUnZs0WIwl3zpO/s320/tp_heatmap.png" width="320" /></a></div>This particular example happens to be 10 minute candlesticks over the last two days of the GBP_USD forex pair.<p></p><p>The details I have given about various turning points over the course of my last few posts have been based on identifying the "ix" centre value of turning point clusters. However, for plotting purposes I felt that just displaying these ix values wouldn't be very illuminating. Instead, I have taken the approach of displaying a sort of distribution of turning points per cluster. I would refer readers to my <a href="https://dekalogblog.blogspot.com/2020/11/temporal-clustering-part-3.html" target="_blank">temporal clustering part 3</a> post wherein there is a coloured histogram of the <a href="https://www.r-project.org/" target="_blank">R</a> output of the clustering algorithm used. What I have done for the heatmap background of the above chart is normalise each separate, coloured histogram by the maximum value within the cluster and then plotted these normalised cluster values using <a href="https://www.gnu.org/software/octave/index" target="_blank">Octave's</a> <a href="https://octave.sourceforge.io/octave/function/pcolor.html" target="_blank">pcolor</a> function. An extra step taken was to raise the values to the power four just to increase the contrast within and between the sequential histogram backgrounds.</p><p>Each normalised histogram has a single value of one, which is shown by the bright yellow vertical lines, one per cluster. This represents the time of day at which, within the cluster window, the greatest number of turns occurred in the historical lookback period. The darker green lines show other times within the cluster at which other turns occurred.</p><p>The hypothesis behind this is that there are certain times of the day when price is more likely to change direction, a turning point, than at other times. Such times are market opens, closes etc. and the above chart is a convenient visual representation of these times.
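For readers who prefer code to words, the heatmap background construction described above (per cluster normalisation, raising to the power four and plotting with pcolor) boils down to something like this minimal sketch with toy data and hypothetical variable names:<pre style="border-style: solid; border-width: 2px; height: 110px; overflow: auto; width: 500px;"><code>turn_counts = [ 3 1 7 2 5 9 4 2 ]' ; ## toy counts of historical turns per 10 minute bin
cluster_ids = [ 1 1 1 1 2 2 2 2 ]' ; ## the cluster each bin belongs to
heat = zeros( size( turn_counts ) ) ;
for k = unique( cluster_ids )'
  ## normalise each cluster's histogram by its own maximum value
  heat( cluster_ids == k ) = turn_counts( cluster_ids == k ) ./ max( turn_counts( cluster_ids == k ) ) ;
endfor
heat = heat .^ 4 ; ## raise to the power four to increase the contrast
pcolor( repmat( heat' , 10 , 1 ) ) ; shading flat ; ## plot as a background heatmap</code></pre>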
The lighter the background, the greater the probability that such a turn will occur, based upon the historical record of such turn timings.<br /></p><p>Enjoy! <br /> <br /></p>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-5560052943772419320.post-69724210983465926172021-01-30T21:03:00.001+01:002021-01-30T21:03:55.335+01:00Temporal Clustering Times on Forex Majors Pairs<p>In the following code box there are the results from the temporal clustering routine of my last few posts on the four forex majors pairs of EUR_USD, GBP_USD, USD_CHF and USD_JPY.<br /></p><pre style="border-style: solid; border-width: 2px; height: 150px; overflow: auto; width: 500px;"><code>###### EUR_USD 10 minute bars #######
## In the following order
## Both Delta turning point filter and "normal" TPF combined ##
## Delta turning point filter only ##
## "Normal" turning point filter only
###################### Monday ##############################################
K_opt == 8, ix values == 13 38 63 89 112 135 162 186 ## averaged over all 15 n_bars 1 to 15 inclusive
00 4:10 8:20 12:40 16:30 20:20 00:50 4:50
K_opt == 8, ix values == 13 39 64 89 112 135 161 186 ## averaged over n_bars 1 to 6 inclusive ( upto and include 1 hour )
K_opt == 5, ix_values == 21 60 97 134 175 ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )
K == 6, ix values == 21 59 94 125 158 184
K_opt == 11, ix values == 9 26 43 60 78 95 113 132 151 169 185 ## averaged over all 15 n_bars 1 to 15 inclusive
K_opt == 8, ix values == 13 36 61 86 111 136 161 186 ## averaged over n_bars 1 to 6 inclusive ( upto and include 1 hour )
K_opt == 8, ix values == 13 34 61 87 110 137 164 187 ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )
K_opt == 8, ix values == 13 38 63 88 112 137 162 186 ## averaged over all 15 n_bars 1 to 15 inclusive
K_opt == 10, ix values == 10 31 52 72 91 112 131 150 169 188 ## averaged over n_bars 1 to 6 inclusive ( upto and include 1 hour )
K_opt == 8, ix values == 12 35 62 88 112 137 164 187 ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )
###################### Tuesday #############################################
K_opt == 6, ix values == 131 169 206 244 283 322 ## averaged over all 15 n_bars 1 to 15 inclusive
19:40 02:00 8:10 14:30 21:00 03:30
K_opt == 6, ix values == 131 170 207 245 284 323 ## averaged over n_bars 1 to 6 inclusive ( upto and include 1 hour )
K_opt == 7, ix values == 131 168 206 243 274 305 330 ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )
K_opt == 11, ix values == 124 143 164 184 205 226 247 268 289 310 331 ## averaged over all 15 n_bars 1 to 15 inclusive
K_opt == 11, ix values == 124 144 164 185 204 225 246 267 288 309 332 ## averaged over n_bars 1 to 6 inclusive ( upto and include 1 hour )
K_opt == 7, ix values = 133 169 206 241 273 304 329 ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )
K_opt == 9, ix values == 127 152 175 202 228 253 278 305 330 ## averaged over all 15 n_bars 1 to 15 inclusive
K_opt == 9, ix values == 127 152 177 202 228 253 278 304 329 ## averaged over n_bars 1 to 6 inclusive ( upto and include 1 hour )
K_opt == 7, ix values == 132 168 205 242 273 304 329 ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )
###################### Wednesday ###########################################
K_opt == 6, ix values == 275 312 351 389 426 465 ## averaged over all 15 n_bars 1 to 15 inclusive
19:40 01:50 08:20 14:40 20:50 03:20
K_opt == 6, ix values == 275 313 352 391 428 466 ## averaged over n_bars 1 to 6 inclusive ( upto and include 1 hour )
K_opt == 6, ix values == 274 312 350 389 424 463 ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )
K_opt == 9, ix values == 272 299 322 347 372 397 422 449 474 ## averaged over all 15 n_bars 1 to 15 inclusive
K_opt == 11, ix values == 268 288 308 329 348 369 390 411 432 453 476 ## averaged over n_bars 1 to 6 inclusive ( upto and include 1 hour )
K_opt == 6, ix values == 275 312 351 388 424 463 ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )
K_opt == 9, ix values == 272 297 322 348 373 398 423 449 474 ## averaged over all 15 n_bars 1 to 15 inclusive
K_opt == 9, ix values == 271 297 322 348 373 398 423 448 473 ## averaged over n_bars 1 to 6 inclusive ( upto and include 1 hour )
K_opt == 6, ix values == 276 311 350 389 426 465 ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )
####################### Thursday ###########################################
K_opt == 6, ix values == 420 457 495 532 570 609 ## averaged over all 15 n_bars 1 to 15 inclusive
19:50 02:00 08:20 14:30 20:50 03:20
K_opt == 6, ix values == 420 457 494 531 570 610 ## averaged over n_bars 1 to 6 inclusive ( upto and include 1 hour )
K_opt == 6, ix values == 420 457 495 532 568 607 ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )
K_opt == 9, ix values == 416 443 466 492 518 543 568 593 618 ## averaged over all 15 n_bars 1 to 15 inclusive
K_opt == 10, ix values == 414 437 460 483 506 527 550 573 596 619 ## averaged over n_bars 1 to 6 inclusive ( upto and include 1 hour )
K_opt == 9, ix values == 416 443 466 493 520 543 568 595 618 ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )
K_opt == 9, ix values == 415 440 465 492 518 543 568 593 618 ## averaged over all 15 n_bars 1 to 15 inclusive
K_opt == 9, ix values == 415 440 465 492 518 543 568 593 618 ## averaged over n_bars 1 to 6 inclusive ( upto and include 1 hour )
K_opt == 7, ix values == 420 457 494 529 561 592 617 ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )
####################### Friday #############################################
K_opt == 5, ix values == 564 599 635 670 703 ## averaged over all 15 n_bars 1 to 15 inclusive
19:50 01:40 07:40 13:30 19:00
K_opt == 6, ix values == 563 596 627 654 680 707 ## averaged over n_bars 1 to 6 inclusive ( upto and include 1 hour )
K == 5, ix values == 564 599 635 668 703
K_opt == 5, ix values == 564 601 639 674 705 ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )
K_opt == 9, ix values == 556 575 595 614 633 652 672 691 711 ## averaged over all 15 n_bars 1 to 15 inclusive
K_opt == 11, ix values == 554 570 587 602 619 634 651 667 682 698 713 ## averaged over n_bars 1 to 6 inclusive ( upto and include 1 hour )
K_opt == 9, ix values == 556 575 595 614 633 652 671 691 711 ## averaged over n_bars 1 to 6 inclusive ( upto and include 1 hour )
K_opt == 9, ix values == 556 575 596 613 634 652 672 691 711 ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )
K_opt == 9, ix values == 556 575 594 613 633 652 672 691 710 ## averaged over all 15 n_bars 1 to 15 inclusive
K_opt == 9, ix values == 556 575 594 613 634 653 672 691 710 ## averaged over n_bars 1 to 6 inclusive ( upto and include 1 hour )
K_opt == 5, ix values == 564 600 637 674 705 ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )
############################################################################
###### GBP_USD 10 minute bars #######
## In the following order
## Both Delta turning point filter and "normal" TPF combined ##
###################### Monday ##############################################
K_opt = 8, ix_values = 13 36 61 86 111 136 162 186 ## averaged over all 15 n_bars 1 to 15 inclusive
0:00 3:50 8:00 12:10 16:20 20:30 0:50 4:50
K_opt = 9, ix_values = 12 34 56 78 99 120 141 164 187 ## averaged over n_bars 1 to 6 inclusive ( upto and include 1 hour )
K_opt = 8, ix_values = 12 35 61 86 110 136 163 186 ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )
###################### Tuesday #############################################
K_opt = 12, ix_values = 124 143 162 180 199 216 235 254 274 293 312 332 ## averaged over all 15 n_bars 1 to 15 inclusive
18:30 21:40 0:50 3:50 7:00 9:50 13:00 16:10 19:30 22:40 1:50 5:10
K_opt = 11, ix_values = 124 143 164 185 206 227 248 269 290 311 332 ## averaged over n_bars 1 to 6 inclusive ( upto and include 1 hour )
K_opt = 9, ix_values = 128 154 177 205 230 254 279 307 330 ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )
###################### Wednesday ###########################################
K_opt = 11, ix_values = 269 290 311 331 352 373 394 415 434 455 476 ## averaged over all 15 n_bars 1 to 15 inclusive
18:40 22:10 1:40 5:00 8:30 12:00 15:30 19:00 22:10 1:40 5:10
K_opt = 11, ix_values = 269 289 310 330 351 372 393 413 434 455 476 ## averaged over n_bars 1 to 6 inclusive ( upto and include 1 hour )
K_opt = 8, ix_values = 275 310 341 367 394 422 451 475 ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )
###################### Thursday ############################################
K_opt = 9, ix_values = 415 440 465 492 517 542 568 594 618 ## averaged over all 15 n_bars 1 to 15 inclusive
19:00 23:10 3:20 7:50 12:00 16:10 20:30 0:50 4:50
K_opt = 9, ix_values = 415 440 465 491 517 542 568 593 618 ## averaged over n_bars 1 to 6 inclusive ( upto and include 1 hour )
K_opt = 9, ix_values = 416 441 464 492 519 542 569 596 619 ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )
###################### Friday ##############################################
K_opt = 9, ix_values = 557 576 595 614 633 652 671 690 711 ## averaged over all 15 n_bars 1 to 15 inclusive
18:40 21:50 1:00 4:10 7:20 10:30 13:40 16:50 20:20
K_opt = 9, ix_values = 557 576 595 614 633 652 671 691 711 ## averaged over n_bars 1 to 6 inclusive ( upto and include 1 hour )
K_opt = 8, ix_values = 557 576 599 621 642 665 686 709 ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )
############################################################################
###### USD_CHF 10 minute bars #######
## In the following order
## Both Delta turning point filter and "normal" TPF combined ##
###################### Monday ##############################################
K_opt = 11, ix_values = 8 25 42 61 79 96 113 131 150 169 188 ## averaged over all 15 n_bars 1 to 15 inclusive
23:10 2:00 4:50 8:00 11:00 13:50 16:40 19:40 22:50 2:00 5:10
K_opt = 11, ix_values = 9 26 43 60 79 96 114 133 151 170 189 ## averaged over n_bars 1 to 6 inclusive ( upto and include 1 hour )
K_opt = 7, ix_values = 13 38 66 99 127 157 184 ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )
###################### Tuesday #############################################
K_opt = 9, ix_values = 127 152 177 202 228 253 279 306 330 ## averaged over all 15 n_bars 1 to 15 inclusive
19:00 23:10 3:20 7:30 11:50 16:00 20:20 0:50 4:50
K_opt = 11, ix_values = 124 144 165 185 204 225 246 267 288 309 331 ## averaged over n_bars 1 to 6 inclusive ( upto and include 1 hour )
K_opt = 7, ix_values = 133 170 205 240 270 301 328 ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )
###################### Wednesday ###########################################
K_opt = 10, ix_values = 270 293 316 342 365 388 411 432 454 475 ## averaged over all 15 n_bars 1 to 15 inclusive
18:50 22:40 2:30 6:50 10:40 14:30 18:20 21:50 1:30 5:00
K_opt = 12, ix_values = 268 287 308 327 346 365 384 401 420 439 458 477 ## averaged over n_bars 1 to 6 inclusive ( upto and include 1 hour )
K_opt = 7, ix_values = 276 313 349 383 414 444 471 ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )
###################### Thursday ############################################
K_opt = 11, ix_values = 413 432 452 471 491 512 533 554 575 598 619 ## averaged over all 15 n_bars 1 to 15 inclusive
18:40 21:50 1:10 4:20 7:40 11:10 14:40 18:10 21:40 1:30 5:00
K_opt = 12, ix_values = 412 431 450 469 488 507 526 545 563 582 601 621 ## averaged over n_bars 1 to 6 inclusive ( upto and include 1 hour )
K_opt = 9, ix_values = 415 440 463 491 518 543 570 597 619 ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )
###################### Friday ##############################################
K_opt = 9, ix_values = 557 576 596 615 634 653 672 691 710 ## averaged over all 15 n_bars 1 to 15 inclusive
18:40 21:50 1:10 4:20 7:30 10:40 13:50 17:00 20:10
K_opt = 9, ix_values = 556 575 595 614 633 652 671 690 710 ## averaged over n_bars 1 to 6 inclusive ( upto and include 1 hour )
K_opt = 7, ix_values = 558 579 602 629 652 677 705 ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )
############################################################################
###### USD_JPY 10 minute bars #######
## In the following order
## Both Delta turning point filter and "normal" TPF combined ##
###################### Monday ##############################################
K_opt = 12, ix_values = 8 24 41 58 73 90 107 124 141 158 173 190 ## averaged over all 15 n_bars 1 to 15 inclusive
23:10 1:50 4:40 7:30 10:00 12:50 15:40 18:30 21:20 0:10 2:40 5:30
K_opt = 12, ix_values = 8 24 41 56 73 90 107 124 141 158 173 190 ## averaged over n_bars 1 to 6 inclusive ( upto and include 1 hour )
K_opt = 5, ix_values = 20 60 99 136 175 ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )
###################### Tuesday #############################################
K_opt = 9, ix_values = 128 154 179 204 229 254 279 306 331 ## averaged over all 15 n_bars 1 to 15 inclusive
19:10 23:30 3:40 7:50 12:00 16:10 20:20 0:50 5:00
K_opt = 9, ix_values = 128 153 178 203 228 254 279 305 330 ## averaged over n_bars 1 to 6 inclusive ( upto and include 1 hour )
K_opt = 7, ix_values = 133 168 205 240 271 302 329 ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )
###################### Wednesday ###########################################
K_opt = 11, ix_values = 269 289 310 331 352 373 394 414 433 454 476 ## averaged over all 15 n_bars 1 to 15 inclusive
18:40 22:00 1:30 5:00 8:30 12:00 15:30 18:50 22:00 1:30 5:10
K_opt = 9, ix_values = 272 297 322 348 374 399 424 449 474 ## averaged over n_bars 1 to 6 inclusive ( upto and include 1 hour )
K_opt = 10, ix_values = 269 288 309 331 352 376 398 423 450 475 ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )
###################### Thursday ############################################
K_opt = 9, ix_values = 416 442 467 492 518 543 568 593 618 ## averaged over all 15 n_bars 1 to 15 inclusive
19:10 23:30 3:40 7:50 12:10 16:20 20:30 0:40 4:50
K_opt = 12, ix_values = 412 431 450 469 488 507 526 545 564 583 602 621 ## averaged over n_bars 1 to 6 inclusive ( upto and include 1 hour )
K_opt = 7, ix_values = 420 455 492 527 560 591 618 ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )
###################### Friday ##############################################
K_opt = 7, 8 or 9
ix_values 7 = 561 588 613 638 663 686 709 ## averaged over all 15 n_bars 1 to 15 inclusive
ix_values 8 = 557 578 599 622 643 666 687 710
ix_values 9 = 557 576 596 616 635 653 672 691 711 ## timings are for this bottom row
18:40 21:50 1:10 4:30 7:40 10:40 13:50 17:00 20:20
K_opt = 8, ix_values = 558 579 600 621 644 665 687 709 ## averaged over n_bars 1 to 6 inclusive ( up to and including 1 hour )
K_opt = 6, ix_values = 563 594 621 646 676 705 ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )
############################################################################</code></pre><p></p><p>This is based on 10 minute bars over the last year or so. Readers should read my last few previous posts for background.</p><p>The first set of results, EUR_USD, is what the charts of my previous posts were based on and includes the combined results of my "Delta Turning Point Filter" and "Normal Turning Point Filter" and the results for each filter separately. Since there do not appear to be any significant differences between these, the other three pairs' results are the combined filter results only.</p><p>The K_opt variable is the optimal number of clusters (see my <a href="https://dekalogblog.blogspot.com/2020/11/temporal-clustering-part-3.html" target="_blank">temporal-clustering-part-3</a> post for how "optimal" is decided) and the ix_values are also described in this post. For convenience the first set of ix_values per day has the relevant times annotated underneath, so it is a simple matter to count forwards/backwards in 10 minute increments to place times against the other ix_values. The variable n_bars is an input to the turning point filter functions and essentially indicates the lookback/lookforward period (n_bar == 2 would mean 2 x 10 minute periods) used for determining a local high/low according to each function's logic.</p><p>As to how to interpret this, a typical sequence of times per day might look like this:<br /></p><p>18:40 22:00 1:30 5:00 <u><i><b>8:30 12:00 15:30 18:50 22:00</b></i></u> 1:30 5:10</p><p>where the highlighted times represent the BST times for the period covering the London session open to the New York session close for one day. The preceding and following times are the two "book-ending" Asian sessions.</p><p>Close inspection of these results reveals some surprising regularities. Even in just the single example above (an actual copy and paste from one of the code boxes) there appear to be definite times per day at which a local high/low occurs. Hopefully I will be able to incorporate this into some type of chart for a nice visual presentation of the data.</p><p>More in due course. Enjoy.<br /></p>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-5560052943772419320.post-17344881986821622762020-11-29T18:57:00.000+01:002020-11-29T18:57:05.182+01:00Temporal Clustering on Real Prices, Part 2<p>Below are some more out of sample plots for the Temporal Clustering solutions of the EUR_USD forex pair for the week just gone. The details of how these solutions are derived are explained in my previous post, <a href="https://dekalogblog.blogspot.com/2020/11/temporal-clustering-on-real-prices.html" target="_blank">Temporal Clustering on Real Prices</a>.
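</p><p>Purely as an illustration, the short Octave sketch below shows one way of converting an x-axis ix value into a London clock time, assuming, as described in that previous post for the Monday data, that ix = 1 is the 10 minute bar starting at 22:00 BST on Sunday and that each ix step is one 10 minute bar. The function name ix_to_bst is hypothetical and is not part of the code used to produce the plots below.</p><pre style="border-style: solid; border-width: 2px; height: 150px; overflow: auto; width: 500px;"><code>## Illustration only: convert a weekly ix value ( 10 minute bars, with ix = 1
## assumed to be the bar starting at 22:00 BST on Sunday ) to a BST clock time.
function time_str = ix_to_bst ( ix )
  mins = mod( 22 * 60 + ( ix - 1 ) * 10 , 24 * 60 ) ; ## wrap past midnight
  time_str = sprintf( '%02d:%02d' , floor( mins / 60 ) , mod( mins , 60 ) ) ;
endfunction
## e.g. ix_to_bst( 1 ) gives "22:00" and ix_to_bst( 58 ) gives "07:30"</code></pre><p>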
First is Tuesday's solution</p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjb40wy1e8cKUCnHi8w3MjEdEkm-PJkHDpYujMTzNaCCUnT4zVw0FhYHbwwQoikdBminw0WZk5lt4liFLA12Vy_ygO597HZShKaOYdfAQzLMjqSsArqWV9etlmPTRrRcNFwlqX3qLA046iy/s1579/eur_usd_tuesday_all.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="887" data-original-width="1579" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjb40wy1e8cKUCnHi8w3MjEdEkm-PJkHDpYujMTzNaCCUnT4zVw0FhYHbwwQoikdBminw0WZk5lt4liFLA12Vy_ygO597HZShKaOYdfAQzLMjqSsArqWV9etlmPTRrRcNFwlqX3qLA046iy/s320/eur_usd_tuesday_all.png" width="320" /></a></div>where the major (blue vertical lines) turns are a combination of optimal K values of 6 and 7 (5 sets of data in total) plus 2 sets of data each for K = 9 and 11 (red and green vertical lines). The price plot is<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiHrl1xmpTJiNilz1x9pPuCUX_PlEi_WdKc5Sk4FbxiuMaCwBOn981tqMgt_5RCHpmefHKBdJju7sXuElUkMv4bwrD_bEGuDDsnBKUWU_FbcHMifL0ovibpQTR9LKzgawEVkPE8cS0w2M-9/s1554/tuesday_prices.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="883" data-original-width="1554" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiHrl1xmpTJiNilz1x9pPuCUX_PlEi_WdKc5Sk4FbxiuMaCwBOn981tqMgt_5RCHpmefHKBdJju7sXuElUkMv4bwrD_bEGuDDsnBKUWU_FbcHMifL0ovibpQTR9LKzgawEVkPE8cS0w2M-9/s320/tuesday_prices.png" width="320" /></a></div>Next up is Wednesday's solution<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjj_aWZkurKPgSIEdkE41WZoeFpEV5j6r9eXV5dk6FsteOJHkxWQRBujitfYDHyK-kHdvjyt3yTaV5Tr8ByaIIxYa_IA1eRNrnCA7d8HxUZcDEM2uzZZJAjxQr2ODHmOzIzzskyCOHCktFD/s1534/eur_usd_wednesday_all.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="869" data-original-width="1534" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjj_aWZkurKPgSIEdkE41WZoeFpEV5j6r9eXV5dk6FsteOJHkxWQRBujitfYDHyK-kHdvjyt3yTaV5Tr8ByaIIxYa_IA1eRNrnCA7d8HxUZcDEM2uzZZJAjxQr2ODHmOzIzzskyCOHCktFD/s320/eur_usd_wednesday_all.png" width="320" /></a></div>where the blue vertical lines represent 5 sets of data with K = 6 and the red and green vertical lines 3 sets and 1 set with K = 9 and K= 11 respectively. 
The price plot is<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgnn69FZxpHgayQVJlwzRGFEfShG02SUdqJ70yFXQ_rXLbC6foJUuFFEicv8cbuBc8bao0gra8-QCvNhlM4Vn6ll9hUqmCVgwQHqUT7vzsaAs_6S6f47m3qB-XBs578rOqs9qCcbAFZ1xe6/s1569/wednesday_prices.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="888" data-original-width="1569" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgnn69FZxpHgayQVJlwzRGFEfShG02SUdqJ70yFXQ_rXLbC6foJUuFFEicv8cbuBc8bao0gra8-QCvNhlM4Vn6ll9hUqmCVgwQHqUT7vzsaAs_6S6f47m3qB-XBs578rOqs9qCcbAFZ1xe6/s320/wednesday_prices.png" width="320" /></a></div>Thursday's solution is<br /><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh0-vDPNyagvk8DQene4vEiHz3XHJ9VozYufvmWTF7-0LYeXsdi-w1NtvhODvw0JsyzSCwG4lHXXW3VGmQwP-_nmY26uTxhJuCesV3jk8SUK2AAm7xz6byh75xn4aMTboVh1-jr8n4ZcQRh/s1571/eur_usd_thursday_all.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="870" data-original-width="1571" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh0-vDPNyagvk8DQene4vEiHz3XHJ9VozYufvmWTF7-0LYeXsdi-w1NtvhODvw0JsyzSCwG4lHXXW3VGmQwP-_nmY26uTxhJuCesV3jk8SUK2AAm7xz6byh75xn4aMTboVh1-jr8n4ZcQRh/s320/eur_usd_thursday_all.png" width="320" /></a></div>where black/blue vertical lines are K values of 9 and 6 respectively, whilst green/red are K values 10 and 7. Thursday's price plot is<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg0SF25mFdQqTkLeZT4sh4eiViJ37vUYELMLXw4I_yxXp_YRuXbY1TX-3SeL6FGJxpStyYOeiYFDrPMgvgPNfm9GcZzK_-GsFkhyphenhyphengdDNA7sWA2KwF48OQc6tTx3rSy1pkDb04SpMNVdZZNI/s1535/thursday_prices.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="866" data-original-width="1535" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg0SF25mFdQqTkLeZT4sh4eiViJ37vUYELMLXw4I_yxXp_YRuXbY1TX-3SeL6FGJxpStyYOeiYFDrPMgvgPNfm9GcZzK_-GsFkhyphenhyphengdDNA7sWA2KwF48OQc6tTx3rSy1pkDb04SpMNVdZZNI/s320/thursday_prices.png" width="320" /></a></div>Finally, Friday's solution is<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgZ16pnnhBntxnGEuAAN-kyKIp8llyzvN8kDgybkebhnaFD0aCKlGVQ21J9LFKAVasl2ZWocsPYHAQ28tnjcqnt78SnadcZAp5faKMUpaQuYHFenuL9nhvoNVj4X0LpzVMbbskHXCnPl_H2/s1539/eur_usd_friday_all.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="878" data-original-width="1539" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgZ16pnnhBntxnGEuAAN-kyKIp8llyzvN8kDgybkebhnaFD0aCKlGVQ21J9LFKAVasl2ZWocsPYHAQ28tnjcqnt78SnadcZAp5faKMUpaQuYHFenuL9nhvoNVj4X0LpzVMbbskHXCnPl_H2/s320/eur_usd_friday_all.png" width="320" /></a></div>where the major blue vertical lines are K = 9 over 5 sets of data, with the remainder being K = 5, 6 and 11 over the last 4 sets of data. 
Friday's price plot is<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEghdiLqGm2L-uzUZLx1pPy7jZiCzWYwy3Za1bPZTKBMB2uAYK1t47Wh4YvZvmxPHcjDPgPuOdsqXrs7wrCYoi1uffWlIgB4MvUi6NPcE7pN_uzW0o0DFBOoN1PqGTjDtqQMITdbjL8RaLL1/s1567/friday_prices.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="871" data-original-width="1567" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEghdiLqGm2L-uzUZLx1pPy7jZiCzWYwy3Za1bPZTKBMB2uAYK1t47Wh4YvZvmxPHcjDPgPuOdsqXrs7wrCYoi1uffWlIgB4MvUi6NPcE7pN_uzW0o0DFBOoN1PqGTjDtqQMITdbjL8RaLL1/s320/friday_prices.png" width="320" /></a></div><p>The above seems to tie in nicely with my previous post about <a href="https://dekalogblog.blogspot.com/2020/07/forex-intraday-seasonality.html" target="_blank">Forex Intraday Seasonality</a>, whereby the turning points identified above signify the end points of those intraday tendencies to trend. Readers might also be interested in another paper I have come across, <a href="https://papers.ssrn.com/sol3/papers.cfm?abstract_id=960209" target="_blank">Segmentation and Time-of-Day Patterns in Foreign Exchange Markets</a>, which gives a possible theoretical explanation as to why such patterns manifest themselves. In particular, for the EUR_USD pair, the paper states:</p><ul style="text-align: left;"><li><i>"the US dollar appreciates significantly from 8:00 to 12:00 GMT<br />and the euro appreciates significantly from 16:00 to 22:00 GMT"</i><br /></li></ul><p>Readers can judge for themselves whether this appears to be true, out of sample, by inspecting the above plots. Enjoy!</p><p></p>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-5560052943772419320.post-57136734633279813952020-11-24T11:47:00.000+01:002020-11-24T11:47:18.470+01:00Temporal Clustering on Real Prices<p>Having now had time to run the code shown in my previous post, <a href="https://dekalogblog.blogspot.com/2020/11/temporal-clustering-part-3.html" target="_blank">Temporal Clustering, part 3</a>, in this post I want to show the results on real prices.</p><p>Firstly, I have written two functions in <a href="https://www.gnu.org/software/octave/index" target="_blank">Octave</a> to identify market turning points. Each function takes as input an n_bar argument, which determines the lookback/lookforward length along the price series used to determine local relative highs and lows. I ran both of these for n_bar values of 1 to 15 inclusive on EUR_USD forex 10 minute bars from July 2012 up to and including last week's set of 10 minute bars. I created 3 sets of turning point data per function by averaging the function outputs over n_bar 1 - 15, 1 - 6 and 7 - 15, and also averaged the outputs over the average of the 2 functions over the same ranges. In total this gives 9 slightly different sets of turning point data.</p><p>I then ran the optimal K clustering code, shown in previous posts, over each set of data to get the "solutions" per set of data.
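</p><p>The two turning point functions themselves are not reproduced here, but a minimal Octave sketch of the general idea, assuming a simple definition in which a bar is a local high (low) if its high (low) is the extreme of the n_bar bars on either side of it, might look something like the following. The function name is hypothetical and this is not either of the actual functions used.</p><pre style="border-style: solid; border-width: 2px; height: 150px; overflow: auto; width: 500px;"><code>## Minimal sketch only of the general idea behind an n_bar turning point filter,
## NOT either of the two actual functions used to create the data sets above.
function [ tp_high , tp_low ] = simple_turning_points ( high , low , n_bar )
  n = numel( high ) ;
  tp_high = zeros( n , 1 ) ;
  tp_low = zeros( n , 1 ) ;
  for ii = ( n_bar + 1 ) : ( n - n_bar )
    if ( high( ii ) == max( high( ii - n_bar : ii + n_bar ) ) )
      tp_high( ii ) = 1 ; ## local relative high
    endif
    if ( low( ii ) == min( low( ii - n_bar : ii + n_bar ) ) )
      tp_low( ii ) = 1 ; ## local relative low
    endif
  endfor
endfunction</code></pre><p>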
Six of the sets had an optimal K value of 8 and a combined plot of these is shown below.</p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi5z6EL_SIXFLcWQefVSVmfiM-eJcw5Oq_q0LOr2S9d1DQ28bHsL3CM6EAyoq_EmkZI4_rvhUCdxVCLSOnO-8MivqaU74XtytgTTr4PEJONIS9VVKqFk8MJdiR6_rrQZdZKknPxR0NHd187/s1556/mon_tp.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="883" data-original-width="1556" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi5z6EL_SIXFLcWQefVSVmfiM-eJcw5Oq_q0LOr2S9d1DQ28bHsL3CM6EAyoq_EmkZI4_rvhUCdxVCLSOnO-8MivqaU74XtytgTTr4PEJONIS9VVKqFk8MJdiR6_rrQZdZKknPxR0NHd187/s320/mon_tp.png" width="320" /></a></div>For each "solution" turning point ix (ix ranges from 1 to 198) a turning point value of 1 is added to get a sort of spike train plot through time. The ix = 1 value is 22:00 BST on Sunday and ix = 198 is 06:50 BST on Tuesday. I chose this range so that there would be a buffer at each end of the time range I am really interested in: 7:00 BST to 22:00 BST, which covers the time from the London open to the New York close. The vertical blue lines are plotted for clarity to help identify the turns and are plotted as 3 consecutive lines 10 minutes apart. The added text shows the time of occurrence of the first bar of each triplet of lines, the time being London BST. The following second plot is the same as above but with the other 3 "solutions" of K = 5, 10 and 11 added.<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj4v46p63b3M-g9rZ2XGnBIirYEx-DGri70yDh1TV0pIvDQMcdoEI5uo42c_ir3FTkoaYWI2fSQ7HBBd8MkE4fOZ4l0qz7qbi2TYDkCqo36EHsagl9GOjyM3PpcpMY_O9xfdObobid8SjMm/s1545/mon_2_tp.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="877" data-original-width="1545" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj4v46p63b3M-g9rZ2XGnBIirYEx-DGri70yDh1TV0pIvDQMcdoEI5uo42c_ir3FTkoaYWI2fSQ7HBBd8MkE4fOZ4l0qz7qbi2TYDkCqo36EHsagl9GOjyM3PpcpMY_O9xfdObobid8SjMm/s320/mon_2_tp.png" width="320" /></a></div>For those readers who are familiar with <a href="https://dekalogblog.blogspot.com/search?q=delta+phenomenon&max-results=20&by-date=false" target="_blank">the Delta Phenomenon</a> the main vertical blue lines could conceptually be thought of as MTD lines with the other lines being lower timeframe ITD lines, but on an intraday scale. However, it is important to bear in mind that this is NOT a Delta solution and therefore rules about numbering, alternating highs and lows, inversions etc. do not apply. It is more helpful to think in terms of probability and see the various spikes/lines as indicating times of the day at which there is a higher probability of price making a local high or low. The size of a move after such a high or low is not indicated, and the timings are only approximate or alternatively represent the centre of a window in which the high or low might occur.<p></p><p>The proof of the pudding is in the eating, however, and the following plots are yesterday's (23 November 2020) out of sample EUR_USD forex pair price action with the lines of the above "solution" overlaid.
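</p><p>For completeness, a rough Octave sketch of how such a spike train can be accumulated is shown below, assuming the individual "solutions" are held as a cell array of cluster centre ix vectors. The variable solutions and its contents are hypothetical, illustrative values only and not my actual results.</p><pre style="border-style: solid; border-width: 2px; height: 150px; overflow: auto; width: 500px;"><code>## Rough sketch only: accumulate a spike train from a cell array of "solutions",
## each being a vector of cluster centre ix values ( 1 to 198 for Monday ).
solutions = { [ 8 24 41 58 ] , [ 10 25 40 60 ] } ; ## hypothetical example values
spike_train = zeros( 198 , 1 ) ;
for ii = 1 : numel( solutions )
  spike_train( solutions{ ii } ) += 1 ; ## add 1 at each predicted turning point ix
endfor
plot( 1 : 198 , spike_train ) ;</code></pre><p>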
The first plot is just the K = 8 solution plot</p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjh13wffZr8oeRNUataugmvp9QtW5dMlIim6XXn4ZZY6ZHt8nrl0yNE-Ii1FMzH2ejWMM3ahyphenhyphenSjsakQxRZK1g8v0SqNNkS09owoO08t4EASs6LSh08vUpZPhD6YB09jYEALQ8F-Dqmninj_/s1560/price_tp.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="874" data-original-width="1560" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjh13wffZr8oeRNUataugmvp9QtW5dMlIim6XXn4ZZY6ZHt8nrl0yNE-Ii1FMzH2ejWMM3ahyphenhyphenSjsakQxRZK1g8v0SqNNkS09owoO08t4EASs6LSh08vUpZPhD6YB09jYEALQ8F-Dqmninj_/s320/price_tp.png" width="320" /></a></div>whilst this second plot has all lines shown.<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgkG_LRFEOrk7ZHw-JUuVMyzp6eCPhMh8tc2Ufrxdx28VR6jvZVF1Gy-QrinfVVJ-3OumRtpU1QhaDqtp4QdWXaaBfzhrptWrHTmiIxzfKPhPvopZGQuUAvYj_1WR1sxU2gZE2YbzJWIGKF/s1574/price_tp_1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="874" data-original-width="1574" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgkG_LRFEOrk7ZHw-JUuVMyzp6eCPhMh8tc2Ufrxdx28VR6jvZVF1Gy-QrinfVVJ-3OumRtpU1QhaDqtp4QdWXaaBfzhrptWrHTmiIxzfKPhPvopZGQuUAvYj_1WR1sxU2gZE2YbzJWIGKF/s320/price_tp_1.png" width="320" /></a></div>Given the above caveats about caution with regard to the lines only being probabilities, it seems uncanny how accurately the major highs and lows of the day are picked out. I only wish I had done this analysis sooner as then yesterday could have been one of my best trading days ever!<p></p><p>More soon.<br /></p>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-5560052943772419320.post-85656260592210392092020-11-14T16:45:00.000+01:002020-11-14T16:45:29.944+01:00Temporal Clustering, Part 3<p>Continuing with the subject matter of my <a href="https://dekalogblog.blogspot.com/2020/11/a-temporal-clustering-function-part-2.html" target="_blank">last post, </a>in the code box below there is <a href="https://www.r-project.org/" target="_blank">R</a> code which is a straightforward refactoring of the <a href="https://www.gnu.org/software/octave/index" target="_blank">Octave</a> code contained in the second code box of my last post. This code is my implementation of the <a href="https://en.wikipedia.org/wiki/Cross-validation_(statistics)" target="_blank">cross validation</a> routine described in the paper <a href="https://statweb.stanford.edu/~gwalther/predictionstrength.pdf" target="_blank">Cluster Validation by Prediction Strength</a>, but adapted for use in the one dimensional case. I have refactored this into R code so that I can use the <a href="https://cran.r-project.org/web/packages/Ckmeans.1d.dp/index.html" target="_blank">Ckmeans.1d.dp</a> package for optimal, one dimensional clustering. <br /></p><pre style="border-style: solid; border-width: 2px; height: 150px; overflow: auto; width: 500px;"><code>library( Ckmeans.1d.dp )
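## Prediction strength cross validation for one dimensional (temporal) clustering.
## NOTE ( assumption ): the file read in below is taken to hold one row per
## period of historical turning point data and one column per 10 minute bar ix,
## so that the column sums computed below are turning point counts per ix.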
## load the training data from Octave output (comment out as necessary )
data = read.csv( "~/path/to/all_data_matrix" , header = FALSE )
## comment out as necessary
adjust = 0 ## default adjust value
sum_seq = seq( from = 1 , to = 198 , by = 1 ) ; adjust = 1 ; sum_seq_l = as.numeric( length( sum_seq ) )## Monday
##sum_seq = seq( from = 115 , to = 342 , by = 1 ) ; sum_seq_l = as.numeric( length( sum_seq ) ) ## Tuesday
##sum_seq = seq( from = 259 , to = 486 , by = 1 ) ; sum_seq_l = as.numeric( length( sum_seq ) ) ## Wednesday
##sum_seq = seq( from = 403 , to = 630 , by = 1 ) ; sum_seq_l = as.numeric( length( sum_seq ) ) ## Thursday
##sum_seq = seq( from = 547 , to = 720 , by = 1 ) ; adjust = 2 ; sum_seq_l = as.numeric( length( sum_seq ) ) ## Friday
## intraday --- comment out or adjust as necessary
##sum_seq = seq( from = 25 , to = 100 , by = 1 ) ; sum_seq_l = as.numeric( length( sum_seq ) )
upper_tri_mask = 1 * upper.tri( matrix( 0L , nrow = sum_seq_l , ncol = sum_seq_l ) , diag = FALSE )
no_sample_iters = 1000
max_K = 20
all_k_ps = matrix( 0L , nrow = 1 , ncol = max_K )
for ( iters in 1 : no_sample_iters ) {
## sample the data in data by rows
train_ix = sample( nrow( data ) , size = round( nrow( data ) / 2 ) , replace = FALSE )
train_data = data[ train_ix , sum_seq ] ## extract training data using train_ix rows of data
train_data_sum = colSums( train_data ) ## sum down the columns of train_data
test_data = data[ -train_ix , sum_seq ] ## extract test data using NOT train_ix rows of data
test_data_sum = colSums( test_data ) ## sum down the columns of test_data
## adjust for weekend if necessary
if ( adjust == 1 ) { ## Monday, so correct artifacts of weekend gap
train_data_sum[ 1 : 5 ] = mean( train_data_sum[ 1 : 48 ] )
test_data_sum[ 1 : 5 ] = mean( test_data_sum[ 1 : 48 ] )
} else if ( adjust == 2 ) { ## Friday, so correct artifacts of weekend gap
train_data_sum[ ( sum_seq_l - 4 ) : sum_seq_l ] = mean( train_data_sum[ ( sum_seq_l - 47 ) : sum_seq_l ] )
test_data_sum[ ( sum_seq_l - 4 ) : sum_seq_l ] = mean( test_data_sum[ ( sum_seq_l - 47 ) : sum_seq_l ] )
}
for ( k in 1 : max_K ) {
## K segment train_data_sum
train_res = Ckmeans.1d.dp( sum_seq , k , train_data_sum )
train_out_pairs_mat = matrix( 0L , nrow = sum_seq_l , ncol = sum_seq_l )
## K segment test_data_sum
test_res = Ckmeans.1d.dp( sum_seq , k , test_data_sum )
test_out_pairs_mat = matrix( 0L , nrow = sum_seq_l , ncol = sum_seq_l )
for ( ii in 1 : length( train_res$centers ) ) {
ix = which( train_res$cluster == ii )
train_out_pairs_mat[ ix , ix ] = 1
ix = which( test_res$cluster == ii )
test_out_pairs_mat[ ix , ix ] = 1
}
## coerce to upper triangular matrix
train_out_pairs_mat = train_out_pairs_mat * upper_tri_mask
test_out_pairs_mat = test_out_pairs_mat * upper_tri_mask
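## at this point train_out_pairs_mat and test_out_pairs_mat hold a 1 in entry
## ( i , j ), i < j, whenever bars i and j are co-members of the same cluster
## in the train and test clusterings respectively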
## get minimum co-membership cluster proportion
sample_min_vec = matrix( 0L , nrow = 1 , ncol = length( test_res$centers ) )
for ( ii in 1 : length( test_res$centers ) ) {
ix = which( test_res$cluster == ii )
test_cluster_sum = sum( test_out_pairs_mat[ ix , ix ] )
train_cluster_sum = sum( test_out_pairs_mat[ ix , ix ] * train_out_pairs_mat[ ix , ix ] )
sample_min_vec[ , ii ] = train_cluster_sum / test_cluster_sum
}
## get min of sample_min_vec
min_val = min( sample_min_vec[ !is.nan( sample_min_vec ) ] ) ## removing any NaN
all_k_ps[ , k ] = all_k_ps[ , k ] + min_val
} ## end of K for loop
} ## end of sample loop
all_k_ps = all_k_ps / no_sample_iters ## average values
plot( 1 : length( all_k_ps ) , all_k_ps , "b" , xlab = "Number of Clusters K" , ylab = "Prediction Strength Value" )
abline( h = 0.8 , col = "red" )</code></pre><p>The purpose of the cross validation routine is to select the number of clusters K, in the <a href="https://en.wikipedia.org/wiki/Model_selection" target="_blank">model selection</a> sense, that is best supported by the available data. The above linked paper suggests that the optimal number of clusters K is the highest number K that has a prediction strength value over some given threshold (e.g. 0.8 or 0.9). The last part of the code plots the number of clusters K (x-axis) vs. the averaged prediction strength value (y-axis), along with the threshold value of 0.8 in red. For the particular set of data in question, it can be seen that the optimal K value for the number of clusters is 8.<br /></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEioHej8WOtevlZwZWL4br8b5cF75CILS2Xrn1pzG7MA0pDtaTN1GioM6xJr9TlX1zD5W36xrjRNezhVt1XD7xOlO2Iu0Yspft2xMXqO-LCxxKSZ_NTXa7OeuauJ2xK6MxiMy65P69Ay01SA/s1056/Rplot1.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="450" data-original-width="1056" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEioHej8WOtevlZwZWL4br8b5cF75CILS2Xrn1pzG7MA0pDtaTN1GioM6xJr9TlX1zD5W36xrjRNezhVt1XD7xOlO2Iu0Yspft2xMXqO-LCxxKSZ_NTXa7OeuauJ2xK6MxiMy65P69Ay01SA/s320/Rplot1.png" width="320" /></a></div><p></p>This second code box shows code, re-using some of the above, to visualise the clusters for a given K,<br /><pre style="border-style: solid; border-width: 2px; height: 150px; overflow: auto; width: 500px;"><code>library( Ckmeans.1d.dp )
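## Visualise the Ckmeans.1d.dp clustering of the summed turning point counts for
## a chosen K; the cluster centres ( dashed vertical lines in the plot ) are
## treated as the predicted turning point ix values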
## load the training data from Octave output (comment out as necessary )
data = read.csv( "~/path/to/all_data_matrix" , header = FALSE )
data_sum = colSums( data ) ## sum down the columns of data
data_sum[ 1 : 5 ] = mean( data_sum[ 1 : 48 ] ) ## correct artifacts of weekend gap
data_sum[ 716 : 720 ] = mean( data_sum[ 1 : 48 ] ) ## correct artifacts of weekend gap
## comment out as necessary
adjust = 0 ## default adjust value
sum_seq = seq( from = 1 , to = 198 , by = 1 ) ; sum_seq_l = as.numeric( length( sum_seq ) ) ## Monday
##sum_seq = seq( from = 115 , to = 342 , by = 1 ) ; sum_seq_l = as.numeric( length( sum_seq ) ) ## Tuesday
##sum_seq = seq( from = 259 , to = 486 , by = 1 ) ; sum_seq_l = as.numeric( length( sum_seq ) ) ## Wednesday
##sum_seq = seq( from = 403 , to = 630 , by = 1 ) ; sum_seq_l = as.numeric( length( sum_seq ) ) ## Thursday
##sum_seq = seq( from = 547 , to = 720 , by = 1 ) ; sum_seq_l = as.numeric( length( sum_seq ) ) ## Friday
## intraday --- comment out or adjust as necessary
##sum_seq = seq( from = 25 , to = 100 , by = 1 ) ; sum_seq_l = as.numeric( length( sum_seq ) )
k = 8
res = Ckmeans.1d.dp( sum_seq , k , data_sum[ sum_seq ] )
plot( sum_seq , data_sum[ sum_seq ], main = "Cluster centres. Cluster centre ix is a predicted turning point",
col = res$cluster,
pch = res$cluster, type = "h", xlab = "Count from beginning ix at ix = 1",
ylab = "Total Counts per ix" )
abline( v = res$centers, col = "chocolate" , lty = "dashed" )
text( res$centers, max(data_sum[sum_seq]) * 0.95, cex = 0.75, font = 2,
paste( round(res$centers) ) )</code></pre>a typical plot for which is shown below.<br /><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEga7BCguVC1wDxjwDKD20xGNsHFxl4YW5yGTjAZ3JkUhKzMbUy2YjwyDtzqwW08KRphP0WmUNADhQRvhA39tw1z2GC6I_eK72RR06xAypOSklw-ejcqMohYZJPTrvoObJBOaMjlwnKC3t8s/s1056/Rplot2.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="450" data-original-width="1056" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEga7BCguVC1wDxjwDKD20xGNsHFxl4YW5yGTjAZ3JkUhKzMbUy2YjwyDtzqwW08KRphP0WmUNADhQRvhA39tw1z2GC6I_eK72RR06xAypOSklw-ejcqMohYZJPTrvoObJBOaMjlwnKC3t8s/s320/Rplot2.png" width="320" /></a></div>The above plot can be thought of as a clustering at a particular scale, and one can go down in scale by selecting smaller ranges of the data. For example, taking all the datum clustered in the 3 clusters centred at x-axis ix values 38, 63 and 89 and re-running the code in the first code box on just this data gives this prediction strength plot, which suggests a K value of 6.<br /><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiHCoFO7q3IUQYUCANWhJUk5p1x56PkmeG8zkR2eh0PekpYUoiB5REt67GPek-oeitdcVhgvCoXxK0V5TM89ra5Y181zHhJIWZwFxvmcfHlcA15Ea49FK8n2Ax8xZ6OXwYZO1Lgd2b2s_JM/s1056/Rplot3.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="450" data-original-width="1056" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiHCoFO7q3IUQYUCANWhJUk5p1x56PkmeG8zkR2eh0PekpYUoiB5REt67GPek-oeitdcVhgvCoXxK0V5TM89ra5Y181zHhJIWZwFxvmcfHlcA15Ea49FK8n2Ax8xZ6OXwYZO1Lgd2b2s_JM/s320/Rplot3.png" width="320" /></a></div>Re-running the code in the second code box plots these 6 clusters thus.<br /><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgflFl7DBlYe6ns2lGkeCdjB5bBtTMo0FLcDeVGJvaWNzeiv8uoU7M9oXI4F9Xk1Efw4g-i5EMY5gBV5Xs-oLLqwSwT8LzFlGRMecp58qo6vP9IHlPG7xQrrSCl5mWJsOyeGVjn_RP_k3aK/s1056/Rplot4.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="450" data-original-width="1056" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgflFl7DBlYe6ns2lGkeCdjB5bBtTMo0FLcDeVGJvaWNzeiv8uoU7M9oXI4F9Xk1Efw4g-i5EMY5gBV5Xs-oLLqwSwT8LzFlGRMecp58qo6vP9IHlPG7xQrrSCl5mWJsOyeGVjn_RP_k3aK/s320/Rplot4.png" width="320" /></a></div><p>Looking at this last plot, it can be seen that there is a cluster at x-axis ix value 58, which corresponds to 7.30 a.m. London time, and within this green cluster there are 2 distinct peaks which correspond to 7.00 a.m. and 8.00 a.m. A similar, visual analysis of the far right cluster, centre ix = 94, shows a peak at the time of the New York open.</p><p>My hypothesis is that by clustering in the above manner it will be possible to identify distinct, intraday times at which the probability of a market turn is greater than at other times. More in due course. <br /></p>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-5560052943772419320.post-67809713394666669822020-11-09T15:17:00.001+01:002020-11-13T16:23:24.281+01:00A Temporal Clustering Function, Part 2<p>Further to my <a href="https://dekalogblog.blogspot.com/2020/10/a-temporal-clustering-function.html" target="_blank">previous post</a>, below is an extended version of the "blurred_maxshift_1d_linear" function. 
This updated version has two extra outputs: a vector of the cluster centre index ix values and a vector the same length as the input data with the cluster centres to which each datum has been assigned. These changes have necessitated some extensive re-writing of the function to include various checks contained in nested conditional statements. <br /></p><pre style="border-style: solid; border-width: 2px; height: 150px; overflow: auto; width: 500px;"><code>## Copyright (C) 2020 dekalog
##
## This program is free software: you can redistribute it and/or modify it
## under the terms of the GNU General Public License as published by
## the Free Software Foundation, either version 3 of the License, or
## (at your option) any later version.
##
## This program is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
## GNU General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with this program. If not, see
## <https://www.gnu.org/licenses/>.
## -*- texinfo -*-
## @deftypefn {} {@var{train_vec}, @var{cluster_centre_ix}, @var{assigned_cluster_centre_ix} =} blurred_maxshift_1d_linear_V2 (@var{train_vec}, @var{bandwidth})
##
## @seealso{}
## @end deftypefn
## Author: dekalog
## Created: 2020-10-21
function [ new_train_vec , cluster_centre_ix , assigned_cluster_centre_ix ] = blurred_maxshift_1d_linear_V2 ( train_vec , bandwidth )
if ( nargin < 2 )
bandwidth = 1 ;
endif
if ( numel( train_vec ) < 2 * bandwidth + 1 )
error( 'Bandwidth too wide for length of train_vec.' ) ;
endif
length_train_vec = numel( train_vec ) ;
new_train_vec = zeros( size( train_vec ) ) ;
assigned_cluster_centre_ix = ( 1 : 1 : length_train_vec ) ;
## initialising loop
## do the beginning
[ ~ , ix ] = max( train_vec( 1 : 2 * bandwidth + 1 ) ) ;
new_train_vec( ix ) = sum( train_vec( 1 : bandwidth + 1 ) ) ;
assigned_cluster_centre_ix( 1 : bandwidth + 1 ) = ix ;
## and end of train_vec first
[ ~ , ix ] = max( train_vec( end - 2 * bandwidth : end ) ) ;
new_train_vec( end - 2 * bandwidth - 1 + ix ) = sum( train_vec( end - bandwidth : end ) ) ;
assigned_cluster_centre_ix( end - bandwidth : end ) = length_train_vec - 2 * bandwidth - 1 + ix ;
for ii = ( bandwidth + 2 ) : ( length_train_vec - bandwidth - 1 )
[ ~ , ix ] = max( train_vec( ii - bandwidth : ii + bandwidth ) ) ;
new_train_vec( ii - bandwidth - 1 + ix ) += train_vec( ii ) ;
assigned_cluster_centre_ix( ii ) = ii - bandwidth - 1 + ix ;
endfor
## end of initialising loop
train_vec = new_train_vec ;
## initialise the while condition variable
has_converged = 0 ;
while ( has_converged < 1 )
new_train_vec = zeros( size( train_vec ) ) ;
## do the beginning
[ ~ , ix ] = max( train_vec( 1 : 2 * bandwidth + 1 ) ) ;
new_train_vec( ix ) += sum( train_vec( 1 : bandwidth + 1 ) ) ;
assigned_cluster_centre_ix( 1 : bandwidth + 1 ) = ix ;
## and end of train_vec first
[ ~ , ix ] = max( train_vec( end - 2 * bandwidth : end ) ) ;
new_train_vec( end - 2 * bandwidth - 1 + ix ) += sum( train_vec( end - bandwidth : end ) ) ;
assigned_cluster_centre_ix( end - bandwidth : end ) = length_train_vec - 2 * bandwidth - 1 + ix ;
for ii = ( bandwidth + 2 ) : ( length_train_vec - bandwidth - 1 )
[ max_val , ix ] = max( train_vec( ii - bandwidth : ii + bandwidth ) ) ;
## check for ties in max_val value in window
no_ties = sum( train_vec( ii - bandwidth : ii + bandwidth ) == max_val ) ;
if ( no_ties == 1 && max_val == train_vec( ii ) && ix == bandwidth + 1 ) ## main if
## value in train_vec(ii) is max val of window, with no ties
new_train_vec( ii ) += train_vec( ii ) ;
assigned_cluster_centre_ix( ii ) = ii ;
elseif ( no_ties == 1 && max_val != train_vec( ii ) && ix != bandwidth + 1 ) ## main if
## no ties for max_val, but need to move data at ii and change ix
## get assigned_cluster_centre_ix that point to ii, which needs to be updated
assigned_ix = find( assigned_cluster_centre_ix == ii ) ;
if ( !isempty( assigned_ix ) ) ## should always be true because at least the one original ii == ii
assigned_cluster_centre_ix( assigned_ix ) = ii - ( bandwidth + 1 ) + ix ;
elseif ( isempty( assigned_ix ) ) ## but cheap insurance
assigned_cluster_centre_ix( ii ) = ii - ( bandwidth + 1 ) + ix ;
endif
new_train_vec( ii - ( bandwidth + 1 ) + ix ) += train_vec( ii ) ;
elseif ( no_ties > 1 && max_val > train_vec( ii ) ) ## main if
## 2 ties for max_val, which is > val at ii, need to move data at ii
## to the closer max_val ix and change ix in assigned_cluster_centre_ix
match_max_val_ix = find( train_vec( ii - bandwidth : ii + bandwidth ) == max_val ) ;
if ( numel( match_max_val_ix ) == 2 ) ## only 2 matching max vals
centre_window_dist = ( bandwidth + 1 ) .- match_max_val_ix ;
if ( abs( centre_window_dist( 1 ) ) == abs( centre_window_dist( 2 ) ) )
## equally distant from centre ii of moving window
assigned_ix = find( assigned_cluster_centre_ix == ii ) ;
if ( !isempty( assigned_ix ) ) ## should always be true because at least the one original ii == ii
ix_before = find( assigned_ix < ii ) ;
ix_after = find( assigned_ix > ii ) ;
new_train_vec( ii - ( bandwidth + 1 ) + match_max_val_ix( 1 ) ) += train_vec( ii ) / 2 ;
new_train_vec( ii - ( bandwidth + 1 ) + match_max_val_ix( 2 ) ) += train_vec( ii ) / 2 ;
assigned_cluster_centre_ix( assigned_ix( ix_before ) ) = ii - ( bandwidth + 1 ) + match_max_val_ix( 1 ) ;
assigned_cluster_centre_ix( assigned_ix( ix_after ) ) = ii - ( bandwidth + 1 ) + match_max_val_ix( 2 ) ;
assigned_cluster_centre_ix( ii ) = ii ; ## bit of a kluge
elseif ( isempty( assigned_ix ) ) ## but cheap insurance
## no other assigned_cluster_centre_ix values to account for, so just split equally
new_train_vec( ii - ( bandwidth + 1 ) + match_max_val_ix( 1 ) ) += train_vec( ii ) / 2 ;
new_train_vec( ii - ( bandwidth + 1 ) + match_max_val_ix( 2 ) ) += train_vec( ii ) / 2 ;
assigned_cluster_centre_ix( ii ) = ii ; ## bit of a kluge
else
error( 'There is an unknown error in instance ==2 matching max_vals with equal distances to centre of moving window with assigned_ix. Write code to deal with this edge case.' ) ;
endif
else ## not equally distant from centre ii of moving window
assigned_ix = find( assigned_cluster_centre_ix == ii ) ;
if ( !isempty( assigned_ix ) ) ## should always be true because at least the one original ii == ii
## There is an instance == 2 matching max_vals with non equal distances to centre of moving window with previously assigned_ix to ii ix
## Assign all assigned_ix to the nearest max value ix
[ ~ , min_val_ix ] = min( [ abs( centre_window_dist( 1 ) ) abs( centre_window_dist( 2 ) ) ] ) ;
new_train_vec( ii - ( bandwidth + 1 ) + match_max_val_ix( min_val_ix ) ) += train_vec( ii ) ;
assigned_cluster_centre_ix( ii ) = ii - ( bandwidth + 1 ) + match_max_val_ix( min_val_ix ) ;
assigned_cluster_centre_ix( assigned_ix ) = ii - ( bandwidth + 1 ) + match_max_val_ix( min_val_ix ) ;
elseif ( isempty( assigned_ix ) ) ## but cheap insurance
[ ~ , min_val_ix ] = min( abs( centre_window_dist ) ) ;
new_train_vec( ii - ( bandwidth + 1 ) + match_max_val_ix( min_val_ix ) ) += train_vec( ii ) ;
assigned_cluster_centre_ix( ii ) = ii - ( bandwidth + 1 ) + match_max_val_ix( min_val_ix ) ;
else
error( 'There is an unknown error in instance of ==2 matching max_vals with unequal distances. Write the code to deal with this edge case.' ) ;
endif
endif ##
elseif ( numel( match_max_val_ix ) > 2 ) ## There is an instance of >2 matching max_vals.
## There must be one max val closer than the others or two equally close
centre_window_dist = abs( ( bandwidth + 1 ) .- match_max_val_ix ) ;
centre_window_dist_min = min( centre_window_dist ) ;
centre_window_dist_min_ix = find( centre_window_dist == centre_window_dist_min ) ;
if ( numel( centre_window_dist_min_ix ) == 1 ) ## there is one closest ix
assigned_ix = find( assigned_cluster_centre_ix == ii ) ;
if ( !isempty( assigned_ix ) ) ## should always be true because at least the one original ii == ii
new_train_vec( ii - ( bandwidth + 1 ) + centre_window_dist_min_ix ) += train_vec( ii ) ;
assigned_cluster_centre_ix( ii ) = ii - ( bandwidth + 1 ) + centre_window_dist_min_ix ;
assigned_cluster_centre_ix( assigned_ix ) = ii - ( bandwidth + 1 ) + centre_window_dist_min_ix ;
elseif ( isempty( assigned_ix ) ) ## but cheap insurance
new_train_vec( ii - ( bandwidth + 1 ) + centre_window_dist_min_ix ) += train_vec( ii ) ;
assigned_cluster_centre_ix( ii ) = ii - ( bandwidth + 1 ) + centre_window_dist_min_ix ;
endif
elseif ( numel( centre_window_dist_min_ix ) == 2 ) ## there are 2 equally close ix
assigned_ix = find( assigned_cluster_centre_ix == ii ) ;
if ( !isempty( assigned_ix ) ) ## should always be true because at least the one original ii == ii
ix_before = find( assigned_ix < ii ) ;
ix_after = find( assigned_ix > ii ) ;
new_train_vec( ii - ( bandwidth + 1 ) + centre_window_dist_min_ix( 1 ) ) += train_vec( ii ) / 2 ;
new_train_vec( ii - ( bandwidth + 1 ) + centre_window_dist_min_ix( 2 ) ) += train_vec( ii ) / 2 ;
assigned_cluster_centre_ix( assigned_ix( ix_before ) ) = ii - ( bandwidth + 1 ) + centre_window_dist_min_ix( 1 ) ;
assigned_cluster_centre_ix( assigned_ix( ix_after ) ) = ii - ( bandwidth + 1 ) + centre_window_dist_min_ix( 2 ) ;
assigned_cluster_centre_ix( ii ) = ii ; ## bit of a kluge
elseif ( isempty( assigned_ix ) ) ## but cheap insurance
## no other assigned_cluster_centre_ix values to account for, so just split equally
new_train_vec( ii - ( bandwidth + 1 ) + centre_window_dist_min_ix( 1 ) ) += train_vec( ii ) / 2 ;
new_train_vec( ii - ( bandwidth + 1 ) + centre_window_dist_min_ix( 2 ) ) += train_vec( ii ) / 2 ;
assigned_cluster_centre_ix( ii ) = ii ; ## bit of a kluge
endif
else
error( 'Unknown error in numel( match_max_val_ix ) > 2.' ) ;
endif
##error( 'There is an instance of >2 matching max_vals. Write the code to deal with this edge case.' ) ;
else
error( 'There is an unknown error in instance of >2 matching max_vals. Write the code to deal with this edge case.' ) ;
endif
endif ## main if end
endfor
if ( sum( ( train_vec == new_train_vec ) ) == length_train_vec )
has_converged = 1 ;
else
train_vec = new_train_vec ;
endif
endwhile
cluster_centre_ix = unique( assigned_cluster_centre_ix ) ;
cluster_centre_ix( cluster_centre_ix == 0 ) = [] ;
endfunction
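## Illustrative usage only, with hypothetical data: cluster a short vector of
## turning point counts with a bandwidth of 2. The three outputs are the
## "blurred" counts, the cluster centre ix values and the centre assigned to
## each input ix.
## counts = [ 0 1 3 1 0 0 2 5 2 0 0 1 ] ;
## [ new_counts , centre_ix , assigned_ix ] = blurred_maxshift_1d_linear_V2( counts , 2 ) ;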
</code></pre><p>The reason for this re-write was to accommodate a <a href="https://en.wikipedia.org/wiki/Cross-validation_(statistics)" target="_blank">cross validation</a> routine, which is described in the paper <a href="https://statweb.stanford.edu/~gwalther/predictionstrength.pdf" target="_blank">Cluster Validation by Prediction Strength</a>, and of which a simple outline is given in this <a href="https://stats.stackexchange.com/questions/87098/can-you-compare-different-clustering-methods-on-a-dataset-with-no-ground-truth-b" target="_blank">stackexchange.com answer</a>.</p><p>My <a href="https://www.gnu.org/software/octave/index" target="_blank">Octave</a> code implementation of this is shown in the code box below. This is not exactly as described in the above paper because the number of clusters, K, is not explicitly specified, due to the above function automatically determining K from the data. The routine below is perhaps more accurately described as being inspired by the original paper. <br /></p><pre style="border-style: solid; border-width: 2px; height: 150px; overflow: auto; width: 500px;"><code>## create train and test data sets
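## NOTE: data_matrix is assumed to be already in the workspace ( it is not
## created in this snippet ). Its first dimension is sampled and summed over
## below ( e.g. one slice per period of historical data ), its second dimension
## is the 10 minute bar ix ( 1 to 720 for the week ) and its third dimension
## corresponds to the turning point filter n_bar settings.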
########## UNCOMMENT AS NECESSARY #####
time_ix = [ 1 : 198 ] ; ## Monday
##time_ix = [ 115 : 342 ] ; ## Tuesday
##time_ix = [ 259 : 486 ] ; ## Wednesday
##time_ix = [ 403 : 630 ] ; ## Thursday
##time_ix = [ 547 : 720 ] ; ## Friday
##time_ix = [ 1 : 720 ] ; ## all data
#######################################
all_cv_solutions = zeros( size( data_matrix , 3 ) , size( data_matrix , 3 ) ) ;
n_iters = 1 ;
for iter = 1 : n_iters ## related to # of rand_ix sets generated
rand_ix = randperm( size( data_matrix , 1 ) ) ;
train_ix = rand_ix( 1 : round( numel( rand_ix ) * 0.5 ) ) ;
test_ix = rand_ix( round( numel( rand_ix ) * 0.5 ) + 1 : end ) ;
train_data_matrix = sum( data_matrix( train_ix , time_ix , : ) ) ;
test_data_matrix = sum( data_matrix( test_ix , time_ix , : ) ) ;
all_proportions_indicated = zeros( 1 , size( data_matrix , 3 ) ) ;
for cv_ix = 1 : size( data_matrix , 3 ) ; ## related to delta_turning_point_filter n_bar parameter
for bandwidth = 1 : size( data_matrix , 3 )
## train set clustering
if ( bandwidth == 1 )
[ train_out , cluster_centre_ix_train , assigned_cluster_centre_ix_train ] = blurred_maxshift_1d_linear_V2( train_data_matrix(:,:,cv_ix) , bandwidth ) ;
elseif( bandwidth > 1 )
[ train_out , cluster_centre_ix_train , assigned_cluster_centre_ix_train ] = blurred_maxshift_1d_linear_V2( train_out , bandwidth ) ;
endif
train_out_pairs_mat = zeros( numel( assigned_cluster_centre_ix_train ) , numel( assigned_cluster_centre_ix_train ) ) ;
for ii = 1 : numel( cluster_centre_ix_train )
cc_ix = find( assigned_cluster_centre_ix_train == cluster_centre_ix_train( ii ) ) ;
train_out_pairs_mat( cc_ix , cc_ix ) = 1 ;
endfor
train_out_pairs_mat = triu( train_out_pairs_mat , 1 ) ; ## get strictly upper triangular matrix
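## train_out_pairs_mat now has a 1 in entry ( i , j ), i < j, whenever time bars
## i and j are assigned to the same cluster centre in the training half of the data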
## test set clustering
if ( bandwidth == 1 )
[ test_out , cluster_centre_ix_test , assigned_cluster_centre_ix_test ] = blurred_maxshift_1d_linear_V2( test_data_matrix(:,:,cv_ix) , bandwidth ) ;
elseif( bandwidth > 1 )
[ test_out , cluster_centre_ix_test , assigned_cluster_centre_ix_test ] = blurred_maxshift_1d_linear_V2( test_out , bandwidth ) ;
endif
all_test_out_pairs_mat = zeros( numel( assigned_cluster_centre_ix_test ) , numel( assigned_cluster_centre_ix_test ) ) ;
test_out_pairs_clusters_proportions = ones( 1 , numel( cluster_centre_ix_test ) ) ;
for ii = 1 : numel( cluster_centre_ix_test )
cc_ix = find( assigned_cluster_centre_ix_test == cluster_centre_ix_test( ii ) ) ;
all_test_out_pairs_mat( cc_ix , cc_ix ) = 1 ;
test_out_pairs_mat = all_test_out_pairs_mat( cc_ix , cc_ix ) ;
test_out_pairs_mat = triu( test_out_pairs_mat , 1 ) ; ## get strictly upper triangular matrix
test_out_pairs_mat_sum = sum( sum( test_out_pairs_mat ) ) ;
if ( test_out_pairs_mat_sum > 0 )
test_out_pairs_clusters_proportions( ii ) = sum( sum( train_out_pairs_mat( cc_ix , cc_ix ) .* test_out_pairs_mat ) ) / ...
test_out_pairs_mat_sum ;
endif
endfor
all_proportions_indicated( bandwidth ) = min( test_out_pairs_clusters_proportions ) ;
all_cv_solutions( bandwidth , cv_ix ) += all_proportions_indicated( bandwidth ) ;
endfor ## bandwidth for loop
endfor ## end of cv_ix loop
endfor ## end of iter for
all_cv_solutions = all_cv_solutions ./ n_iters ;
surf( all_cv_solutions ) ; xlabel( 'BANDWIDTH' , 'fontsize' , 20 ) ; ylabel( 'CV IX' , 'fontsize' , 20 ) ;</code></pre>I won't discuss the workings of the code any further as readers are free to read the original paper and my code interpretation of it. A typical surface plot of the output is shown below.<br /><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjwHX4Ny0__RGw7WazgYehk-MjHWxyKxepYb80Lt-0bE_3uaOKKihcPZssAKAve6bm2Ej37LBoLIuW3rm9JX5Y2cfCb1Lyd2NWao-wYq2qhlPaLz36JvMrIyElg0hu2AkQltRiyXVaKAna_/s1663/surf.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="809" data-original-width="1663" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjwHX4Ny0__RGw7WazgYehk-MjHWxyKxepYb80Lt-0bE_3uaOKKihcPZssAKAve6bm2Ej37LBoLIuW3rm9JX5Y2cfCb1Lyd2NWao-wYq2qhlPaLz36JvMrIyElg0hu2AkQltRiyXVaKAna_/s320/surf.png" width="320" /></a></div><p>The "bandwidth" plotted along the front edge of the surface plot is one of the input parameters to the "blurred_maxshift_1d_linear" function, whilst the "lookback" is a parameter of the original data generating function which identifies local highs and lows in a price time series. There appears to be a distinct "elbow" at "lookback" = 6 which is more or less consistent for all values of "bandwidth." Since the underlying data for this is 10 minute <a href="https://en.wikipedia.org/wiki/Open-high-low-close_chart" target="_blank">OHLC bars</a>, the ideal "lookback" would, therefore, appear to be on the hourly timeframe.</p><p>However, having spent some considerable time and effort to get the above working satisfactorily, I am now not so sure that I'll actually use the above code. The reason for this is shown in the following animated GIF. <br /></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj7lYGGW6bKvHJrVxhm3cLj5HbHi-6QsHhFZlNJl_4crlgF20nhGQ-3XXObDvcJCkXcwmH5nl-vWwqVXYRcXGpEurFETNQOgEEEJ-FyOvI5wT-jEn3T03l8f___IZ0IULbt33Fe3Cfpvs9N/s1056/output.gif" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="450" data-original-width="1056" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj7lYGGW6bKvHJrVxhm3cLj5HbHi-6QsHhFZlNJl_4crlgF20nhGQ-3XXObDvcJCkXcwmH5nl-vWwqVXYRcXGpEurFETNQOgEEEJ-FyOvI5wT-jEn3T03l8f___IZ0IULbt33Fe3Cfpvs9N/s320/output.gif" width="320" /></a></div><p>This shows K segmentation of the exact same data used above, from K = 1 to 19 inclusive, using <a href="https://www.r-project.org/" target="_blank">R</a> and its <a href="https://cran.r-project.org/web/packages/Ckmeans.1d.dp/index.html" target="_blank">Ckmeans.1d.dp</a> package, with <a href="https://cran.r-project.org/web/packages/Ckmeans.1d.dp/vignettes/Ckmeans.1d.dp.html" target="_blank">vignette tutorial here</a>. I am particularly attracted to this because of its speed, compared to my code above, as well as its guarantees with regard to optimality and reproducibility. If one stares at the GIF for long enough one can see possibly significant clusters at index values (x-axis) which correspond, approximately, to particularly significant times such as London and New York market opening and closing times: ix = 55, 85, 115 and 145.<br /></p><p>More about this in my next post. <br /></p>Unknownnoreply@blogger.com0