Thursday 31 October 2013

Exploratory Data Analysis of Brownian Motion Model

For my first steps in investigating the earlier proposed Brownian motion model I thought I would use some Exploratory Data Analysis techniques. I have initially decided to use look back periods of 5 bars and 21 bars: the 5 bar period on daily data being a crude attempt to capture the essence of weekly price movement, and the 21 bar period to capture the majority of dominant cycle periods in daily data, with both parameters also being non-optimised Fibonacci numbers. Using the two channels over price bars that these create, it is possible to postulate nine separate and distinct price "regimes" based on combinations of the close being above, within or below each channel.
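
For readers who prefer to see the regime labelling as code, the nine labels used in the code box at the end of this post can also be generated more compactly as follows (just a sketch, assuming ub5, lb5, ub21 and lb21 are the channel bounds computed as in that code):

% sketch: encode position relative to each channel as +1 (above), 0 (within)
% or -1 (below), then combine the two positions into a single label from -4 to 4
pos21 = ( close > ub21 ) - ( close < lb21 ) ;
pos5  = ( close > ub5 )  - ( close < lb5 ) ;
market_type = 3 .* pos21 + pos5 ;   % nine distinct regime labels, -4 to 4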

These "regimes" are used to bin the normalised price change at time t into nine separate distributions, the normalising factor being the 5 bar simple moving average of absolute natural log differences at time t-1, and the choice of regime being determined by the position of the close at t-1. The rationale behind basing these metrics on preceding bars is explained in this earlier post, and the implementation is made clear in the code box at the end of this post. The following 4 charts are box plots of these nine distributions, in consecutive order, for the EURUSD, GBPUSD, USDCHF and USDYEN forex pairs:



Looking at these, I'm struck by two things. Firstly, the middle three box plots (numbering from the left, boxes 4, 5 and 6) are for regimes where price is within the 21 bar channel and, respectively, below, within and above the 5 bar channel. These are essentially different degrees of a sideways market, and it can be seen that there are far more outliers here than in the other box plots, perhaps indicating a tendency for high volatility breakouts to occur more frequently in sideways markets.

The second striking thing is, again numbering from the left, box plots 3 and 7. These are the regimes where price is below the 21 bar channel but above the 5 bar channel, and above the 21 bar channel but below the 5 bar channel - essentially the distributions for reactions against prevailing trends on the 21 bar time scale. Here it can be seen that there are far fewer outliers and generally shorter whiskers, indicating a tendency for reactions against trends to be less volatile in nature.

It is encouraging to see that there are such differences in these box plots as it suggests that my idea of binning does separate out the different price change characteristics. However, I believe things can be improved a bit in this regard and this will form the subject matter of my next post.
clear all

data = load("-ascii","usdyen") ;
close = data(:,7) ;
logdiff = [ 0 ; diff( log(close) ) ] ;
abslogdiff = abs( logdiff ) ;

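% build the 5 bar (and, below, 21 bar) channel bounds: close N bars ago
% +/- (average absolute 1 bar log move * sqrt(N)), exponentiated back to
% price. sma() is assumed to be a simple moving average helper function;
% shift() is core Octave and here references the close N bars ago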
sq_rt = sqrt( 5 ) ;
sma5 = sma(abslogdiff,5) ;
ub5 = exp(shift(log(close),5).+(sma5.*sq_rt)) ;
lb5 = exp(shift(log(close),5).-(sma5.*sq_rt)) ;

sq_rt = sqrt( 21 ) ;
sma21 = sma(abslogdiff,21) ;
ub21 = exp(shift(log(close),21).+(sma21.*sq_rt)) ;
lb21 = exp(shift(log(close),21).-(sma21.*sq_rt)) ;

% allocate close bar to a market_type
market_type = zeros( size(close,1) , 1 ) ;

% above ub21 **********************
val_1 = close > ub21 ;
val_2 = close > ub5 ;
matches = val_1 .* val_2 ;
ix = find( matches == 1 ) ;
market_type( ix , 1 ) = 4 ;

val_1 = close > ub21 ;
val_2 = close <= ub5 ;
val_3 = close >= lb5 ;
matches = val_1 .* val_2 .* val_3 ;
ix = find( matches == 1 ) ;
market_type( ix , 1 ) = 3 ;

val_1 = close > ub21 ;
val_2 = close < lb5 ;
matches = val_1 .* val_2 ;
ix = find( matches == 1 ) ;
market_type( ix , 1 ) = 2 ;
%**********************************

% below lb21 **********************
val_1 = close < lb21 ;
val_2 = close < lb5 ;
matches = val_1 .* val_2 ;
ix = find( matches == 1 ) ;
market_type( ix , 1 ) = -4 ;

val_1 = close < lb21 ;
val_2 = close <= ub5 ;
val_3 = close >= lb5 ;
matches = val_1 .* val_2 .* val_3 ;
ix = find( matches == 1 ) ;
market_type( ix , 1 ) = -3 ;

val_1 = close < lb21 ;
val_2 = close > ub5 ;
matches = val_1 .* val_2 ;
ix = find( matches == 1 ) ;
market_type( ix , 1 ) = -2 ;
%**********************************

% between ub21 & lb21**************
val_1 = close <= ub21 ;
val_2 = close >= lb21 ;
val_3 = close > ub5 ;
matches = val_1 .* val_2 .* val_3 ;
ix = find( matches == 1 ) ;
market_type( ix , 1 ) = 1 ;

val_1 = close <= ub21 ;
val_2 = close >= lb21 ;
val_3 = close < lb5 ;
matches = val_1 .* val_2 .* val_3 ;
ix = find( matches == 1 ) ;
market_type( ix , 1 ) = -1 ;
%**********************************

% now allocate to bins
normalised_logdiffs = logdiff ./ shift( sma5 , 1 ) ;

% shift market_type to agree with normalised_logdiffs
shifted_market_type = shift( market_type , 1 ) ;

% note: find() here returns indices into the sliced vector (rows 30:end),
% so add 29 to map them back to the matching rows of normalised_logdiffs
bin_4 = normalised_logdiffs( find( shifted_market_type(30:end,1) == 4 ) + 29 , 1 ) ;
bin_3 = normalised_logdiffs( find( shifted_market_type(30:end,1) == 3 ) + 29 , 1 ) ;
bin_2 = normalised_logdiffs( find( shifted_market_type(30:end,1) == 2 ) + 29 , 1 ) ;
bin_1 = normalised_logdiffs( find( shifted_market_type(30:end,1) == 1 ) + 29 , 1 ) ;
bin_0 = normalised_logdiffs( find( shifted_market_type(30:end,1) == 0 ) + 29 , 1 ) ;
bin_1m = normalised_logdiffs( find( shifted_market_type(30:end,1) == -1 ) + 29 , 1 ) ;
bin_2m = normalised_logdiffs( find( shifted_market_type(30:end,1) == -2 ) + 29 , 1 ) ;
bin_3m = normalised_logdiffs( find( shifted_market_type(30:end,1) == -3 ) + 29 , 1 ) ;
bin_4m = normalised_logdiffs( find( shifted_market_type(30:end,1) == -4 ) + 29 , 1 ) ;

y_axis_max = max( normalised_logdiffs(30:end,1) ) ;
y_axis_min = min( normalised_logdiffs(30:end,1) ) ;

clf ;
subplot(191) ; boxplot(bin_4m,NOTCHED=1) ; ylim([y_axis_min y_axis_max]) ;
subplot(192) ; boxplot(bin_3m,NOTCHED=1) ; ylim([y_axis_min y_axis_max]) ;
subplot(193) ; boxplot(bin_2m,NOTCHED=1) ; ylim([y_axis_min y_axis_max]) ;
subplot(194) ; boxplot(bin_1m,NOTCHED=1) ; ylim([y_axis_min y_axis_max]) ;
subplot(195) ; boxplot(bin_0,NOTCHED=1) ; ylim([y_axis_min y_axis_max]) ;
subplot(196) ; boxplot(bin_1,NOTCHED=1) ; ylim([y_axis_min y_axis_max]) ;
subplot(197) ; boxplot(bin_2,NOTCHED=1) ; ylim([y_axis_min y_axis_max]) ;
subplot(198) ; boxplot(bin_3,NOTCHED=1) ; ylim([y_axis_min y_axis_max]) ;
subplot(199) ; boxplot(bin_4,NOTCHED=1) ; ylim([y_axis_min y_axis_max]) ;

Sunday 27 October 2013

Modelling Prices Using Brownian Motion.

As stated in my previous post I'm going to rewrite my data generation code for future use in online training of neural nets, and the approach I'm going to take is to combine the concept of Brownian motion and the ideas contained in my earlier Creation of Synthetic Data post.

The basic premise, taken from Brownian motion, is that the natural log of price changes, on average, at a rate proportional to the square root of time. Take, for example, a period of 5 bars leading up to the "current bar." If we take a 5 period simple moving average of the absolute differences of the log of prices over this period, we get a value for the average 1 bar price movement over this period. This value is then multiplied by the square root of 5 and added to and subtracted from the log price 5 bars ago (and exponentiated back to price) to get an upper and lower bound for the current bar. If the current bar lies between the bounds, we say that price movement over the last 5 periods is consistent with Brownian motion and declare an absence of trend, i.e. a sideways market. If the current bar lies outside the bounds, we declare that price movement over the last 5 bars is not consistent with Brownian motion and that a trend is in force, either up or down depending on which bound the current bar is beyond. The following 3 charts show this concept in action, for consecutive periods of 5, 13 and 21, taken from the Fibonacci sequence:


where yellow is the closing price, blue is the upper bound and red the lower bound. It is easy to imagine many uses for this in terms of indicator creation, but I intend to use the bounds to assign a score of price randomness/trendiness over various combined periods, which will then be used to allocate price movement to bins for subsequent Monte Carlo creation of synthetic price series. Interested readers are invited to read the above linked Creation of Synthetic Data post for a review of this methodology.
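
As a rough illustration of the sort of score I have in mind (a sketch only - the function name, the +1/-1 scoring and the equal weighting of look back periods are all assumptions at this stage, not a final method):

% hypothetical sketch: for each Fibonacci look back, score the close as +1 if
% above its Brownian motion upper bound, -1 if below the lower bound and 0 if
% inside the bounds, then sum across look backs
function score = trendiness_score( close , lookbacks )
  abslogdiff = abs( [ 0 ; diff( log(close) ) ] ) ;
  score = zeros( size(close) ) ;
  for lngth = lookbacks
    smoothed = filter( ones(lngth,1) ./ lngth , 1 , abslogdiff ) ; % simple moving average
    ub = exp( shift( log(close) , lngth ) + ( smoothed .* sqrt(lngth) ) ) ;
    lb = exp( shift( log(close) , lngth ) - ( smoothed .* sqrt(lngth) ) ) ;
    score = score + ( close > ub ) - ( close < lb ) ;
  endfor
endfunction
% e.g. score = trendiness_score( close , [ 3 5 8 13 21 ] ) gives values from
% -5 (strongly trending down) through 0 (sideways) to +5 (strongly trending up),
% which could then be used to allocate price movements to bins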

The rough working Octave code that produced the above charts is given below.
clear all

data = load("-ascii","eurusd") ;
% length = input( 'Enter length of look back for calculations: ' ) ;
% sq_rt = sqrt( length ) ;
close = data(:,7) ;
abslogdiff = abs( [ 0 ; diff( log(close) ) ] ) ;

lngth = 3 ;
sq_rt = sqrt( lngth ) ;
sma3 = sma(abslogdiff,lngth) ;
ub = exp(shift(log(close),lngth).+(sma3.*sq_rt)) ;
lb = exp(shift(log(close),lngth).-(sma3.*sq_rt)) ;

all_ub = ub ;
all_lb = lb ;

lngth = 5 ;
sq_rt = sqrt( lngth ) ;
sma5 = sma(abslogdiff,5) ;
ub = exp(shift(log(close),lngth).+(sma5.*sq_rt)) ;
lb = exp(shift(log(close),lngth).-(sma5.*sq_rt)) ;

all_ub = all_ub .+ ub ;
all_lb = all_lb .+ lb ;

lngth = 8 ;
sq_rt = sqrt( lngth ) ;
sma8 = sma(abslogdiff,lngth) ;
ub = exp(shift(log(close),lngth).+(sma8.*sq_rt)) ;
lb = exp(shift(log(close),lngth).-(sma8.*sq_rt)) ;

all_ub = all_ub .+ ub ;
all_lb = all_lb .+ lb ;

lngth = 13 ;
sq_rt = sqrt( lngth ) ;
sma13 = sma(abslogdiff,lngth) ;
ub = exp(shift(log(close),lngth).+(sma13.*sq_rt)) ;
lb = exp(shift(log(close),lngth).-(sma13.*sq_rt)) ;

all_ub = all_ub .+ ub ;
all_lb = all_lb .+ lb ;

lngth = 21 ;
sq_rt = sqrt( lngth ) ;
sma21 = sma(abslogdiff,lngth) ;
ub = exp(shift(log(close),lngth).+(sma21.*sq_rt)) ;
lb = exp(shift(log(close),lngth).-(sma21.*sq_rt)) ;

all_ub = ( all_ub .+ ub ) ./ 5 ;
all_lb = ( all_lb .+ lb ) ./ 5 ;

 plot(close(1850:2100,1),'y',ub(1850:2100,1),'c',lb(1850:2100,1),'r')
%plot(close,'y',ub,'c',lb,'r')

Saturday 26 October 2013

Change of Direction in Neural Net Training.

Up till now, when training my neural net classifier, I have been using what I call my "idealised" data, which is essentially a model for prices composed of a sine wave plus a linear trend. In my recent work on Savitzky-Golay filters I wanted to increase the amount of available training data, but I have run into a problem: the size of my data files has become so large that I'm exhausting the memory available in my Octave install. To overcome this problem I have decided to convert my NN training to an online regime rather than loading csv data files for mini-batch gradient descent training - in fact, what I envisage doing would perhaps be best described as online mini-batch training.
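
To make the idea of online mini-batch training concrete, the sort of loop I have in mind is sketched below. The function names generate_batch(), nn_train_step() and initialise_nn_weights() are hypothetical placeholders for the rewritten data generation code and a single gradient descent update - this is an outline of the intent, not working code.

% sketch of an online mini-batch regime: each iteration generates a small,
% fresh batch of synthetic data in memory, performs one mini-batch gradient
% descent update and then discards the batch, so no large csv training file
% ever needs to be loaded or held in memory
batch_size = 100 ;
n_iters = 50000 ;
theta = initialise_nn_weights() ;                          % hypothetical
for iter = 1 : n_iters
  [ features , targets ] = generate_batch( batch_size ) ;  % hypothetical
  theta = nn_train_step( theta , features , targets ) ;    % hypothetical
endfor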

To do this will require a fairly major rewrite of my data generation code, and I think I will take the opportunity to make this data generation more realistic than the sine wave and trend model I am currently using. Apart from the above mentioned memory problem, this decision has been prompted by an e-mail exchange I have recently had with a reader of this blog and some online reading I have recently been doing, such as mixed regime simulation, training a simple neural network to recognise different regimes in financial time series, neural networks in trading, random subspace optimisation, and profitable hidden and Markovian coupling. At the moment it is my intention to revisit my earlier posts here and here and use/extend the ideas contained therein. This means that for the near future my work on Savitzky-Golay filters is temporarily halted.

Tuesday 15 October 2013

Code refactoring didn't work.

In my previous post I said that I was going to refactor code to use my existing, trained classification neural net with Savitzky-Golay filter inputs in the hope that this might show some improvement. I have to report that this was an abysmal failure and I suppose it was a bit naive to have ever expected that it would work. It looks like I will have to explicitly train a new neural network to take advantage of Savitzky-Golay filter features.

Sunday 6 October 2013

Update on Savitzky-Golay Filters

For the past couple of weeks I have been playing around with Savitzky-Golay filters in the hope of creating a moving endpoint regression line as a form of zero lag smoothing, but unfortunately I have been unable to come up with anything that is remotely satisfying and I think I'm going to abandon this for now. My view at the moment is that SG filters' utility, for my purposes at least, is limited to feature extraction via the polynomial coefficients to get the 2nd, 3rd and 4th derivatives as inputs for my neural net classifier.
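
To illustrate what such polynomial-derived features might look like in practice (the window length, polynomial order and dummy price series below are illustrative assumptions only, not the actual classifier inputs):

% sketch: fit a 4th order polynomial to the most recent price window with
% polyfit and read the 2nd to 4th derivatives at the window endpoint
% directly off the fitted coefficients
prices = cumsum( randn( 100 , 1 ) ) ;      % dummy price series for illustration
x = ( -10 : 0 )' ;                         % 11 bar window, endpoint at x = 0
window = prices( end - 10 : end ) ;
p = polyfit( x , window , 4 ) ;            % p(1)*x^4 + ... + p(4)*x + p(5)
d2 = 2 * p(3) ;                            % 2nd derivative at the endpoint
d3 = 6 * p(2) ;                            % 3rd derivative
d4 = 24 * p(1) ;                           % 4th derivative
features = [ d2 d3 d4 ] ;                  % candidate NN input features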

To do this will require a time consuming retraining period, so before I embark on it I'm going to pursue what I think will be an immediately useful SG filter application. In a previous post readers can see three screenshots of moving windows of my idealised price series with SG filters overlaid. The SG filters follow these windowed "prices" so closely that I think SG filters are more or less equivalent to windowed idealised prices and the features derived therefrom. However, when it comes to applying my currently trained NN, the features are derived from the noisy real prices seen in the video of the above linked post. What I intend to do, wherever possible, is to apply a cubic SG filter to windowed prices and then derive the input features from the SG filter values. Effectively I will be coercing real prices into smooth curves that far more closely resemble the idealised prices my NN was trained on. The advantage of this is that I will not have to retrain the NN, but only refactor my feature extraction code for use on real prices. The hope is, of course, that this tweak will result in vastly improved performance, to the extent that I will not need to consider future classifier NN retraining with polynomial derivatives and can proceed directly to training other NNs.
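
A minimal sketch of this coercion step, assuming the octave-forge signal package is installed (the frame length of 11 and the dummy data are illustrative choices only):

% sketch: smooth a window of noisy real prices with a cubic Savitzky-Golay
% filter and then run the existing feature extraction on the smoothed values
% instead of the raw ones
pkg load signal ;
raw_window = cumsum( randn( 50 , 1 ) ) ;        % dummy noisy "price" window
smoothed = sgolayfilt( raw_window , 3 , 11 ) ;  % cubic fit over an 11 bar frame
% existing feature extraction would now use smoothed rather than raw_window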

Finally, on a related note, I have recently been directed to this paper, which has some approaches to time series classification that I might incorporate in future NN training. Fig. 5 on page 9 shows six examples of time series classification categories. My idealised prices categories currently encompass what is described in the paper as "Cyclic," "Increasing Trend" and "Decreasing Trend." I shall have to consider whether it might be useful to extend my categories to include "Normal," "Upward Shift" and "Downward Shift." 

Wednesday 2 October 2013

Review of Quantpedia.com

I was recently contacted and asked if I wouldn't mind writing a short review of the Quantpedia website in return for complimentary access to that site's premium section, and I am happy to do so. Readers can get a flavour of the website by perusing the free content that is available in the Screener tab and reading the Home, About and How We Do It tabs. Since these are readily accessible by any visitor to the site, this review will concentrate purely on the content available in the premium section.

The thing that strikes me is the wide and eclectic range of studies available, which can be easily seen by clicking on the keywords dropdown box on the screener tab. There is something for almost all trading styles and I would be surprised if a visitor to the premium section found nothing of value. As always the devil is in the details, and of course I haven't read anywhere near all the information that is available in the various studies, but a brief visual overview of the various studies' performance taken from Quantpedia's screener page is provided in the scatter chart below:
(n.b. The studies that lie exactly on the y-axis (volatility = 0%) do not actually have zero volatility; no % figure for volatility was given for them in the screener.)

As can be seen there are some impressive performers, which are fully disclosed in the premium section. A few studies, irrespective of performance, did catch my immediate attention. Firstly, there are a couple of studies that look at lunar effects in stock markets and precious metals. Long time readers of this blog will know that some time back I did a series of tests on the Delta Phenomenon, which is basically a lunar effect model of price turning points. The conclusion of the tests I conducted was that the Delta Phenomenon did have statistically significant predictive ability. The above mentioned studies come to the same conclusion with regard to lunar effects, although via a different testing methodology. It is comforting to have one's own research conclusions confirmed by independent researchers, and it's a valuable resource to be able to see how other researchers approach the testing of similar market hypotheses.

Secondly, there is a study on using Principal Components Analysis to characterise the current state of the market. This is very much in tune with what I'm working on at the moment with my neural net market classifier, and the idea of using PCA as an input is a new one to me and one that I shall almost certainly look into in more detail.

This second point I think neatly sums up the main value of the studies on the Quantpedia site - they can give you new insights into how you might develop your own trading system(s), with the added benefit that the idea is not a dead end, because it has already been tested by the original paper's authors and the Quantpedia site people. You could also theoretically just take one of the studies as a stand-alone system and tweak it to suit your own needs, or add it as a form of diversification to an existing set of trading systems. Given the wide range of studies available, this would be a much more robust form of diversification than merely adjusting a look back length, parameter value or some other such cosmetic adjustment.

In conclusion, since I can appreciate the value in the Quantpedia site, I would like to thank Martin Nizny of Quantpedia for extending me the opportunity to review the premium section of the site.