Monday, 13 February 2012
The first round of tests, consisting of tests of the Medium Term Solution for the S&P 500, is now complete, and the summary results are given below.
Years 2007 to 2008 - 16th Aug 2007 to 6th Jan 2009 incl.
[1] 9.722222 - test statistic, the average error of identified turns, in days
[1] 103 - number of random permutations as good as or better than test stat
[1] 0.0206 - above expressed as % of permutations, the p-value
[1] 37.6425 - average of permutation error distribution, in days
[1] 16.32295 - standard deviation of permutation error distribution, in days
[1] 1.710493 - test statistic distance from average of permutation error distribution, expressed as a multiple of standard deviation of permutation error distribution
Year 2009 - 20th Dec 2008 to 16th Jan 2010 incl.
[1] 10.42857 - test statistic, the average error of identified turns, in days
[1] 809 - number of random permutations as good as or better than test stat
[1] 0.1618 - above expressed as % of permutations, the p-value
[1] 33.43553 - average of permutation error distribution, in days
[1] 14.18181 - standard deviation of permutation error distribution, in days
[1] 1.622286 - test statistic distance from average of permutation error distribution, expressed as a multiple of standard deviation of permutation error distribution
Year 2010 - 27th Nov 2009 to 28th Jan 2011 incl.
[1] 11.35714 - test statistic, the average error of identified turns, in days
[1] 1706 - number of random permutations as good as or better than test stat
[1] 0.3412 - above expressed as % of permutations, the p-value
[1] 34.92687 - average of permutation error distribution, in days
[1] 15.36711 - standard deviation of permutation error distribution, in days
[1] 1.533777 - test statistic distance from average of permutation error distribution, expressed as a multiple of standard deviation of permutation error distribution
All out of sample data - 16th Aug 2007 to 28th Jan 2012
[1] 9.8 - test statistic, the average error of identified turns, in days
[1] 0 - number of random permutations as good as or better than test stat
[1] 0 - above expressed as % of permutations, the p-value
[1] 67.47298 - average of permutation error distribution, in days
[1] 29.38267 - standard deviation of permutation error distribution, in days
[1] 1.962823 - test statistic distance from average of permutation error distribution, expressed as a multiple of standard deviation of permutation error distribution
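For reference, the six figures in each block above are related to each other as shown in the minimal R sketch below. The variable names are hypothetical, and it assumes the permutation average errors have been collected in a vector perm_errors and that, as the reported percentages imply, 500,000 permutations were used.
# perm_errors: vector of average errors from the random permutations (hypothetical name)
# test_stat: the average error of the identified turns, in days
n_perms <- length( perm_errors ) # e.g. 500000
n_as_good <- sum( perm_errors <= test_stat ) # number of permutations as good as or better than test stat
p_value_percent <- 100 * n_as_good / n_perms # above expressed as a % of permutations, the p-value
perm_mean <- mean( perm_errors ) # average of the permutation error distribution
perm_sd <- sd( perm_errors ) # standard deviation of the permutation error distribution
sd_distance <- ( perm_mean - test_stat ) / perm_sd # distance of test stat from the average, as a multiple of the standard deviation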
The histogram plot of the final, all-out-of-sample data is shown below.
I think the results are unambiguous: given the consistently low p-values, the null hypothesis can be rejected and the alternative hypothesis accepted, i.e. the S&P 500 Medium Term solution is NOT random and therefore has some predictive value.
My next round of tests will be of the Intermediate Term solution for the S&P 500. Before conducting these tests, however, I would like to state my expectation that the results will not be as conclusive as those given above. This is because the Intermediate Term solution also has 12 points, but these 12 points occur within a time period that is only approximately 120 days long, so the actual average error on this time frame will have to be 2 or 3 bars/days or better to match the above results.
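As a rough check on this expectation, the sketch below is purely illustrative: it assumes 12 projected turns spread evenly through each window and random guesses drawn without replacement, in the style of the test code from Part 3, and estimates the mean of the null average error distribution for a 365-day window versus a 120-day window.
# estimate the mean average error of purely random guesses for 12 turns
# in a window of a given length (illustrative assumptions only)
null_mean_error <- function( window_len , n_turns = 12 , n_sims = 20000 ) {
proj <- round( seq( 1 , window_len , length.out = n_turns ) ) # evenly spaced projected turns
sims <- replicate( n_sims , {
guess <- sort( sample( 1:window_len , size = n_turns , replace = FALSE ) )
mean( abs( proj - guess ) )
} )
mean( sims )
}
print( null_mean_error( 365 ) ) # null mean for a roughly year-long window
print( null_mean_error( 120 ) ) # null mean for a 120 day window, roughly a third of the above
# since the null mean scales roughly with the window length, an average error of
# 2 or 3 days over 120 days is needed to sit as far below it, proportionally,
# as an error of around 10 days does over a full year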
"Trading is statistics and time series analysis." This blog details my progress in developing a systematic trading system for use on the futures and forex markets, with discussion of the various indicators and other inputs used in the creation of the system. Also discussed are some of the issues/problems encountered during this development process. Within the blog posts there are links to other web pages that are/have been useful to me.
Monday, 13 February 2012
Sunday, 5 February 2012
Testing the Delta Phenomenon, Part 4
The results of the first test are in! The test period approximately covers 2011, from 6th January 2011 to 12th January 2012 inclusive, and the test was of the MTD (Medium Term Delta) solution for the S&P 500. I chose the S&P 500 because I suppose that most readers would be interested in this market as a proxy for stock markets in general. There are 12 turning points in this solution and a screen shot is shown below.
The projected solution turning points are indicated with white, red and green lines, whilst the actual turns in the data are indicated in yellow. For the purposes of counting, each yellow, actual turn line is paired with one of the projected solution lines, i.e. counting from the left, the first yellow line is paired with the first white solution line, the second with the second, the third with the third, etc. The red solution lines indicate the "inversion" window, with the green line between them being the number 1 point of the solution.
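In terms of this counting, the test statistic is simply the mean absolute difference, in days, between each projected turn and its paired actual turn. A toy example with made-up day numbers (the full test code is given in the Part 3 post below):
proj_days <- c( 10 , 40 , 75 ) # illustrative day numbers of projected turns
actual_days <- c( 13 , 35 , 80 ) # day numbers of the paired actual turns
mean( abs( proj_days - actual_days ) ) # the average error test statistic, in days; here 4.33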
The first thing that stands out is that the biggest move of the year occurred within the inversion time period window, and the Delta Phenomenon book states that big moves can be expected around the number 1 point (green line), so kudos to Delta so far. Another thing that stands out is that most turning points occur within one to two weeks of their projected turning points, well within the margin of error for this solution. Kudos plus 2 for Delta. However, the litmus test is this:
[1] 8.769231
[1] 278
[1] 0.0556
[1] 33.09061
[1] 14.54137
[1] 1.672565
These are the results of a 0.5 million random permutation test. With only 278 random permutations out of half a million matching or bettering the average error test statistic, giving a p-value of 0.0556%, the null hypothesis can be rejected at a statistical significance level of 0.1%. Delta has passed this first test.
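For clarity, the quoted p-value is just the count of as-good-or-better permutations expressed as a percentage of the half million permutations run:
278 / 500000 * 100 # = 0.0556 %, comfortably below the 0.1 % significance level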
Monday, 30 January 2012
Testing the Delta Phenomenon, Part 3
Readers may recall from my recent post that the "problem" with the Delta Phenomenon is that it is very subjective and therefore difficult to test objectively. This post will outline how I intend to conduct such an objective test, using R, and discuss some issues related to the test.
Firstly, I am going to use the statistical concept of cross-validation, whereby a model is trained on one set of data and tested for accuracy on another set of data, an approach which is actually quite common in the world of back testing. Since the "Delta solutions" I will be testing come from late 2006 (see Testing the Delta Phenomenon, Part 2), the 4 years from 2008 to 2011 inclusive can be considered to be the validation set for this test. The test(s) will assess the accuracy of the predicted highs and lows for this period, on each Delta time frame for which I have solutions, by counting the difference in days between actual highs and lows in the data and their predicted occurrences and then creating an average error for these differences. This will be the test statistic. Using R, a Null Hypothesis average error distribution for random predictions on the same data will be created, using the same number of predicted turning points as per the Delta solution being tested. The actual average error test statistic on real data will be compared with this Null Hypothesis distribution of the average error test statistic and the Null Hypothesis rejected or not, as the case may be. The Null Hypothesis may be stated as
- given a postulated number of turning points, the accuracy of the Delta Phenomenon in correctly predicting when these turning points will occur, using the average error test statistic described above as the measure of accuracy, is no better than random guessing as to where the turning points will occur.
and the Alternative Hypothesis as
- given a postulated number of turning points, the accuracy of the Delta Phenomenon in correctly predicting when these turning points will occur, using the average error test statistic described above as the measure of accuracy, is better than could be expected from random guessing as to where the turning points will occur.
All of this might be made clearer for readers by following the commented R code below.
# Assume a 365 trading day year, with 4 Delta turning points in this year
# First, create a Delta Turning points solution vector, the projected
# days on which the market will make a high or a low
proj_turns <- c(47,102,187,234) # day number of projected turning points
# now assume we apply the above Delta solution to future market data
# and identify, according to the principles of Delta, the actual turning
# points in the future unseen "real data"
real_turns <- c(42,109,193,226) # actual market turns occur on these days
# calculate the distance between the real_turns and the days on which
# the turn was predicted to occur and calculate the test statistic
# of interest, the avge_error_dist
avge_error_dist <- mean( abs(proj_turns - real_turns) )
print(avge_error_dist) # print for viewing
# calculate the theoretical probability of randomly picking 4 turning points
# in our 365 trading day year and getting an avge_error_dist that is equal
# to or better than the actual avge_error_dist calculated above.
# Taking the first projected turning point at 47 and the actual turn
# that occurs at 42, to get an error for this point that is as small as or
# smaller than that which actually occurs, we must randomly choose one of
# the following days: 42,43,44,45,46,47,48,49,50,51 or 52. The probability of
# randomly picking one of these numbers out of 1 to 365 inclusive is
a <- 11/365
# and similarly for the other 3 turning points
b <- 15/364 # turning point 2
c <- 13/363 # turning point 3
d <- 17/362 # turning point 4
# Note that the denominator decreases by 1 each time because we are
# sampling without replacement i.e. it is not possible to pick the same
# day more than once. Combining the 4 probabilities above, we get
rdn_prob_as_good <- (a*b*c*d)/100 # expressed as a %
print( rdn_prob_as_good ) # a very small % !!!
# but rather than rely on theoretical calculations, we are actually
# going to repeatedly and randomly choose 4 turning points and compare their
# accuracy with the "real accuracy", as measured by avge_error_dist
# Create our year vector to sample, consisting of 365 numbered days
year_vec <- 1:365
# predefine vector to hold results
result_vec <- numeric(100000) # because we are going to resample 100000 times
# count how many times a random selection of 4 turning points is as good
# as or better than our "real" results
as_good_as = 0
# do the random turning point guessing, resampling year_vec, in a loop
for(i in 1:100000) {
# randomly choose 4 days from year_vec as turning points
this_sample <- sample( year_vec , size=4 , replace=FALSE )
# sort this_sample so that it is in increasing order
sorted_sample <- sort( this_sample , decreasing=FALSE )
# calculate this_sample_avge_error_dist, our test statistic
this_sample_avge_error_dist <- mean( abs(proj_turns - sorted_sample) )
# if the test statistic is as good as or better than our real result
if( this_sample_avge_error_dist <= avge_error_dist ) {
as_good_as = as_good_as + 1 # increment as_good_as count
}
# assign this sample result to result_vec
result_vec[i] <- this_sample_avge_error_dist
}
# express as_good_as as a proportion of the number of random samples (the p-value)
as_good_as_prop <- as_good_as/100000
# some summary statistics of result_vec
mean_of_result_vec <- mean( result_vec )
standard_dev_of_result_vec <- sd( result_vec )
real_result_from_mean <- ( mean_of_result_vec - avge_error_dist )/standard_dev_of_result_vec
print( as_good_as ) # print for viewing
print( as_good_as_prop ) # print for viewing
print( mean_of_result_vec ) # print for viewing
print( standard_dev_of_result_vec ) # print for viewing
print( real_result_from_mean ) # print for viewing
# plot histogram of the result_vec
hist( result_vec , freq=FALSE, col='yellow' )
abline( v=avge_error_dist , col='red' , lwd=3 )
Typical output of this code is the plot below, which shows a histogram of the distribution of random prediction average errors in yellow, with the actual average error shown in red. This is for the illustrative hypothetical values used in the code box above. Terminal prompt output for this is
[1] 6.5
[1] 2.088655e-08
[1] 38
[1] 0.00038
[1] 63.78108
[1] 32.33727
[1] 1.771364
where
6.5 is actual average error in days
2.088655e-08 is "theoretical" probability of Delta being this accurate
38 is number of times a random prediction is as good as or better than 6.5
0.00038 is 38 expressed as a proportion of the random predictions made (i.e. 0.038%)
63.78108 is the mean of the random distribution histogram
32.33727 is the standard deviation of the random distribution histogram
1.771364 is the difference between 63.78108 and 6.5 expressed as a multiple of the 32.33727 standard deviation.
This would be an example of the Null Hypothesis being rejected: only a proportion of 0.00038 (0.038%) of the random predictions are as accurate as, or more accurate than, the actual predictions; in statistical parlance, a low p-value. Note, however, the gross difference between this figure and the "theoretical" figure. Also note that, despite the Null being rejected, the actual average error falls well within 2 standard deviations of the mean of the random distribution. This, of course, is due to the extremely heavy right-tailedness of the distribution, which expands the standard deviation range.
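As a small illustration of this last point, using a generic right-skewed distribution that has nothing to do with the Delta data, a value can have a very low empirical p-value and yet still lie well within 2 standard deviations of the mean:
set.seed( 1 )
x <- rlnorm( 100000 , meanlog = 3 , sdlog = 1 ) # a heavily right-skewed distribution
val <- quantile( x , 0.0004 ) # a value with an empirical p-value of roughly 0.0004
print( mean( x <= val ) ) # approximately 0.0004, i.e. very few values are this small
print( ( mean( x ) - val ) / sd( x ) ) # yet well under 2 standard deviations from the mean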
This second plot
and
[1] 77.75
[1] 2.088655e-08
[1] 48207
[1] 0.48207
[1] 79.85934
[1] 27.60137
[1] 0.07642148
shows what a typical failure to reject the Null Hypothesis would look like - a 0.48 p-value - and an actual average error that is indistinguishable from random, typified by it being well within a nice looking bell curve distribution.
So there it is, the procedure I intend to follow to objectively test the accuracy of the Delta Phenomenon.
Friday, 27 January 2012
Testing the Delta Phenomenon, Part 2
Towards the end of 2006 the Delta Society message board had a lot of screen shots, short screen casts and uploaded file attachments and quite a lot of discussion about them was generated. As a Delta Phenomenon book owner I had access to this, and I have to admit I was really surprised at how much information was, perhaps inadvertently, being given out. What was typically shown is exemplified by posts #81 and #86 on this forum thread. Unfortunately all that now seems to have been taken down; when I log in to the message board now all I see is two old posts from February 2007 and an "Information for all Clients" section. I remember that back in 2006 I spent hours and hours studying all this information and cross referencing it with the book. The result of all my work was that I now have what are called the Intermediate Term and Medium Term "Delta Solutions" for 30 major US commodity contracts and 6 forex pairs, including the 4 major pairs. However, after getting these solutions, for reasons outlined in my previous post, I did nothing with them and for the last 5 years my hand written notes, along with my copy of the book, have been languishing on my bookshelf.
The main "problem" with the Delta Phenomenon is that it can be very subjective. This assertion is borne out by this quote - "We have observed over time that market participants using Delta and applying it differently do not make buy & sell decisions en-masse at the same time. Also, while the Delta order is perfect, our interpretation of the order is never 100% accurate." which is taken from the FAQ section of the above mentioned message board. Also here are some random comments that I have cut and pasted from the above linked forum thread (views expressed are those of the original forum posters, not mine)
- ...as the delta phenomenom could be really nothing but a recurring coincidence that no one can actually use with a good accuracy rate.
- ...while there may be something in the theory, finding a practical application for it is impossible. To make sense of it everything has to be viewed in hindsight...
- I thought it was interesting, but just like drawing lines on a chart, I could find a turning point any sequence I decided to use. I think we humans are extraordinary at seeking out patterns in the world around us. Maybe not such a good thing though if there really is no pattern, just perception.
- Like most any indicator, it looks great in hindsight as you can apply each turning point to the nearest high/low. Problem is, of course, you never know this in real time, only after the fact.
- Trading with Delta is a lot like looking at an MT4 indicator that "repaints".
- Mind you, I'm not saying the concept behind Delta is not valid. Just that because of the latitude afforded on either side of the "turning point" day, plus the idea of inversions......it's just real tough to be sure until after the fact. Even the monthly newsletter that Wilder sent out with turning point calls was often subject to correction and change.....a lot of subjectivity with this method.
- Much is left to the trader's own subjective judgment.
Relevant statistical concepts for such an objective test:
Resampling
Monte Carlo Methods
Bootstrapping
Statistical Hypothesis Testing
Testing the Delta Phenomenon
The second book I ever bought about trading was Welles Wilder's "The Delta Phenomenon," which I bought as a result of a recommendation in the first book I ever bought. Subsequently I also bought "The Adam Theory of Markets" and a few years later I bought the "Ocean Theory" book, so one could say I own the whole trilogy!
When I bought these I was doing my charting by hand on graph paper, using prices from the Wall Street Journal, but in due course I got a computer and began using various software programs: Excel, Open Office Calc and QtStalker, finally ending up where I am today, using Octave, R and Gnuplot. However proficient I became at using these last three, my programming skills weren't up to coding the Delta Phenomenon, until now that is. I had already quite easily coded the Adam Projection and the Natural Market Mirror, Natural Market River and Natural Moving Average from Ocean theory. Over the next few posts I am going to outline how I intend to test the Delta Phenomenon and show the eventual results of these tests, but before that I am going to present in this post the "breakthrough" piece of coding that finally allows me to do so. I think other users of R may find it useful.
The issue to be solved was the problem of missing days in the time series data, e.g. weekends, holidays, exchange closures, just plain missing data etc., because the Delta Phenomenon is predicated on counting days before or after specified dates, and of course any algorithmic counting over the data would be upset by such missing days. My forum query here and forum searches that turned up this led me to the R xts package, which finally resulted in me being able to write the following piece of R code:
rm(list=ls(all=TRUE)) # remove all previous data from workspace
library(xts) # load the required library
# preparation for output to new file
sink(file="solution",append=FALSE,type=c("output"),split=FALSE)
# read in the raw data files
data_1 <- read.csv(file="sp",header=FALSE,sep=",")
itdmtd <- read.csv(file="itdmtd",header=FALSE,sep=",")
# create xts objects from above data:
x <- xts(data_1[c('V2','V3','V4','V5')],as.Date(data_1[,'V1']))
y <- xts(itdmtd[c('V2','V3','V4','V5','V6','V7','V8','V9')],as.Date(itdmtd[,'V1']))
z <- merge.xts(x,y) # merge x and y
# create a contiguous date vector to encompass date range of above data
d <- timeBasedSeq(paste(start(z),end(z),"d",sep="/"), retclass="Date")
# merge z with an "empty" xts object, xts(,d), filling with NA
prices <- merge(z,xts(,d),fill=NA)
# coerce prices xts object to a data frame object
prices_df <- data.frame(date=index(prices), coredata(prices))
# output to new file
write.table(prices_df,quote=FALSE,row.names=FALSE,col.names=FALSE,sep=",")
sink()
The code takes a csv file of the price time series, merges it with a csv file of the "Delta Solution," fills in any missing dates and then writes out the result to a final csv file that looks like this:
1995-01-03,828.55,829.45,827.55,828.8,NA,NA,NA,NA,NA,NA,NA,NA
1995-01-04,830.2,831.25,827.6,831.1,NA,NA,NA,NA,NA,NA,NA,NA
1995-01-05,830.5,831.45,829.85,830.6,NA,NA,NA,NA,NA,NA,NA,NA
1995-01-06,830.75,833.05,829,830.35,NA,NA,NA,NA,NA,NA,NA,NA
1995-01-07,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA
1995-01-08,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA
This might not actually look like much, but using this as input to this Gnuplot script
reset
set title "Medium Term Delta Turning Points" textcolor rgb "#FFFFFF"
set object 1 rect from graph 0, 0, 0 to graph 1.1, 1.1, 0 behind lw 1.0 fc rgb "#000000" fillstyle solid 1.00
set datafile separator ","
set xdata time
set timefmt "%Y-%m-%d"
set format x
set y2range [0:1]
plot "solution" using 1:10 notitle with impulses linecolor rgb "#B0171F" axis x1y2, \
"solution" using 1:11 notitle with impulses linecolor rgb "#0000FF" axis x1y2, \
"solution" using 1:12 notitle with impulses linecolor rgb "#FFA500" axis x1y2, \
"solution" using 1:13 notitle with impulses linecolor rgb "#00EE00" axis x1y2, \
"solution" using 1:2:3:4:5 notitle with candlesticks linecolor rgb "#FFFFFF" axis x1y1, \
"solution" using 1:($$2>$$5?$$2:1/0):($$2>$$5?$$3:1/0):($$2>$$5?$$4:1/0):($$2>$$5?$$5:1/0) notitle with candlesticks lt 1 axis x1y1, \
"solution" using 1:($$2<$$5?$$2:1/0):($$2<$$5?$$3:1/0):($$2<$$5?$$4:1/0):($$2<$$5?$$5:1/0) notitle with candlesticks lt 3 axis x1y1
gives a nice plot, thus:
Many readers might say "So what! And...?" but to those readers who know what the Delta Phenomenon is, the coloured lines will be significant. Furthermore, using free, open source software I have created a free, as in gratis, alternative to the software that the Delta Society sells on its website. Most important of all, of course, is that I am now in a position to do computerised testing of the Delta Phenomenon. More on these tests in upcoming posts.
Tuesday, 10 January 2012
Formula for Approximating VWAP from OHLC Data
An answer to another question on the Quantitative Finance forum here gives a formula that approximates the volume weighted average price (VWAP) from OHLC data only, with an exhortation in a comment to do one's own research before using it. This blog post is my quick attempt at such research.
My simple test uses the relationship between the VWAP and the pivot point, calculated as (H+L+C)/3. This wiki entry on pivot points states "Trading above or below the pivot point indicates the overall market sentiment", and my test of "above or below" is whether the VWAP is above or below the pivot: i.e. if today's VWAP is above today's pivot point (calculated from yesterday's H, L & C) then this is a bullish sign, and therefore we can expect tomorrow's VWAP to be higher than today's. The converse is true for bearish sentiment. In Octave it is trivially simple to write a script to test whether this hypothesis is true, and the terminal output of the results is shown in the box below.
The first line identifies the market being tested (a forex pair or commodity), and the second line, "sign_correct", is the percentage of the time the hypothesis appears to be true: e.g. for the first entry, AUSCAD, the figure 65.695 means that 65.695% of the time the following day's VWAP is actually higher/lower than today's, in accordance with the "prediction" based on the categorisation of today's bar as bullish or bearish. The third line, "result", is the p-value of a 5000-repetition Monte Carlo permutation test of the statistical significance of the given % accuracy. The permutation test was the "sign" test described in Evidence Based Technical Analysis and in the PDF available from its companion website.
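For illustration, a minimal sketch of this accuracy calculation might look like the following. It is written in R rather than the Octave actually used, and it assumes a hypothetical data frame bars with columns high, low, close and a precomputed vwap column holding the OHLC-based approximation.
# bars: hypothetical data frame with columns high, low, close and vwap
n <- nrow( bars )
pivot_yday <- ( bars$high[ 1:(n-2) ] + bars$low[ 1:(n-2) ] + bars$close[ 1:(n-2) ] ) / 3 # pivot for day t, from day t-1's H, L and C
bull_today <- bars$vwap[ 2:(n-1) ] > pivot_yday # is today's VWAP above today's pivot point?
vwap_up_tomorrow <- bars$vwap[ 3:n ] > bars$vwap[ 2:(n-1) ] # does the VWAP actually rise tomorrow?
sign_correct <- 100 * mean( bull_today == vwap_up_tomorrow ) # % of days the prediction is right
# the p-value would then come from a Monte Carlo permutation ("sign") test of sign_correct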
Given that all the p-values are zero, it can be stated that whether the VWAP is above or below the pivot point predicts whether the next day's VWAP will be higher or lower with a degree of accuracy that is statistically significantly better than random.
auscad
sign_correct = 65.695
result = 0
------------------
aususd
sign_correct = 67.228
result = 0
------------------
ausyen
sign_correct = 67.100
result = 0
------------------
bo
sign_correct = 63.989
result = 0
------------------
c
sign_correct = 63.530
result = 0
------------------
cc
sign_correct = 60.565
result = 0
------------------
cl
sign_correct = 63.318
result = 0
------------------
ct
sign_correct = 60.273
result = 0
------------------
dx
sign_correct = 63.811
result = 0
------------------
ed
sign_correct = 63.945
result = 0
------------------
euraus
sign_correct = 68.024
result = 0
------------------
eurcad
sign_correct = 66.854
result = 0
------------------
eurchf
sign_correct = 65.707
result = 0
------------------
eurgbp
sign_correct = 66.760
result = 0
------------------
eurusd
sign_correct = 65.544
result = 0
------------------
euryen
sign_correct = 66.444
result = 0
------------------
fc
sign_correct = 61.905
result = 0
------------------
gbpchf
sign_correct = 67.497
result = 0
------------------
gbpusd
sign_correct = 66.936
result = 0
------------------
gbpyen
sign_correct = 66.936
result = 0
------------------
gc
sign_correct = 60.667
result = 0
------------------
hg
sign_correct = 58.554
result = 0
------------------
ho
sign_correct = 62.685
result = 0
------------------
kc
sign_correct = 61.732
result = 0
------------------
lb
sign_correct = 61.765
result = 0
------------------
lc
sign_correct = 62.372
result = 0
------------------
lh
sign_correct = 61.601
result = 0
------------------
ng
sign_correct = 62.356
result = 0
------------------
o
sign_correct = 60.705
result = 0
------------------
oj
sign_correct = 61.848
result = 0
------------------
pa
sign_correct = 62.497
result = 0
------------------
pb
sign_correct = 59.116
result = 0
------------------
pl
sign_correct = 60.737
result = 0
------------------
rb
sign_correct = 63.107
result = 0
------------------
s
sign_correct = 64.091
result = 0
------------------
sb
sign_correct = 61.106
result = 0
------------------
si
sign_correct = 59.563
result = 0
------------------
sm
sign_correct = 63.810
result = 0
------------------
sp
sign_correct = 66.954
result = 0
------------------
es
sign_correct = 66.744
result = 0
------------------
nd
sign_correct = 66.221
result = 0
------------------
ty
sign_correct = 65.260
result = 0
------------------
us
sign_correct = 65.893
result = 0
------------------
usdcad
sign_correct = 67.357
result = 0
------------------
usdchf
sign_correct = 67.088
result = 0
------------------
usdyen
sign_correct = 66.947
result = 0
------------------
w
sign_correct = 65.118
result = 0
At the moment I use the value given by the pivot point calculation as my "price input" for the day but, based on these test results and the future results of some other tests I can think of, I may change to the VWAP approximation as my price input. More in a future post.
Tuesday, 3 January 2012
Creating Synthetic Data Using the Fast Fourier Transform
This question was recently asked on the Quantitative Finance forum and one answerer's link (here) piqued my interest. My comments on said answer were:
"Interesting link! I theorise that one way it could be done is to apply a Fast Fourier Transform (e.g. FFT function in Octave/MATLAB) to the original series, then divide the resulting vector into segments and within each separate segment randomly permute the values, and finally reconstruct the time series by applying the Inverse Fast Fourier Transform (IFFT function). By randomly permuting the values you will not alter the amplitudes of the composite sine waves but will slightly alter their periodicity - this will naturally tend to restrict the range of the new series to that of the original."
and
"This is such an intriguing approach that I may "knock up" some Octave code on my blog and post a link to it from here."
This blog post is my attempt to "knock up" the Octave code.
The "guts" of the code are a .oct function that randomly permutes the FFT output vector that is given to it as its only input argument and the C++ code for this .oct function is given below.
"Interesting link! I theorise that one way it could be done is to apply a Fast Fourier Transform (e.g. FFT function in Octave/MATLAB) to the original series, then divide the resulting vector into segments and within each separate segment randomly permute the values, and finally reconstruct the time series by applying the Inverse Fast Fourier Transform (IFFT function). By randomly permuting the values you will not alter the amplitudes of the composite sine waves but will slightly alter their periodicity - this will naturally tend to restrict the range of the new series to that of the original."
and
"This is such an intriguing approach that I may "knock up" some Octave code on my blog and post a link to it from here."
This blog post is my attempt to "knock up" the Octave code.
The "guts" of the code are a .oct function that randomly permutes the FFT output vector that is given to it as its only input argument and the C++ code for this .oct function is given below.
Note: due to formatting issues the headers are not correctly displayed; the octave headers should be enclosed between a "<" and ">" after the #include.
#include <octave/oct.h>
#include <octave/dColVector.h>
#include <octave/CNDArray.h>
#include "MersenneTwister.h"
DEFUN_DLD (permute_vector, args, , "Input is a vector that is to be permuted once")
{
octave_value_list retval;
int nargin = args.length () ;
int vec_length = args(0).length () ;
// check the input arguments
if ( nargin > 1 )
{
error ("Invalid arguments. Input is a vector that is to be permuted once") ;
return retval ;
}
if (vec_length < 2 )
{
error ("Invalid arguments. Input is a vector that is to be permuted once") ;
return retval ;
}
if (error_state)
{
error ("Invalid arguments. Input is a vector that is to be permuted once") ;
return retval ;
}
// end of input checking
ComplexNDArray input_vector = args(0).complex_array_value () ;
ComplexNDArray output_vector = args(0).complex_array_value () ;
int k1;
int k2;
MTRand mtrand1; // Declare the Mersenne Twister Class - will seed from system time
k1 = vec_length - 1; // initialise prior to shuffling the vector
while (k1 > 0) // While at least 2 left to shuffle
{
k2 = mtrand1.randInt( k1 ); // Pick an int from 0 through k1
if (k2 > k1) // check if random vector index no. k2 is > than max vector index - should never happen
{
k2 = k1 - 1; // But this is cheap insurance against disaster if it does happen
}
output_vector(k1) = input_vector(k2) ; // allocate random pick k2 from input_vector to the k1 vector index of output_vector
input_vector(k2) = input_vector(k1) ; // replace random pick k2 content of input_vector with content k1 of input_vector
k1 = k1 - 1; // count down
} // Shuffling is complete when this while loop exits
output_vector(0) = input_vector(0) ; // the one element left unplaced by the loop belongs at index 0
retval(0) = output_vector ;
return retval; // Return the output to Octave
}
The Octave script that calls the above function is:
clear all
contract = input( "Enter contract symbol e.g. sp: ","s") ;
data = load("-ascii",contract) ;
n = rows(data)
index_begin = input( "Enter index_begin: ") ;
index_end = input( "Enter index_end, value not greater than n: ") ;
close = data(index_begin:index_end,7) ;
% detrend the close vector prior to applying the fft
slope = ( close(end) - close(1) ) / ( length(close) - 1 ) ;
v1 = (0:1:length(close)-1)' ;
detrended_close = close .- ( v1 .* slope ) ;
close_index_begin = close(1)
detrended_close_index_begin = detrended_close(1)
detrended_close_index_end = detrended_close(end)
% create zero padded vector for fft
L2 = 2^nextpow2( length(close) ) ; half_L2 = L2/2 ;
y2 = zeros( 1,L2 ) ; y2( 1:length(close) ) = detrended_close ;
% apply the fft
transform = fft( y2 ) ;
% permute the first half of the transform vector in "chunks" of 10
max_ii_value = floor( half_L2 / 10 ) ;
for ii = 1:max_ii_value
transform( (ii*10):(ii*10)+9 ) = permute_vector( transform( (ii*10):(ii*10)+9 ) ) ;
endfor
% take the inverse fft
ifft_vec = real( ifft( transform ) ) ;
% retrend the ifft_permuted_vec
retrended_ifft_vec = ifft_vec( 1:length(close) )' .+ ( v1 .* slope ) ;
% statistics
correl = corrcoef (close, retrended_ifft_vec)
spear = spearman (close, retrended_ifft_vec)
tau = kendall (close, retrended_ifft_vec)
plot( close,'b',retrended_ifft_vec,'r' ) ; legend( 'close','permute' ) ;
This script could undoubtedly be made more efficient by moving the for loop that permutes the transform in chunks into the .oct function itself, but this is just a proof of concept. The script allows one to extract any contiguous section of data by use of the index_begin and index_end inputs or, of course, to select the whole data range. Some simple statistics are calculated using in-built Octave functions and can be seen in the terminal windows in the screen shots. Finally, the script plots the original input data along with the FFT-permuted output data. Typical screen shots are shown below, with the original data in blue and the FFT-permuted data in red.
All S&P data
The last few years of the above
The same last few years indexed using the index_begin and index_end method to apply the function just to these years and not the whole data set
The close_index_begin, detrended_close_index_begin and detrended_close_index_end values also seen in the terminal windows are a simple sanity check to see that the original data has been detrended correctly prior to performing the FFT.