## Monday, 30 January 2012

### Testing the Delta Phenomenon, Part 3

Readers may recall from my recent post that the "problem" with the Delta Phenomenon is that it is very subjective and therefore difficult to test objectively. This post will outline how I intend to conduct such an objective test, using R, and discuss some issues related to the test.

Firstly, I am going to use the statistical concept of Cross-validation, whereby a model is trained on one set of data and tested for accuracy on another set of data, which is actually quite common in the world of back testing. Since the "Delta solutions" I will be testing come from late 2006 (See Testing the Delta Phenomenon, Part 2) the 4 years from 2008 to 2011 inclusive can be considered to be the validation set for this test. The test(s) will assess the accuracy of the predicted highs and lows for this period, on each Delta time frame for which I have solutions, by counting the difference in days between actual highs and lows in the data and their predicted occurrences and then creating an average error for these differences. This will be the test statistic. Using R, a Null Hypothesis average error distribution for random predictions on the same data will be created, using the same number of predicted turning points as per the Delta solution being tested. The actual average error test statistic on real data will be compared with this Null Hypothesis distribution of the average error test statistic and the Null Hypothesis rejected or not, as the case may be. The Null Hypothesis may be stated as
• given a postulated number of turning points the accuracy of the Delta Phenomenon in correctly predicting when these turning points will occur, using the average error test statistic described above as the measure of accuracy, is no better than random guessing as to where the turning points will occur.
whilst the Alternative Hypothesis may be stated as
• given a postulated number of turning points the accuracy of the Delta Phenomenon in correctly predicting when these turning points will occur, using the average error test statistic described above as the measure of accuracy, is better than could be expected from random guessing as to where the turning points will occur.
and therefore, by extension, we can accept that Delta has some (unquantifiable) predictive accuracy. For Delta to pass this test we want to be able to reject the Null Hypothesis.

All of this might be made clearer for readers by following the commented R code below.
# Assume a 365 trading day year, with 4 Delta turning points in this year
# First, create a Delta Turning points solution vector, the projected
# days on which the market will make a high or a low
proj_turns <- c(47,102,187,234) # day number of projected turning points

# now assume we apply the above Delta solution to future market data
# and identify, according to the principles of Delta, the actual turning
# points in the future unseen "real data"
real_turns <- c(42,109,193,226) # actual market turns occur on these days

# calculate the distance between the real_turns and the days on which
# the turn was predicted to occur and calculate the test statistic
# of interest, the avge_error_dist
avge_error_dist <- mean( abs(proj_turns - real_turns) )
print(avge_error_dist) # print for viewing

# calculate the theoretical probability of randomly picking 4 turning points
# in our 365 trading day year and getting an avge_error_dist that is equal
# to or better than the actual avge_error_dist calculated above.
# Taking the first projected turning point at 47 and the actual turning
# that occurs at 42, to get an error for this point that is as small as or
# smaller than that which actually occurs, we must randomly choose one of
# the following days: 42,43,44,45,46,47,48,49,50,51 or 52. The probability of
# randomly picking one of these numbers out of 1 to 365 inclusive is
a <- 11/365
# and similarly for the other 3 turning points
b <- 15/364 # turning point 2
c <- 13/363 # turning point 3
d <- 17/362 # turning point 4
# Note that the denominator decreases by 1 each time because we are
# sampling without replacement i.e. it is not possible to pick the same
# day more than once. Combining the 4 probabilities above, we get
rdn_prob_as_good <- (a*b*c*d)/100 # expressed as a %
print( rdn_prob_as_good ) # a very small % !!!

# but rather than rely on theoretical calculations, we are actually
# going to repeated, randomly choose 4 turning points and compare their
# accuracy with the "real accuracy", as measured by avge_error_dist

# Create our year vector to sample, consisting of 365 numbered days
year_vec <- 1:365

# predefine vector to hold results
result_vec <- numeric(100000) # because we are going to resample 100000 times

# count how many times a random selection of 4 turning points is as good
# as or better than our "real" results
as_good_as = 0

# do the random turning point guessing, resampling year_vec, in a loop
for(i in 1:100000) {
# randomly choose 4 days from year_vec as turning points
this_sample <- sample( year_vec , size=4 , replace=FALSE )

# sort this_sample so that it is in increasing order
sorted_sample <- sort( this_sample , decreasing=FALSE )

# calculate this_sample_avge_error_dist, our test statistic
this_sample_avge_error_dist <- mean( abs(proj_turns - sorted_sample) )

# if the test statistic is as good as or better that our real result
if( this_sample_avge_error_dist <= avge_error_dist ) {
as_good_as = as_good_as + 1 # increment as_good_as count
}

# assign this sample result to result_vec
result_vec[i] <- this_sample_avge_error_dist
}

# convert as_good_as to %
as_good_as_percent <- as_good_as/100000

# some summary statistics of result_vec
mean_of_result_vec <- mean( result_vec )
standard_dev_of_result_vec <- sd( result_vec )
real_result_from_mean <- ( mean_of_result_vec - avge_error_dist )/standard_dev_of_result_vec

print( as_good_as ) # print for viewing
print( as_good_as_percent ) # print for viewing
print( mean_of_result_vec ) # print for viewing
print( standard_dev_of_result_vec ) # print for viewing
print( real_result_from_mean ) # print for viewing

# plot histgram of the result_vec
hist( result_vec , freq=FALSE, col='yellow' )
abline( v=avge_error_dist , col='red' , lwd=3 )

Typical output of this code is
which shows a histogram of the distribution of random prediction average errors in yellow, with the actual average error shown in red. This is for the illustrative hypothetical values used in the code box above. Terminal prompt output for this is

[1] 6.5
[1] 2.088655e-08
[1] 38
[1] 0.00038
[1] 63.78108
[1] 32.33727
[1] 1.771364

where
6.5 is actual average error in days
2.088655e-08 is "theoretical" probability of Delta being this accurate
38 is number of times a random prediction is as good as or better than 6.5
0.00038 is 38 expressed as a percentage of random predictions made
63.78108 is the mean of the random distribution histogram
32.33727 is the standard deviation of the random distribution histogram
1.771364 is the difference between 63.78108 and 6.5 expressed as a multiple of the 32.33727 standard deviation.

This would be an example of the Null Hypothesis being rejected due to the 0.00038 % figure for random prediction accuracy being better than actual accuracy; in statistical parlance - a low p-value. Note, however, gross the difference between this figure and the "theoretical" figure. Also note that despite the Null being rejected the actual average error falls well within 2 standard deviations from the mean of the random distribution. This of course is due to the extremely heavy right-tailedness of the distribution, which expands the standard deviation range.

This second plot
and

[1] 77.75
[1] 2.088655e-08
[1] 48207
[1] 0.48207
[1] 79.85934
[1] 27.60137
[1] 0.07642148

shows what a typical failure to reject the Null Hypothesis would look like - a 0.48 p-value - and an actual average error that is indistinguishable from random, typified by it being well within a nice looking bell curve distribution.

So there it is, the procedure I intend to follow to objectively test the accuracy of the Delta Phenomenon.

## Friday, 27 January 2012

### Testing the Delta Phenomenon, Part 2

The main "problem" with the Delta Phenomenon is that it can be very subjective. This assertion is borne out by this quote - "We have observed over time that market participants using Delta and applying it differently do not make buy & sell decisions en-masse at the same time. Also, while the Delta order is perfect, our interpretation of the order is never 100% accurate."  which is taken from the FAQ section of the above mentioned message board. Also here are some random comments that I have cut and pasted from the above linked forum thread (views expressed are those of the original forum posters, not mine)
• ...as the delta phenomenom could be really nothing but a recurring coincidence that no one can actually use with a good accuracy rate.
• ...while there may be something in the theory, finding a practical application for it is impossible. To make sense of it everything has to be viewed in hindsight...
• I thought it was interesting, but just like drawing lines on a chart, I could find a turning point any sequence I decided to use. I think we humans are extraordinary at seeking out patterns in the world around us. Maybe not such a good thing though if there really is no pattern, just perception.
• Like most any indicator, it looks great in hindsight as you can apply each turning point to the nearest high/low. Problem is, of course, you never know this in real time, only after the fact.
• Trading with Delta is a lot like looking at an MT4 indicator that "repaints".
• Mind you, I'm not saying the concept behind Delta is not valid. Just that because of the latitude afforded on either side of the "turning point" day, plus the idea of inversions......it's just real tough to be sure until after the fact. Even the monthly newsletter that Wilder sent out with turning point calls was often subject to correction and change.....a lot of subjectivity with this method.
• Much is left to the trader's own subjective judgment.
I think readers of this blog will get the point! How can you objectively test something that is so subjective? I don't think merely making or losing money based on subjective interpretation of the points is a fair and objective test of the theory. I posted this forum question and the only answer so far is really good and unless something better is suggested on this forum, or occurs to me, this is the test I plan to implement. A more detailed discussion of this test will be the subject of my next post, but until then readers might be interested in doing some background reading on the matter via the links below
Resampling
Monte Carlo Methods
Bootstrapping
Statistical Hypothesis Testing

### Testing the Delta Phenomenon

The second book I ever bought about trading was Welles Wilder's "The Delta Phenomenon," which I bought as a result of a recommendation in the first book I ever bought. Subsequently I also bought the "The Adam Theory of Markets" and a few years later I bought the "Ocean Theory" book, so one could say I own the whole trilogy!

When I bought these I was doing my charting by hand on graph paper using prices from the Wall Street Journal, but in due course I got a computer and began using various software programs; Excel, Open Office Calc, QtStalker and finally ended up where I am today using OctaveR and Gnuplot. But however proficient I became at using these last three my programming skills weren't up to coding the Delta Phenomenon, until now that is. I had already quite easily coded the Adam Projection and the Natural Market Mirror, Natural Market River and Natural Moving Average from Ocean theory. Over the next few posts I am going to outline how I intend to test the Delta Phenomenon and show the eventual results of these tests, but before that I am going to present in this post the "breakthrough" piece of coding that finally allows me to do so. I think other users of R may find it useful.

The issue to be solved was overcoming the problem of missing days in the time series data, e.g. weekends, holidays, exchange closures, just plain missing data etc. because the Delta Phenomenon is predicated on counting days before or after specified dates, and of course any algorithmic counting over the data would be upset by such missing days. My forum query here and forum searches that turned up this led me to the R xts package, which finally resulted in me being able to write the following piece of R code
rm(list=ls(all=TRUE)) # remove all previous data from workspace
library(xts) # load the required library

# preparation for output to new file
sink(file="solution",append=FALSE,type=c("output"),split=FALSE)

# read in the raw data files

# create xts objects from above data:
x <- xts(data_1[c('V2','V3','V4','V5')],as.Date(data_1[,'V1']))
y <- xts(itdmtd[c('V2','V3','V4','V5','V6','V7','V8','V9')],as.Date(itdmtd[,'V1']))
z <- merge.xts(x,y) # merge x and y

# create a contiguous date vector to encompass date range of above data
d <- timeBasedSeq(paste(start(z),end(z),"d",sep="/"), retclass="Date")

# merge z with an "empty" xts object, xts(,d), filling with NA
prices <- merge(z,xts(,d),fill=NA)

# coerce prices xts object to a data frame object
prices_df <- data.frame(date=index(prices), coredata(prices))

# output to new file
write.table(prices_df,quote=FALSE,row.names=FALSE,col.names=FALSE,sep=",")
sink()

The code takes a csv file of the price time series, merges it with a csv file of the "Delta Solution," fills in any missing dates and then writes out the result to a final csv file that looks like this

1995-01-03,828.55,829.45,827.55,828.8,NA,NA,NA,NA,NA,NA,NA,NA
1995-01-04,830.2,831.25,827.6,831.1,NA,NA,NA,NA,NA,NA,NA,NA
1995-01-05,830.5,831.45,829.85,830.6,NA,NA,NA,NA,NA,NA,NA,NA
1995-01-06,830.75,833.05,829,830.35,NA,NA,NA,NA,NA,NA,NA,NA
1995-01-07,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA
1995-01-08,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA

This might not actually look like much, but using this as input to this Gnuplot script
reset
set title "Medium Term Delta Turning Points" textcolor rgb "#FFFFFF"
set object 1 rect from graph 0, 0, 0 to graph 1.1, 1.1, 0 behind lw 1.0 fc rgb "#000000" fillstyle solid 1.00
set datafile separator ","
set xdata time
set timefmt "%Y-%m-%d"
set format x
set y2range [0:1]

plot "solution" using 1:10 notitle with impulses linecolor rgb "#B0171F" axis x1y2, \
"solution" using 1:11 notitle with impulses linecolor rgb "#0000FF" axis x1y2, \
"solution" using 1:12 notitle with impulses linecolor rgb "#FFA500" axis x1y2, \
"solution" using 1:13 notitle with impulses linecolor rgb "#00EE00" axis x1y2, \
"solution" using 1:2:3:4:5 notitle with candlesticks linecolor rgb "#FFFFFF" axis x1y1, \
"solution" using 1:($$2>$$5?$$2:1/0):($$2>$$5?$$3:1/0):($$2>$$5?$$4:1/0):($$2>$$5?$$5:1/0) notitle with candlesticks lt 1 axis x1y1, \
"solution" using 1:($$2<$$5?$$2:1/0):($$2<$$5?$$3:1/0):($$2<$$5?$$4:1/0):($$2<$$5?$$5:1/0) notitle with candlesticks lt 3 axis x1y1

gives a nice plot thus

Many readers might say "So what! And...?" but to those readers who know what the Delta Phenomenon is, the coloured lines will be significant. Furthermore, using free, open source software I have created a free, as in gratis, alternative to the software that the Delta Society sells on its website. Most importantly of all, of course, is that I am now in a position to do computerised testing of the Delta Phenomenon. More on these tests in upcoming posts.

## Tuesday, 10 January 2012

### Formula for Approximating VWAP from OHLC Data

An answer to another question on the Quantitative Finance forum here gives a formula that approximates the volume weighted average price (VWAP) from OHLC data only, with an exhortation in a comment to do one's own research before using it. This blog post is my quick attempt at such research.

My simple test uses the relationship between the VWAP and the Pivot point calculated as (H+L+C)/3. This wiki entry on pivot points states "Trading above or below the pivot point indicates the overall market sentiment" and my test of being above or below is the VWAP being above or below: i.e. if today's VWAP is above today's pivot point (calculated from yesterday's H,L & C) then this is s bullish sign and therefore we can expect tomorrow's VWAP to be higher than today's. The converse is true for bearish sentiment. In Octave it is trivially simple to write a script to test whether this hypothesis is true and the terminal output of the results is shown in the box below.

The first line identifies the market being tested (a forex pair or commodity), the second "sign_correct" is the percentage of time the hypothesis appears to be true: i.e. for the first AUSCAD entry the figure 67.228 means that 67.228 % of the time the following day's VWAP is actually higher/lower than today's according to the "prediction" based on the categorisation of today's bar being bullish or bearish. The third "result" is the p-value of a 5000 repetition Monte Carlo permutation test to test the statistical significance of the given % accuracy. The permutation test was the "sign" test described in Evidence Based Technical Analysis and in the PDF available from its companion website.

Given that all the p-values are zero it can be stated that the VWAP being above or below the pivot point has a non random predictive accuracy for the next day's VWAP being higher or lower to a degree of accuracy that is statistically significant.
auscad
sign_correct =  65.695
result = 0
------------------
aususd
sign_correct =  67.228
result = 0
------------------
ausyen
sign_correct =  67.100
result = 0
------------------
bo
sign_correct =  63.989
result = 0
------------------
c
sign_correct =  63.530
result = 0
------------------
cc
sign_correct =  60.565
result = 0
------------------
cl
sign_correct =  63.318
result = 0
------------------
ct
sign_correct =  60.273
results = 0
------------------
dx
sign_correct =  63.811
result = 0
------------------
ed
sign_correct =  63.945
results = 0
------------------
euraus
sign_correct =  68.024
results = 0
------------------
sign_correct =  66.854
result = 0
------------------
eurchf
sign_correct =  65.707
result = 0
------------------
eurgbp
sign_correct =  66.760
result = 0
------------------
eurusd
sign_correct =  65.544
result = 0
------------------
euryen
sign_correct =  66.444
result = 0
------------------
fc
sign_correct =  61.905
result = 0
------------------
gbpchf
sign_correct =  67.497
result = 0
------------------
gbpusd
sign_correct =  66.936
result = 0
------------------
gbpyen
sign_correct =  66.936
result = 0
------------------
gc
sign_correct =  60.667
result = 0
------------------
hg
sign_correct =  58.554
result = 0
------------------
ho
sign_correct =  62.685
result = 0
------------------
kc
sign_correct =  61.732
result = 0
------------------
lb
sign_correct =  61.765
result = 0
------------------
lc
sign_correct =  62.372
result = 0
------------------
lh
sign_correct =  61.601
result = 0
------------------
ng
sign_correct =  62.356
result = 0
------------------
o
sign_correct =  60.705
result = 0
------------------
oj
sign_correct =  61.848
result = 0
------------------
pa
sign_correct =  62.497
result = 0
------------------
pb
sign_correct =  59.116
result = 0
------------------
pl
sign_correct =  60.737
result = 0
------------------
rb
sign_correct =  63.107
result = 0
------------------
s
sign_correct =  64.091
result = 0
------------------
sb
sign_correct =  61.106
result = 0
------------------
si
sign_correct =  59.563
result = 0
------------------
sm
sign_correct =  63.810
result = 0
------------------
sp
sign_correct =  66.954
result = 0
------------------
es
sign_correct =  66.744
result = 0
------------------
nd
sign_correct =  66.221
result = 0
------------------
ty
sign_correct =  65.260
result = 0
------------------
us
sign_correct =  65.893
result = 0
------------------
sign_correct =  67.357
result = 0
------------------
usdchf
sign_correct =  67.088
result = 0
------------------
usdyen
sign_correct =  66.947
result = 0
------------------
w
sign_correct =  65.118
result = 0

At the moment I use the value given by the pivot point calculation as my "price input" for the day but, based on these test results and the future results of some other tests I can think of, I may change to the VWAP approximation as my price input. More in a future post.

## Tuesday, 3 January 2012

### Creating Synthetic Data Using the Fast Fourier Transform

This question was recently asked on the Quantitative Finance forum and one answerer's link to here piqued my interest. My comments to said answer were
"Interesting link! I theorise that one way it could be done is to apply a Fast Fourier Transform (e.g. FFT function in Octave/MATLAB) to the original series, then divide the resulting vector into segments and within each separate segment randomly permute the values, and finally reconstruct the time series by applying the Inverse Fast Fourier Transform (IFFT function). By randomly permuting the values you will not alter the amplitudes of the composite sine waves but will slightly alter their periodicity - this will naturally tend to restrict the range of the new series to that of the original."
and
"This is such an intriguing approach that I may "knock up" some Octave code on my blog and post a link to it from here."
This blog post is my attempt to "knock up" the Octave code.

The "guts" of the code are a .oct function that randomly permutes the FFT output vector that is given to it as its only input argument and the C++ code for this .oct function is given below.
Note: due to formatting issues the headers are not correctly displayed; the octave headers should be enclosed between a "<" and ">" after the #include.
#include octave/oct.h
#include octave/dColVector.h
#include octave/CNDArray.h
#include "MersenneTwister.h"

DEFUN_DLD (permute_vector, args, , "Input is a vector that is to be permuted once")
{
octave_value_list retval;

int nargin = args.length () ;
int vec_length = args(0).length () ;

// check the input arguments
if ( nargin > 1 )
{
error ("Invalid arguments. Input is a vector that is to be permuted once") ;
return retval ;
}

if (vec_length < 2 )
{
error ("Invalid arguments. Input is a vector that is to be permuted once") ;
return retval ;
}

if (error_state)
{
error ("Invalid arguments. Input is a vector that is to be permuted once") ;
return retval ;
}
// end of input checking

ComplexNDArray input_vector = args(0).complex_array_value () ;
ComplexNDArray output_vector = args(0).complex_array_value () ;
int k1;
int k2;

MTRand mtrand1; // Declare the Mersenne Twister Class - will seed from system time

k1 = vec_length - 1; // initialise prior to shuffling the vector

while (k1 > 0) // While at least 2 left to shuffle
{
k2 = mtrand1.randInt( k1 ); // Pick an int from 0 through k1

if (k2 > k1) // check if random vector index no. k2 is > than max vector index - should never happen
{
k2 = k1 - 1; // But this is cheap insurance against disaster if it does happen
}

output_vector(k1) = input_vector(k2) ; // allocate random pick k2 from input_vector to the k1 vector index of output_vector
input_vector(k2) = input_vector(k1) ; // replace random pick k2 content of input_vector with content k1 of input_vector
k1 = k1 - 1; // count down
} // Shuffling is complete when this while loop exits

retval(0) = output_vector ;

return retval; // Return the output to Octave
}

The Octave script that calls the above function is
clear all

contract = input( "Enter contract symbol e.g. sp: ","s") ;
data = load("-ascii",contract) ;
n = rows(data)
index_begin = input( "Enter index_begin: ") ;
index_end = input( "Enter index_end, value not greater than n: ") ;

close = data(index_begin:index_end,7) ;

% detrend the close vector prior to applying the fft
slope = ( close(end) - close(1) ) / ( length(close) - 1 ) ;
v1 = (0:1:length(close)-1)' ;
detrended_close = close .- ( v1 .* slope ) ;
close_index_begin = close(1)
detrended_close_index_begin = detrended_close(1)
detrended_close_index_end = detrended_close(end)

% create zero padded vector for fft
L2 = 2^nextpow2( length(close) ) ; half_L2 = L2/2 ;
y2 = zeros( 1,L2 ) ; y2( 1:length(close) ) = detrended_close ;

% apply the fft
transform = fft( y2 ) ;

% permute the first half of the transform vector in "chunks" of 10
max_ii_value = floor( half_L2 / 10 ) ;
for ii = 1:max_ii_value
transform( (ii*10):(ii*10)+9 ) = permute_vector( transform( (ii*10):(ii*10)+9 ) ) ;
endfor

% take the inverse fft
ifft_vec = real( ifft( transform ) ) ;

% retrend the ifft_permuted_vec
retrended_ifft_vec = ifft_vec( 1:length(close) )' .+ ( v1 .* slope ) ;

% statistics
correl = corrcoef (close, retrended_ifft_vec)
spear = spearman (close, retrended_ifft_vec)
tau = kendall (close, retrended_ifft_vec)

plot( close,'b',retrended_ifft_vec,'r' ) ; legend( 'close','permute' ) ;

This script could undoubtedly be made more efficient by putting the for loop on lines 28 to 30 within the .oct function itself, but this is just a proof of concept. The script allows one to extract any contiguous section of data by use of the index_begin and index_end inputs or, of course, to select the whole data range. Some simple statistics are calculated using in-built Octave functions and can be seen in the terminal windows in the screen shots. Finally the script plots the original input data along with the FFT permuted output data. Typical screen shots are shown below, with the original data in blue and the FFT permuted data in red.

All S&P data

The last few years of the above

The same last few years indexed using the index_begin and index_end method to apply the function just to these years and not the whole data set
The close_index_begin, detrended_close_index_begin and detrended_close_index_end values also seen in the terminal windows are a simple sanity check to see that the original data has been detrended correctly prior to performing the FFT.