
Friday, 25 March 2022

OrderBook and PositionBook Features

In my previous post I talked about how I planned to use constrained optimization to create features from Oanda's OrderBook and PositionBook data, which can be downloaded via their API. In addition to this I have also created a set of features based on the idea of Order Flow Imbalance (OFI), a nice exposition of which is given in this blog post along with a numerical example of how to calculate OFI. Of course Oanda's OrderBook/PositionBook data is not exactly the same as a conventional limit order book, but I thought they were similar enough to investigate using OFI on them. The result of these investigations is shown in the animated GIF below.

This shows the output from using the R Boruta package to check the relevance of OFI levels, to a depth of 20 in both the OrderBook and PositionBook, for classifying the sign of the log return of price over the periods detailed below following an OrderBook/PositionBook update (the granularity at which the OrderBook/PositionBook data updates is 20 minutes):

  • 20 minutes
  • 40 minutes
  • 60 minutes
  • the 20 minutes starting 20 minutes in the future
  • the 20 minutes starting 40 minutes in the future
for both the OrderBook and PositionBook, giving a total of 10 separate images/results in the above GIF.
 
Observant readers may notice that in the GIF there are 42 features being checked, but only an OFI depth of 20. The reason for this is that the data contain information about buy/sell orders and long/short positions both above and below the current price, so what I did was calculate OFI (a rough sketch of this calculation is given below) for:
  • buy orders above price vs sell orders below price
  • sell orders above price vs buy orders below price
  • long positions above price vs short positions below price
  • short positions above price vs long positions below price 
As can be seen, almost all features are deemed to be relevant with the exception of 3 OFI levels rejected (red candles) and 2 deemed tentative (yellow candles).
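Purely as an illustration of how one of these OFI features might be computed, the R sketch below calculates a per level imbalance for the first combination above (buy orders above price vs sell orders below price). The data layout, column meanings and function name are my own assumptions for the example, not the actual code used.

# illustrative sketch only (assumed data layout): two matrices of Oanda
# orderbook bucket percentages, rows = successive 20 minute snapshots,
# columns = price levels 1 to 20 away from the current price
# buys_above[ t , k ]  = % of buy orders k levels above price at snapshot t
# sells_below[ t , k ] = % of sell orders k levels below price at snapshot t
ofi_buy_above_vs_sell_below = function( buys_above , sells_below ) {
  d_buys  = diff( buys_above )   # change in buy interest at each level between updates
  d_sells = diff( sells_below )  # change in sell interest at each level between updates
  d_buys - d_sells               # per level order flow imbalance, one row per update
}
# each of the 20 columns of the returned matrix is then a candidate feature for Boruta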

It is my intention to use these features in a machine learning model to classify the probability of future market direction over the time frames mentioned above. 

More in due course.

Friday, 5 February 2021

A Forex Pair Snapshot Chart

After yesterday's Heatmap Plot of Forex Temporal Clustering post I thought I would consolidate all the chart types I have recently created into one easy snapshot overview chart. Below is a typical example of such a chart, this being today's 10 minute EUR_USD forex pair chart up to a few hours after the London session close (the red vertical line).


The top left chart is a Market/Volume Profile Chart with added rolling Value Area upper and lower bounds (the cyan, red and white lines) and also rolling Volume Weighted Average Price with upper and lower standard deviation lines (magenta).
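For readers unfamiliar with the construction of the VWAP bands, the R sketch below shows one way a rolling, volume weighted average price with standard deviation bands could be computed; the window length of 72 bars and the use of the typical price are illustrative assumptions, not necessarily what my chart uses.

# illustrative only: rolling VWAP with +/- one (volume weighted) standard deviation band
rolling_vwap_bands = function( high , low , close , volume , n = 72 ) {
  typical = ( high + low + close ) / 3
  vwap = rep( NA_real_ , length( close ) ) ; upper = vwap ; lower = vwap
  for ( ii in n : length( close ) ) {
    win = ( ii - n + 1 ) : ii
    w = volume[ win ] / sum( volume[ win ] )                    # volume weights over the window
    vwap[ ii ] = sum( w * typical[ win ] )                      # rolling VWAP
    sd_w = sqrt( sum( w * ( typical[ win ] - vwap[ ii ] )^2 ) ) # volume weighted standard deviation
    upper[ ii ] = vwap[ ii ] + sd_w
    lower[ ii ] = vwap[ ii ] - sd_w
  }
  list( vwap = vwap , upper = upper , lower = lower )
}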

The bottom left chart is the turning point heatmap chart as described in yesterday's post.

The two rightmost charts are also Market/Volume Profile charts, but of my Currency Strength Candlestick Charts based on my Currency Strength Indicator. The upper one is the base currency, i.e. EUR, and the lower is the quote currency. 

The following charts are the same day's charts for:

GBP_USD,

USD_CHF,

and finally USD_JPY.
The regularity of the turning points is easily seen in the lower left-hand charts although, of course, this is to be expected as they all share the USD as a common currency. However, there are also subtle differences to be seen in the "shadows" of the lighter areas.

For the near future my self-assigned task will be to observe the forex pairs, in real time, through the prism of the above style of chart and do some mental paper trading, and perhaps some really small size, discretionary live trading, in addition to my normal routine of research and development.


Thursday, 4 February 2021

Heatmap Plot of Forex Temporal Clustering of Turning Points

Following up on my previous post, below is the chart of the temporal turning points that I have come up with.

This particular example happens to be 10 minute candlesticks over the last two days of the GBP_USD forex pair.

The details I have given about various turning points over the course of my last few posts have been based on identifying the "ix" centre value of turning point clusters. However, for plotting purposes I felt that just displaying these ix values wouldn't be very illuminating. Instead, I have taken the approach of displaying a sort of distribution of turning points per cluster. I would refer readers to my temporal clustering part 3 post wherein there is a coloured histogram of the R output of the clustering algorithm used. What I have done for the heatmap background of the above chart is normalise each separate, coloured histogram by the maximum value within the cluster and then plotted these normalised cluster values using Octave's pcolor function. An extra step taken was to raise the values to the power four just to increase the contrast within and between the sequential histogram backgrounds.
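The actual heatmap is produced in Octave with pcolor, but the normalisation and contrast step itself can be sketched in a couple of lines of R; the turn_counts list below is a purely synthetic stand-in for the per-cluster histogram counts of turns.

# synthetic stand-in for the per-cluster histogram counts of turning points
turn_counts = list( c( 1 , 3 , 7 , 2 ) , c( 2 , 5 , 1 ) , c( 4 , 4 , 9 , 1 , 2 ) )
# normalise each cluster by its own maximum, then raise to the fourth power for contrast
heat_rows = lapply( turn_counts , function( counts ) ( counts / max( counts ) )^4 )
# concatenate the clusters along the time axis to form the heatmap background
heat_background = unlist( heat_rows )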

Each normalised histogram has a single value of one, which is shown by the bright yellow vertical lines, one per cluster. This represents the time of day at which, within the cluster window, the greatest number of turns occurred in the historical lookback period. The darker green lines show other times within the cluster at which other turns occurred.

The hypothesis behind this is that there are certain times of the day when price is more likely to change direction, a turning point, than at other times. Such times are market opens, closes etc. and the above chart is a convenient visual representation of these times. The lighter the background, the greater the probability that such a turn will occur, based upon the historical record of such turn timings.

Enjoy!
 

Sunday, 3 May 2020

Market Profile Chart in Octave

In a comment on my previous post, visualising Oanda's orderbook, a reader called Darren suggested that I was over-complicating things and should perhaps use a more established methodology, namely Market Profile.

I had heard of Market Profile before Darren mentioned it, but had always assumed that it required access to data that I didn't readily have to hand, i.e. tick level data. With my recent work on Oanda's API in Octave that is no longer necessarily the case. However, downloading, storing and manipulating streams of tick data would be a whole new infrastructure project that I would have to implement in either R or Octave.

Instead of doing this I have done some research into Market Profile and come up with an alternative solution that can use the more readily available tick volume. One of the empirically observed assumptions of Market Profile is that on a "normal" day such volume is normally distributed and creates a "value area" that contains approximately 70% of the market action, which roughly corresponds to action falling within one standard deviation of the mean of said action, and this mean in turn roughly corresponds with what is termed the "point of control" (POC).

If one takes this at face value as being an accurate description of market action, it is possible to recreate the "normal" market profile with the following Octave code:
 ticks = norminv( linspace( 0 , 1 , vol( ii ) + 2 ) , ( high( ii ) + low( ii ) ) / 2 , ( high( ii ) - low( ii ) ) / 6 ) ;
 ticks = floor( ticks( 2 : end - 1 ) ./ tick_size .+ tick_size ) .* tick_size ;
 [ vals , bin_centres ] = hist( ticks , unique( ticks ) ) ;
What this does is create vol(ii)+2 linearly spaced values from 0 to 1, where vol(ii) is the tick volume for an aggregated period, i.e. an OHLC bar, and transforms these, via the inverse normal CDF, into normally distributed ticks with a mean at the midpoint of the bar and an assumed standard deviation of one sixth of the high-low range, rounded to whole ticks (the first and last values, which correspond to probabilities of exactly 0 and 1 and hence infinite ticks, are discarded). The hist function then provides the counts of ticks per level (vals) at levels (bin_centres).

Below is a screen shot of recent EUR_USD forex prices at a resolution of 20 minute candlesticks from 17:00 EST on 28th April 2020 to end of week close at 17:00 EST 1st May 2020.
The silhouette chart at the bottom is the usual tick volume per bar and the horizontal histogram is the Market Profile of the 20 minute bars from the first bar to the first vertical green line, calculated as described above. All the visible vertical green lines represent the open at 07:00 BST, whilst the vertical red lines are the 17:00 EST closes. The horizontal blue line is the current POC at 07:00 BST, taking into account only the bars to the left of the first green line, i.e. the Asian overnight session.

Next is a video of the progression through time along the above chart: as time progresses the Market Profile histogram changes and new, blue POC lines are plotted, with the time progression being marked by the advancing green lines. During subsequent Asian sessions the histogram colour is plotted in red, and new POC lines formed in the Asian session are also plotted in red.


For easier viewing, this is a screen shot of the chart as it appears at the end of the video.
For comparative purposes, this is a screen shot of the same as above, but using 10 minute OHLC bars and 10 minute updates to the Market Profile.
Readers should note that the scaling of the silhouette charts and histograms is not the same for both - they are hand scaled by me for visualisation purposes only.

For completeness, here is the Octave script used to produce the above:
clear all ;
pkg load statistics ;

## load data
cd /home/dekalog/Documents/octave/oanda_data/20m ;
oanda_files_20m = glob( "*_ohlc_20m" ) ;
ix = 7 ;##input( 'Tradable? ' ) ;
data = dlmread( oanda_files_20m{ ix } ) ;
data( 1 : 146835 , : ) = [] ;

tick_size = 0.0001 ;

open = data( : , 18 ) ; high = data( : , 19 ) ; low = data( : , 20 ) ; close = data( : , 21 ) ; vol = data( : , 22 ) ;
## Create grid for day
max_high = max( high ) + 0.001 ; min_low = min( low ) - 0.001 ; grid = ( min_low : tick_size : max_high + 0.0001 ) ;
grid_ix = floor( grid ./ tick_size .+ tick_size ) .* tick_size ; 
market_profile = [ grid_ix ; zeros( 1 , size( grid_ix , 2 ) ) ] ;
asian_market_profile = [ grid_ix ; zeros( 1 , size( grid_ix , 2 ) ) ] ;
 
figure( 20 ) ; 
candle( high , low , close , open ) ; 
vline( 27 , 'g' ) ; vline( 72 , 'r' ) ; vline( 99 , 'g' ) ; vline( 144 , 'r' ) ; vline( 174 , 'g' ) ;
xlim( [ 0 size( open , 1 ) ] ) ;
ylim( [ grid_ix(1) grid_ix(end) ] ) ;
hold on ; plot( ( vol .* 0.0000004 ) .+ grid_ix( 1 ) , 'b' , 'linewidth' , 2 ) ; 
area( ( vol .* 0.0000004 ) .+ grid_ix( 1 ) , 'facecolor' , 'b' ) ; hold off ;

for ii = 1 : 27
 ticks = norminv( linspace( 0 , 1 , vol( ii ) + 2 ) , ( high( ii ) + low( ii ) ) / 2 , ( high( ii ) - low( ii ) ) / 6 ) ;
 ticks = floor( ticks( 2 : end - 1 ) ./ tick_size .+ tick_size ) .* tick_size ;
 [ vals , bin_centres ] = hist( ticks , unique( ticks ) ) ;
 vals_ix = find( ismember( grid_ix , bin_centres ) ) ;
 market_profile( 2 , vals_ix ) += vals ;
endfor

[ max_mp_val_old , max_mp_ix ] = max( market_profile( 2 , : ) ) ;

hold on ; figure( 20 ) ; H = barh( market_profile( 1 , : ) , market_profile( 2 , : ).*0.005 , 'c' ) ; hold off ;
hline( market_profile( 1 , max_mp_ix ) , 'b' ) ;

for ii = 28 : 72
 ticks = norminv( linspace( 0 , 1 , vol( ii ) + 2 ) , ( high( ii ) + low( ii ) ) / 2 , ( high( ii ) - low( ii ) ) / 6 ) ;
 ticks = floor( ticks( 2 : end - 1 ) ./ tick_size .+ tick_size ) .* tick_size ;
 vals = hist( ticks , unique( ticks ) ) ;
 vals_ix = find( ismember( grid_ix , unique( ticks ) ) ) ;
 market_profile( 2 , vals_ix ) += vals ;
 [ max_mp_val , max_mp_ix ] = max( market_profile( 2 , : ) ) ;
 hold on ; figure( 20 ) ; barh( market_profile( 1 , : ) , market_profile( 2 , : ).*0.005 , 'c' ) ; hold off ;
 vline( ii , 'g' ) ; 
 if ( max_mp_val > max_mp_val_old )
  hline( market_profile( 1 , max_mp_ix ) , 'b' ) ;
  max_mp_val_old = max_mp_val ;
 endif
 pause(0.01) ;
endfor

for ii = 73 : 99
 ticks = norminv( linspace( 0 , 1 , vol( ii ) + 2 ) , ( high( ii ) + low( ii ) ) / 2 , ( high( ii ) - low( ii ) ) / 6 ) ;
 ticks = floor( ticks( 2 : end - 1 ) ./ tick_size .+ tick_size ) .* tick_size ;
 vals = hist( ticks , unique( ticks ) ) ;
 vals_ix = find( ismember( grid_ix , unique( ticks ) ) ) ;
 market_profile( 2 , vals_ix ) += vals ;
 asian_market_profile( 2 , vals_ix ) += vals ;
 [ max_mp_val , max_mp_ix ] = max( market_profile( 2 , : ) ) ;
 hold on ; figure( 20 ) ; barh( market_profile( 1 , : ) , market_profile( 2 , : ).*0.005 , 'c' ) ; 
 figure( 20 ) ; barh( asian_market_profile( 1 , : ) , asian_market_profile( 2 , : ).*0.005 , 'r' ) ;
 hold off ;
 vline( ii , 'g' ) ; 
 if ( max_mp_val > max_mp_val_old )
  hline( market_profile( 1 , max_mp_ix ) , 'b' ) ;
  max_mp_val_old = max_mp_val ;
 endif
 pause(0.01) ;
endfor

[ ~ , max_mp_ix ] = max( asian_market_profile( 2 , : ) ) ;
hline( asian_market_profile( 1 , max_mp_ix ) , 'r' ) ;

for ii = 100 : 144
 ticks = norminv( linspace( 0 , 1 , vol( ii ) + 2 ) , ( high( ii ) + low( ii ) ) / 2 , ( high( ii ) - low( ii ) ) / 6 ) ;
 ticks = floor( ticks( 2 : end - 1 ) ./ tick_size .+ tick_size ) .* tick_size ;
 vals = hist( ticks , unique( ticks ) ) ;
 vals_ix = find( ismember( grid_ix , unique( ticks ) ) ) ;
 market_profile( 2 , vals_ix ) += vals ;
 [ max_mp_val , max_mp_ix ] = max( market_profile( 2 , : ) ) ;
 hold on ; figure( 20 ) ; barh( market_profile( 1 , : ) , market_profile( 2 , : ).*0.005 , 'c' ) ;
 figure( 20 ) ; barh( asian_market_profile( 1 , : ) , asian_market_profile( 2 , : ).*0.005 , 'r' ) ; 
 hold off ;
 vline( ii , 'g' ) ; 
 if ( max_mp_val > max_mp_val_old )
  hline( market_profile( 1 , max_mp_ix ) , 'b' ) ;
  max_mp_val_old = max_mp_val ;
 endif
 pause(0.01) ;
endfor

[ max_mp_val , max_mp_ix ] = max( market_profile( 2 , 101 : end ) ) ;
max_mp_val_old = max_mp_val ;
hline( market_profile( 1 , max_mp_ix + 100 ) , 'b' ) ;

for ii = 145 : 174
 ticks = norminv( linspace( 0 , 1 , vol( ii ) + 2 ) , ( high( ii ) + low( ii ) ) / 2 , ( high( ii ) - low( ii ) ) / 6 ) ;
 ticks = floor( ticks( 2 : end - 1 ) ./ tick_size .+ tick_size ) .* tick_size ;
 vals = hist( ticks , unique( ticks ) ) ;
 vals_ix = find( ismember( grid_ix , unique( ticks ) ) ) ;
 market_profile( 2 , vals_ix ) += vals ;
 asian_market_profile( 2 , vals_ix ) += vals ;
 [ max_mp_val , max_mp_ix ] = max( market_profile( 2 , 101 : end ) ) ;
 hold on ; figure( 20 ) ; barh( market_profile( 1 , : ) , market_profile( 2 , : ).*0.005 , 'c' ) ; 
 figure( 20 ) ; barh( asian_market_profile( 1 , : ) , asian_market_profile( 2 , : ).*0.005 , 'r' ) ;
 hold off ;
 vline( ii , 'g' ) ; 
 if ( max_mp_val > max_mp_val_old )
  hline( market_profile( 1 , max_mp_ix + 100 ) , 'b' ) ;
  max_mp_val_old = max_mp_val ;
 endif
 pause(0.01) ;
endfor

[ ~ , max_mp_ix ] = max( asian_market_profile( 2 , 101 : end ) ) ;
hline( asian_market_profile( 1 , max_mp_ix + 100 ) , 'r' ) ;

for ii = 175 : size( open , 1 )
 ticks = norminv( linspace( 0 , 1 , vol( ii ) + 2 ) , ( high( ii ) + low( ii ) ) / 2 , ( high( ii ) - low( ii ) ) / 6 ) ;
 ticks = floor( ticks( 2 : end - 1 ) ./ tick_size .+ tick_size ) .* tick_size ;
 vals = hist( ticks , unique( ticks ) ) ;
 vals_ix = find( ismember( grid_ix , unique( ticks ) ) ) ;
 market_profile( 2 , vals_ix ) += vals ;
 [ max_mp_val , max_mp_ix ] = max( market_profile( 2 , 101 : end ) ) ;
 hold on ; figure( 20 ) ; barh( market_profile( 1 , : ) , market_profile( 2 , : ).*0.005 , 'c' ) ; 
 figure( 20 ) ; barh( asian_market_profile( 1 , : ) , asian_market_profile( 2 , : ).*0.005 , 'r' ) ;
 hold off ;
 vline( ii , 'g' ) ; 
 if ( max_mp_val > max_mp_val_old )
  hline( market_profile( 1 , max_mp_ix + 100 ) , 'b' ) ;
  max_mp_val_old = max_mp_val ;
 endif
 pause(0.01) ;
endfor
As just noted above for the scaling of the charts/video, readers should also be aware that within this script there are a lot of magic numbers that are unique to the particular data and scaling being used; this is therefore not a plug-and-play script.

My thanks to the reader Darren, who suggested that I look into Market Profile. More in due course.

Tuesday, 1 October 2019

Ideal Cyclic Tau Embedding as Times Series Features

Continuing on from my Ideal Tau for Time Series Embedding post, I have now written an Octave function based on these ideas to produce features for time series modelling. The function outputs are two slightly different versions of features, examples of which are shown in the following two plots, which show up and down trends in black, following a sinusoidal sideways market partially visible to the left:
and

The function outputs are normalised price and price delayed by a quarter and a half of the cycle period (in this case 20). The trend slopes have been chosen to exemplify the difference in the features between up and down trends. The function outputs are identical for the unseen cyclic prices to the left.
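The Octave function itself is not reproduced here, but a minimal R sketch of the idea might look like the following; the fixed period of 20 and the omission of the normalisation step are simplifications for illustration only.

# illustrative sketch of cyclic tau embedding features for a (pre-normalised) price vector
cyclic_tau_features = function( price , period = 20 ) {
  tau = round( period / 4 )                                    # quarter cycle delay
  lag_q = c( rep( NA , tau ) , head( price , -tau ) )          # price delayed by a quarter cycle
  lag_h = c( rep( NA , 2 * tau ) , head( price , -2 * tau ) )  # price delayed by a half cycle
  cbind( price , lag_q , lag_h )                               # the three embedding coordinates
}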

When the sets of function outputs are plotted as 3D phase space plots, typical results are
with the green, blue and red phase trajectories corresponding to the cyclic, up trending and down trending portions of the above synthetic price series. The markers in this plot correspond to points in the phase trajectories at which turns in the underlying price series occur. The following plot is the above plot rotated in phase space such that the green cyclic price phase trajectory is horizontally orientated.
It is clearer in this second phase space plot that there is a qualitative difference in the phase space trajectories of the different, underlying market types.

As a final check of feature relevance I used the Boruta R package, with the above turning point markers as classification targets (a turning point vs. not a turning point), to assess the utility of this approach in general. These tests were conducted on real price series and also on indices created by my currency strength methodology. In all such Boruta tests the features are deemed "relevant" up to a delay of five bars on daily data (I ceased the tests at the five bar mark because it was becoming tedious to carry on).
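For readers who want to run a similar relevance check, a typical Boruta call is sketched below. The data here are purely synthetic stand-ins; in practice the feature data frame would hold the embedded price features and the target factor would mark the turning point bars described above.

library( Boruta )
set.seed( 1 )
# synthetic stand-in data, for illustration only
features = data.frame( f1 = rnorm( 500 ) , f2 = rnorm( 500 ) , f3 = rnorm( 500 ) )
is_turn = factor( ifelse( features$f1 + rnorm( 500 , sd = 0.5 ) > 1 , "turn" , "no_turn" ) )
boruta_result = Boruta( x = features , y = is_turn , maxRuns = 100 )
print( boruta_result )
plot( boruta_result ) # green = relevant, yellow = tentative, red = rejected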

In summary, therefore, it can be concluded that features derived using Takens' theorem are useful on financial time series provided that:
  1. the underlying time series are normalised (detrended) and,
  2. the embedding delay (Tau) is set to the theoretical optimum of a quarter (and multiples thereof) of the measured cyclic period of the time series.

Tuesday, 31 October 2017

Prepending Historical Data with Oanda's R API

As a follow-on to my previous post, which was about appending data, the script below prepends historical data to an assumed existing data record.
# cd to the hourly data directory
setwd("~/Documents/octave/oanda_data/hourly")

all_current_historical_data_list = read.table("instrument_hourly_update_file",header=FALSE,sep="",colClasses=c("character","Date","numeric") )

for( ii in 1 : nrow( all_current_historical_data_list ) ) {  

   instrument = all_current_historical_data_list[ ii , 1 ]
   
   current_ohlc_record = read.table( file = paste( instrument , "raw_OHLC_hourly" , sep = "_" ) , header = FALSE , na = "" , sep = "," ,
                                      stringsAsFactors = FALSE )
   
   current_ohlc_record_begin_date_time = as.character( current_ohlc_record[ 1 , 1 ] ) # get the date/time value to be matched
   last_date_ix = as.Date( current_ohlc_record[ 1 , 1 ] )                             # the end date for new data to be downloaded             
   
   # last 40 weeks of hourly data approx = 5000 hourly bars            
   begin_date_ix = as.Date( last_date_ix - 280 )                      # the begin date for new data to be downloaded

   # download the missing historical data from begin_date_ix to last_date_ix
   new_historical_data = HisPricesDates( Granularity = "H1", DayAlign, TimeAlign, AccountToken, instrument,
                          begin_date_ix , last_date_ix + 2 ) # +2 to ensure that the end of the new downloaded data will
                                                             # overlap with the beginning of current_ohlc_record
  
   # having ensured no data is missed by overlapping with the current_ohlc_record, delete duplicated OHLC information
   new_historical_data_date_times = as.character( new_historical_data[ , 1 ] ) # vector to search for the above date value
   
   ix = charmatch( current_ohlc_record_begin_date_time , new_historical_data_date_times ) # get the matching index value

   # delete that part of new_historical_data which is already contained in filename
   new_historical_data = new_historical_data[ -( ix : nrow( new_historical_data ) ) , ]
   
   # before prepending new_historical_data in front of current_ohlc_record, need to give names to current_ohlc_record as
   # rbind needs to bind by named attributes
   names( current_ohlc_record ) = names( new_historical_data )
   
   # see https://stackoverflow.com/questions/11785710/rbind-function-changing-my-entries for reason for following
   # also need to coerce the dates in new_historical_data from POSIXct to character
   new_historical_data$TimeStamp = as.character( new_historical_data$TimeStamp )
   
   # and now prepend new_historical_data to current_ohlc_record
   combined_records = rbind( new_historical_data , current_ohlc_record , stringsAsFactors = FALSE )
   
   # and coerce character dates back to a POSIXct date format prior to printing
   combined_records$TimeStamp = as.POSIXct( combined_records$TimeStamp )
   
   # write combined_records to file
   write.table( combined_records , file = paste( instrument , "raw_OHLC_hourly" , sep = "_" ) , row.names = FALSE , 
                col.names = FALSE , sep = "," )
   
   added_data_length = nrow( new_historical_data ) # length of added new data

   # and amend Instrument_update file with latest update information
   all_current_historical_data_list[ ii , 3 ] = all_current_historical_data_list[ ii , 3 ] + added_data_length
   
   # write updated Instrument_update_file to file
   write.table( all_current_historical_data_list , file = "instrument_hourly_update_file" , row.names = FALSE , col.names = FALSE )

} # end of for all_current_historical_data_list loop
As described in the previous post, the function HisPricesDates is called to do the actual downloading, with the relevant dates for function input being read and calculated from the existing data file (I have hard coded for hourly data but this can, of course, be changed or implemented as user input in the R session). As usual I have commented the script to explain what is going on.

However, one important caveat is that it is assumed that there is actually some Oanda data to download and prepend prior to the earliest date in the existing record, and there are no checks of this assumption. Therefore, the script might fail in unexpected ways if one attempts to reach too far back in history for the prependable data.
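If desired, a simple guard along the following lines (using the same variable names as the script above) could be added immediately after the download call to skip an instrument when nothing is returned; this is only a suggested sketch, not part of the script as actually used.

# possible guard: skip this instrument if the API returned nothing to prepend
if ( is.null( new_historical_data ) || nrow( new_historical_data ) == 0 ) {
   next # nothing to prepend, so move on to the next instrument
}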

Friday, 27 October 2017

Updating Historical Data Using Oanda's API and R

Following on from my previous post about downloading historical data, this post shows how previously downloaded data may be updated and appended with new, more recent data without having to re-download all the old data all over again.

The main function to do this, HisPricesDates, downloads data between given dates as function inputs and is shown below.
HisPricesDates  = function( Granularity, DayAlign, TimeAlign, AccountToken, Instrument, Start, End ){

# a typical Oanda API call might look like
# https://api-fxtrade.oanda.com/v1/candles?instrument=EUR_USD&granularity=D&start=2014-03-21&end=2014-04-21&candleFormat=midpoint&includeFirst=false
# which is slowly built up by using the R paste function, commented at the end of each line below
  
  httpaccount  = "https://api-fxtrade.oanda.com"
  auth           = c(Authorization = paste("Bearer",AccountToken,sep=" "))
  QueryHistPrec  = paste(httpaccount,"/v1/candles?instrument=",sep="") # https://api-fxtrade.oanda.com/v1/candles?instrument=
  QueryHistPrec1 = paste(QueryHistPrec,Instrument,sep="")              # https://api-fxtrade.oanda.com/v1/candles?instrument=EUR_USD
  qstart = paste("start=",Start,sep="")                                # start=2014-03-21
  qend   = paste("end=",End,sep="")                                    # end=2014-04-21   
  qcandleFormat  = "candleFormat=midpoint"                             # candleFormat=midpoint
  qgranularity   = paste("granularity=",Granularity,sep="")            # granularity=D
  qdailyalignment    = paste("dailyAlignment=",DayAlign,sep="")        # dailyAlignment=0
  qincludeFirst = "includeFirst=false"                                 # includeFirst=false
  QueryHistPrec2 = paste(QueryHistPrec1,qgranularity,qstart,qend,qcandleFormat,qincludeFirst,qdailyalignment,sep="&")
  InstHistP = getURL(QueryHistPrec2,cainfo=system.file("CurlSSL","cacert.pem",package="RCurl"),httpheader=auth)
  InstHistPjson = fromJSON(InstHistP, simplifyDataFrame = TRUE)
  Prices        = data.frame(InstHistPjson[[3]])
  Prices$time   = paste(substr(Prices$time,1,10),substr(Prices$time,12,19), sep=" ")
  colnames(Prices) = c("TimeStamp","Open","High","Low","Close","TickVolume","Complete")
  Prices$TimeStamp = as.POSIXct(strptime(Prices$TimeStamp, "%Y-%m-%d %H:%M:%OS"),origin="1970-01-01",tz = "UTC")
  attributes(Prices$TimeStamp)$tzone = TimeAlign
  return(Prices)
  
}
This function is called by two R scripts, one for downloading daily data and one for intraday data.

The daily update script, which is shown next,
# cd to the daily data directory
setwd("~/Documents/octave/oanda_data/daily")

all_current_historical_data_list = read.table("instrument_daily_update_file",header=FALSE,sep="",colClasses=c("character","Date","numeric") )

for( ii in 1 : nrow( all_current_historical_data_list ) ) {
  
  instrument = all_current_historical_data_list[ ii , 1 ]
  # read second column of dates in all_current_historical_data_list as a date index
  date_ix = as.Date( all_current_historical_data_list[ ii , 2 ] )
  todays_date = as.Date( Sys.time() )
  
  # download the missing historical data from date_ix to todays_date, if and only if, date_ix != todays_date
  if( date_ix + 1 != todays_date ) {
  
  new_historical_data = HisPricesDates( Granularity = "D", DayAlign, TimeAlign, AccountToken, instrument,
                         date_ix , todays_date )

  # the new_historical_data might only try to add incomplete OHLC data, in which case do not actually
  # want to update, so only update if we will be adding new, complete OHLC information  
  if ( nrow( new_historical_data ) >= 2 & new_historical_data[ 2 , 7 ] == TRUE ) {
  
    # now do some data manipulation
    # expect date of last line in Instrument_update_file == date of first line in new_historical_data 
    if ( date_ix == as.Date( new_historical_data[ 1 , 1 ] ) ) { # this is the case if true
       new_historical_data = new_historical_data[ -1 , ]        # so delete first row of new_historical_data
    }
  
    # similarly, expect last line of new_historical_data to be an incomplete OHLC bar
    if ( new_historical_data[ nrow( new_historical_data) , 7 ] == FALSE) {         # if so,
       new_historical_data = new_historical_data[ -nrow( new_historical_data) , ] # delete this last line
    }
    
    # append new_historical_data to the relevant raw data file
    write.table( new_historical_data , file = paste( instrument , "raw_OHLC_daily" , sep = "_" ) , row.names = FALSE , na = "" ,
                 col.names = FALSE , sep = "," , append = TRUE )
  
    added_data_length = nrow( new_historical_data )
    new_last_date = as.Date( new_historical_data[ added_data_length , 1 ] )
    
    # and amend Instrument_update file with latest update information  
    all_current_historical_data_list[ ii , 2 ] = new_last_date
    all_current_historical_data_list[ ii , 3 ] = all_current_historical_data_list[ ii , 3 ] + added_data_length
  
    } # end of download if statement
  
  } # end of ( date_ix != todays_date ) if statement
  
} # end of for all_current_historical_data_list loop

# Write updated Instrument_update_file to file
write.table( all_current_historical_data_list , file = "instrument_daily_update_file" , row.names = FALSE , col.names = FALSE , na = "" )
has if statements as control structures to check that there is likely to be new daily data to actually download. It does this by checking a last_update date contained in an "instrument_daily_update_file" and comparing this with the current OS system time. If there is likely to be new data, the script runs and then updates this "instrument_daily_update_file." If not, the script exits with nothing having been done.

The intraday update script does not have the checks the daily script has because I assume there will always be some new intraday data available for download. In this case, the last_update date is read from the "instrument_hourly_update_file" purely to act as an input to the above HisPricesDates function. As a result, this script involves some data manipulation to ensure that duplicate data is not printed to file. This script is shown next and is heavily commented to explain what is happening.
# cd to the hourly data directory
setwd("~/Documents/octave/oanda_data")

all_current_historical_data_list = read.table("instrument_hourly_update_file",header=FALSE,sep="",colClasses=c("character","Date","numeric") )

for( ii in 1 : nrow( all_current_historical_data_list ) ) {

   instrument = all_current_historical_data_list[ ii , 1 ]
   
   # read second column of dates in all_current_historical_data_list as a date index
   date_ix = as.Date( all_current_historical_data_list[ ii , 2 ] )
   
   todays_date = as.Date( Sys.time() )

   # download the missing historical data from date_ix to todays_date. If date_ix == todays_date, will download all
   # hourly bars for today only. 
   new_historical_data = HisPricesDates( Granularity = "H1", DayAlign, TimeAlign, AccountToken, instrument,
                          date_ix , todays_date + 1 )

   # the new_historical_data will almost certainly have incomplete hourly OHLC data in its last line,
   # so delete this incomplete OHLC information
   if ( new_historical_data[ nrow( new_historical_data ) , 7 ] == FALSE ) {
        new_historical_data = new_historical_data[ -nrow( new_historical_data ) , ]
   }
   
   # read the last line only of the current OHLC file for this instrument
   file = paste( instrument , "raw_OHLC_hourly" , sep = "_" ) # get the filename
   
   system_command = paste( "tail -1" , file , sep = " " )     # create a unix system command to read the last line of this file
   
   # read the file's last line
   old_historical_data = read.csv( textConnection( system( system_command , intern = TRUE ) ) , header = FALSE , sep = "," ,
                                    stringsAsFactors = FALSE )
   
   old_historical_data_end_date_time = old_historical_data[ 1 , 1 ]            # get the date value to be matched 
   
   new_historical_data_date_times = as.character( new_historical_data[ , 1 ] ) # vector to search for the above date value

   ix = charmatch( old_historical_data_end_date_time , new_historical_data_date_times ) # get the matching index value

   # delete that part of new_historical_data which is already contained in filename
   new_historical_data = new_historical_data[ -( 1 : ix ) , ]
   
   # append new_historical_data to the relevant raw data file
   write.table( new_historical_data , file = paste( instrument , "raw_OHLC_hourly" , sep = "_" ) , row.names = FALSE , na = "" ,
                 col.names = FALSE , sep = "," , append = TRUE )

   added_data_length = nrow( new_historical_data )                          # length of added new data
   new_last_date = as.Date( new_historical_data[ added_data_length , 1 ] )  # date of last update

   # and amend Instrument_update file with latest update information
   all_current_historical_data_list[ ii , 2 ] = new_last_date
   all_current_historical_data_list[ ii , 3 ] = all_current_historical_data_list[ ii , 3 ] + added_data_length

} # end of for all_current_historical_data_list loop

# finally, write updated Instrument_update_file to file
write.table( all_current_historical_data_list , file = "instrument_hourly_update_file" , row.names = FALSE , col.names = FALSE , na = "" )
There is one important thing to point out: the part of the script that reads the last line of the current OHLC file relies on a Unix command (tail), which means it almost certainly will not work on Windows based OSes. Windows users will have to find their own hack to load just the last line of the relevant file, or put up with loading the whole historical data file and indexing just the last line.
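One platform independent alternative, sketched below, is simply to read the whole file and keep only its final row; this is less efficient for large files but avoids the system call (variable names as in the script above).

# portable alternative: read the whole file and keep just the last row
whole_file = read.csv( file , header = FALSE , sep = "," , stringsAsFactors = FALSE )
old_historical_data = whole_file[ nrow( whole_file ) , ]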

Thursday, 21 September 2017

Downloading Historical Data Using Oanda's API and R

It has been about 5 months since my last blog post and in this time I have been working away from home, been on summer holiday and spent some time mucking about on boats, so I have not been able to devote as much time to my blog as I would have liked. However, that has now changed, and this blog post is about obtaining historical data.

Many moons ago I used to download free, EOD data from tradingblox, but they stopped updating this in early 2013. I then started concentrating more on forex data because free sources of this data are more readily available. However, this still meant (for me at least) a laborious process of manually downloading .txt/.csv files, with the attendant problem of the data never being fully up to date, which meant doing historical testing on data that did not extend to what I would actually be able to trade in real time. With my present focus on machine learning derived trading algorithms this was becoming an untenable position.

My solution to this has been to personalize the ROandaAPI code that is freely available from this github, courtesy of IF.FranciscoME. I have stripped out some if statements, hard coded some variables particular to my own Oanda account, added some extra comments for my own enlightenment and broken the API code up into separate .R functions and .R scripts, which I use via RStudio.

The first such R script is called to initialise the variables and load the various functions into the R environment, shown below.
# Required Packages in order to use the R API functions
library("downloader")
library("RCurl")
library("jsonlite")
library("httr")

# -- ---------------------------------------------------------------------------------- #
# -- Specific Values of Parameters in API for my primary account ---------------------- #
# -- ---------------------------------------------------------------------------------- #

# -- numeric ------ # My Account ID 
AccountID = 123456     
# -- character ---- # My Oanda Token
AccountToken = "blah-de-blah-de-blah"

# load the various function files
source("~/path/to/InstrumentsList.R")
source("~/path/to/R_Oanda_API_functions/ActualPrice.R")
source("~/path/to/HisPricesCount.R")
source("~/path/to/HisPricesDates.R")

# get list of all tradeable instruments
trade_instruments = InstrumentsList( AccountToken , AccountID )
View( trade_instruments )

# load some default values
# -- character ---- # Granularity of the prices
# granularity: The time range represented by each candlestick. The value specified will determine the alignment 
# of the first candlestick.
# choose from S5 S10 S15 S30 M1 M2 M3 M4 M5 M10 M15 M30 H1 H2 H3 H4 H6 H8 H12 D W M ( secs mins hours day week month )
Granularity = "D"

# -- numeric ------ # Hour of the "End of the Day"
# dailyAlignment: The hour of day used to align candles with hourly, daily, weekly, or monthly granularity. 
# The value specified is interpreted as an hour in the timezone set through the alignmentTimezone parameter and must be 
# an integer between 0 and 23.
DayAlign = 0 # original R code and Oanda default is 17 at "America%2FNew_York"

# -- character ---- # Time Zone in format "Continent/Zone
# alignmentTimezone: The timezone to be used for the dailyAlignment parameter. This parameter does NOT affect the 
# returned timestamp, the start or end parameters, these will always be in UTC. The timezone format used is defined by 
# the IANA Time Zone Database, a full list of the timezones supported by the REST API can be found at 
# http://developer.oanda.com/docs/timezones.txt
# "America%2FMexico_City" was the originallly provided, but could use, for example,  "Europe%2FLondon" or "Europe%2FWarsaw"
TimeAlign = "Europe%2FLondon"

################################# IMPORTANT NOTE #####################################################################
# By setting DayAlign = 0 and TimeAlign = "Europe%2FLondon" the end of bar is midnight in London. Doing this ensures
# that the bar OHLC in data downloads matches the bars seen in the Oanda FXTrade software, which for my account is 
# Europe Division, e.g. London time. The timestamps on downloads are, however, at GMT times, which means during summer
# daylight saving time the times shown on the Oanda software seem to be one hour ahead of GMT.
######################################################################################################################

Start = Sys.time() # Current system time
End = Sys.time() # Current system time
Count = 500 # Oanda default

# now cd to the working directory
setwd("~/path/to/oanda_data")
The code is liberally commented to describe reasons for my default choices. The InstrumentsList.R function called in the above script is shown next.
InstrumentsList = function( AccountToken , AccountID )
{
  httpaccount = "https://api-fxtrade.oanda.com"
  auth        = c(Authorization = paste("Bearer",AccountToken,sep=" "))
  Queryhttp   = paste(httpaccount,"/v1/instruments?accountId=",sep="")
  QueryInst   = paste(Queryhttp,AccountID,sep="")
  QueryInst1  = getURL(QueryInst,cainfo=system.file("CurlSSL","cacert.pem",package="RCurl"),httpheader=auth)
  InstJson    = fromJSON(QueryInst1, simplifyDataFrame = TRUE)
  FinalData   = data.frame(InstJson)
  colnames(FinalData) = c("Instrument","DisplayName","PipSize","MaxTradeUnits")
  FinalData$MaxTradeUnits = as.numeric(FinalData$MaxTradeUnits)
  return(FinalData)
}
This downloads a list of all the available trading instruments for the associated Oanda account. The following R script actually downloads the historical data for all the trading instruments listed in the above mentioned list and writes the data to separate files; one file per instrument. It also keeps track of all the instruments and the date of the last complete OHLC bar in each historical record, and writes this to file as well.
# cd to the working directory
setwd("~/path/to/oanda_data")

# dataframe to keep track of updates
Instrument_update_file = data.frame( Instrument = character() , Date = as.Date( character() ) , stringsAsFactors = FALSE )

for( ii in 1 : nrow( trade_instruments ) ) {
  
  instrument = trade_instruments[ ii , 1 ]
  
  # write details of instrument to Instrument_update_file
  Instrument_update_file[ ii , 1 ] = instrument
  
  historical_prices = HisPricesCount( Granularity = "D", DayAlign , TimeAlign , AccountToken ,instrument , Count = 5000 )
  last_row_ix = nrow( historical_prices )
  
  if ( historical_prices[ last_row_ix , 7 ] == FALSE ){ # last obtained OHLC bar values are incomplete
    # and do not want to save incomplete OHLC values, so add date of previous line of complete OHLC data
    # to Instrument_update_file
    Instrument_update_file[ ii , 2 ] = as.Date( historical_prices[ last_row_ix - 1 , 1 ] )
    
    # and delete the row with these incomplete values
    historical_prices = historical_prices[ 1 : ( last_row_ix - 1 ) , ]
    
  } # end of if statement
  
  # Write historical_prices to file
  write.table( historical_prices , file = paste( instrument , "raw_OHLC_daily" , sep = "_" ) , row.names = FALSE , na = "" , 
             col.names = FALSE , sep = "," )

  } # end of for loop

  # Write Instrument_update_file to file
  write.table( Instrument_update_file , file = "Instrument_update_file" , row.names = FALSE , na = "" , col.names = TRUE , sep = "," )
This script repeatedly calls the actual download function, HisPricesCount.R, which does all the heavy lifting in a loop, and the code for this download function is
HisPricesCount = function( Granularity, DayAlign, TimeAlign, AccountToken, Instrument, Count ){
  
  httpaccount      = "https://api-fxtrade.oanda.com"
  auth             = c(Authorization = paste("Bearer",AccountToken,sep=" "))
  QueryHistPrec    = paste(httpaccount,"/v1/candles?instrument=",sep="")
  QueryHistPrec1   = paste(QueryHistPrec,Instrument,sep="")
  qcount           = paste("count=",Count,sep="")
  qcandleFormat    = "candleFormat=midpoint"
  qgranularity     = paste("granularity=",Granularity,sep="")
  qdailyalignment  = paste("dailyAlignment=",DayAlign,sep="")
  QueryHistPrec2   = paste(QueryHistPrec1,qcandleFormat,qgranularity,qdailyalignment,qcount,sep="&")
  InstHistP        = getURL(QueryHistPrec2,cainfo=system.file("CurlSSL","cacert.pem",package="RCurl"),httpheader=auth)
  InstHistPjson    = fromJSON(InstHistP, simplifyDataFrame = TRUE)
  Prices           = data.frame(InstHistPjson[[3]])
  Prices$time      = paste(substr(Prices$time,1,10),substr(Prices$time,12,19), sep=" ")
  colnames(Prices) = c("TimeStamp","Open","High","Low","Close","TickVolume","Complete")
  Prices$TimeStamp = as.POSIXct(strptime(Prices$TimeStamp, "%Y-%m-%d %H:%M:%OS"),origin="1970-01-01",tz = "UTC")
  attributes(Prices$TimeStamp)$tzone = TimeAlign
  return(Prices)
  
}
One of the input variables for this function is Count ( default = 5000 ), which means that the function downloads the last 5000 OHLC bar records up to and including the most recent, which may still be forming and hence is incomplete. The calling script ensures that any incomplete bar is stripped from the record so that only complete bars are printed to file.
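Purely by way of illustration, a single call might look like the following, assuming the variables set up by the initialisation script earlier in this post and using EUR_USD as an example instrument.

# example call: download the last 5000 daily bars of EUR_USD
eurusd_daily = HisPricesCount( Granularity = "D" , DayAlign , TimeAlign , AccountToken ,
                               Instrument = "EUR_USD" , Count = 5000 )
head( eurusd_daily )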

All in all this is a vast improvement over my previous data collection regime, and kudos to IF.FranciscoME for making the base code available on his github.

Thursday, 15 September 2016

Loading and Manipulating Historical Data From .csv Files

In my last post I said I was going to look at data wrangling my data, and this post outlines what I have done since then.

My problem was that I have numerous csv files containing historical data with different date formats and frequencies, e.g. tick level and hourly and daily OHLC, and in the past I have always struggled with this. However, I have finally found a solution using the R quantmod package, which makes it easy to convert data to a lower frequency. It took me some time to finally get what I wanted, but the code box below shows the relevant R code to convert hourly OHLC, contained in one .csv file, to daily OHLC which is then written to a new .csv file.
library("quantmod", lib.loc="~/R/x86_64-pc-linux-gnu-library/3.3")
price_data = read.csv( "path/to/file.csv" , header = FALSE )
price_data = xts( price_data[,2:6] , order.by = as.Date.POSIXlt( strptime( price_data[,1] , format = "%d/%m/%y %H:%M" , tz = "" ) ) )
price_data_daily = to.daily( price_data , drop.time = TRUE )
write.zoo( price_data_daily , file = "path/to/new/file.csv" , sep = "," , row.names = FALSE , col.names = FALSE )
I can't believe how much time I had to spend reading documentation and searching online to finally achieve such a small snippet of working code.

This next code box shows the Octave code to load the above written .csv file into Octave:
fid = fopen( 'path/to/file' , 'rt' ) ;
data = textscan( fid , '%s %f %f %f %f' , 'Delimiter' , ',' , 'CollectOutput', 1 ) ;
fclose( fid ) ;
eurusd = [ datenum( data{1} , 'yyyy-mm-dd' ) data{2} ] ;
clear data fid
Hopefully, in both cases, manipulating the format strings "%d/%m/%y %H:%M" and 'yyyy-mm-dd' in these two respective code snippets will save you the hours I spent.


Thursday, 10 September 2015

Recent reading

In my last post I mentioned that I was going away for the summer, but now I'm back. During the summer I didn't get to do as much reading etc. as I had hoped, but I did manage to play around with the Rssa package for singular spectrum analysis and this is still an ongoing investigation. I also briefly looked at independent component analysis and the FastICA package.

The common theme of the above is the extraction of meaningful time series features, and this general area is what I will be looking into for my next set of posts. 

Monday, 30 January 2012

Testing the Delta Phenomenon, Part 3

Readers may recall from my recent post that the "problem" with the Delta Phenomenon is that it is very subjective and therefore difficult to test objectively. This post will outline how I intend to conduct such an objective test, using R, and discuss some issues related to the test.

Firstly, I am going to use the statistical concept of Cross-validation, whereby a model is trained on one set of data and tested for accuracy on another set of data, which is actually quite common in the world of back testing. Since the "Delta solutions" I will be testing come from late 2006 (See Testing the Delta Phenomenon, Part 2) the 4 years from 2008 to 2011 inclusive can be considered to be the validation set for this test. The test(s) will assess the accuracy of the predicted highs and lows for this period, on each Delta time frame for which I have solutions, by counting the difference in days between actual highs and lows in the data and their predicted occurrences and then creating an average error for these differences. This will be the test statistic. Using R, a Null Hypothesis average error distribution for random predictions on the same data will be created, using the same number of predicted turning points as per the Delta solution being tested. The actual average error test statistic on real data will be compared with this Null Hypothesis distribution of the average error test statistic and the Null Hypothesis rejected or not, as the case may be. The Null Hypothesis may be stated as
  • given a postulated number of turning points the accuracy of the Delta Phenomenon in correctly predicting when these turning points will occur, using the average error test statistic described above as the measure of accuracy, is no better than random guessing as to where the turning points will occur.
whilst the Alternative Hypothesis may be stated as
  • given a postulated number of turning points the accuracy of the Delta Phenomenon in correctly predicting when these turning points will occur, using the average error test statistic described above as the measure of accuracy, is better than could be expected from random guessing as to where the turning points will occur.
and therefore, by extension, we can accept that Delta has some (unquantifiable) predictive accuracy. For Delta to pass this test we want to be able to reject the Null Hypothesis.

All of this might be made clearer for readers by following the commented R code below.
# Assume a 365 trading day year, with 4 Delta turning points in this year
# First, create a Delta Turning points solution vector, the projected
# days on which the market will make a high or a low 
proj_turns <- c(47,102,187,234) # day number of projected turning points

# now assume we apply the above Delta solution to future market data
# and identify, according to the principles of Delta, the actual turning 
# points in the future unseen "real data"
real_turns <- c(42,109,193,226) # actual market turns occur on these days

# calculate the distance between the real_turns and the days on which 
# the turn was predicted to occur and calculate the test statistic 
# of interest, the avge_error_dist
avge_error_dist <- mean( abs(proj_turns - real_turns) )
print(avge_error_dist) # print for viewing

# calculate the theoretical probability of randomly picking 4 turning points
# in our 365 trading day year and getting an avge_error_dist that is equal 
# to or better than the actual avge_error_dist calculated above.
# Taking the first projected turning point at 47 and the actual turning 
# that occurs at 42, to get an error for this point that is as small as or
# smaller than that which actually occurs, we must randomly choose one of 
# the following days: 42,43,44,45,46,47,48,49,50,51 or 52. The probability of
# randomly picking one of these numbers out of 1 to 365 inclusive is
a <- 11/365
# and similarly for the other 3 turning points
b <- 15/364 # turning point 2
c <- 13/363 # turning point 3
d <- 17/362 # turning point 4
# Note that the denominator decreases by 1 each time because we are
# sampling without replacement i.e. it is not possible to pick the same
# day more than once. Combining the 4 probabilities above, we get
rdn_prob_as_good <- (a*b*c*d)/100 # expressed as a %
print( rdn_prob_as_good ) # a very small % !!!

# but rather than rely on theoretical calculations, we are actually
# going to repeated, randomly choose 4 turning points and compare their 
# accuracy with the "real accuracy", as measured by avge_error_dist

# Create our year vector to sample, consisting of 365 numbered days
year_vec <- 1:365

# predefine vector to hold results
result_vec <- numeric(100000) # because we are going to resample 100000 times

# count how many times a random selection of 4 turning points is as good
# as or better than our "real" results
as_good_as = 0

# do the random turning point guessing, resampling year_vec, in a loop
for(i in 1:100000) {
  # randomly choose 4 days from year_vec as turning points
  this_sample <- sample( year_vec , size=4 , replace=FALSE )
  
  # sort this_sample so that it is in increasing order
  sorted_sample <- sort( this_sample , decreasing=FALSE )
  
  # calculate this_sample_avge_error_dist, our test statistic
  this_sample_avge_error_dist <- mean( abs(proj_turns - sorted_sample) )
  
   # if the test statistic is as good as or better that our real result
   if( this_sample_avge_error_dist <= avge_error_dist ) {
     as_good_as = as_good_as + 1 # increment as_good_as count
   }
  
  # assign this sample result to result_vec
  result_vec[i] <- this_sample_avge_error_dist
}

# convert as_good_as to %
as_good_as_percent <- as_good_as/100000

# some summary statistics of result_vec
mean_of_result_vec <- mean( result_vec )
standard_dev_of_result_vec <- sd( result_vec )
real_result_from_mean <- ( mean_of_result_vec - avge_error_dist )/standard_dev_of_result_vec

print( as_good_as ) # print for viewing
print( as_good_as_percent ) # print for viewing
print( mean_of_result_vec ) # print for viewing
print( standard_dev_of_result_vec ) # print for viewing
print( real_result_from_mean ) # print for viewing

# plot histogram of the result_vec
hist( result_vec , freq=FALSE, col='yellow' )
abline( v=avge_error_dist , col='red' , lwd=3 )
Typical output of this code is
which shows a histogram of the distribution of random prediction average errors in yellow, with the actual average error shown in red. This is for the illustrative hypothetical values used in the code box above. Terminal prompt output for this is

[1] 6.5
[1] 2.088655e-08
[1] 38
[1] 0.00038
[1] 63.78108
[1] 32.33727
[1] 1.771364

where
6.5 is actual average error in days
2.088655e-08 is "theoretical" probability of Delta being this accurate
38 is number of times a random prediction is as good as or better than 6.5
0.00038 is 38 expressed as a percentage of random predictions made
63.78108 is the mean of the random distribution histogram
32.33727 is the standard deviation of the random distribution histogram
1.771364 is the difference between 63.78108 and 6.5 expressed as a multiple of the 32.33727 standard deviation.

This would be an example of the Null Hypothesis being rejected because the figure of 0.00038 for random predictions matching or beating the actual accuracy is so small; in statistical parlance, a low p-value. Note, however, the gross difference between this figure and the "theoretical" figure. Also note that despite the Null being rejected the actual average error falls well within 2 standard deviations of the mean of the random distribution. This, of course, is due to the extremely heavy right-tailedness of the distribution, which expands the standard deviation range.

This second plot
and

[1] 77.75
[1] 2.088655e-08
[1] 48207
[1] 0.48207
[1] 79.85934
[1] 27.60137
[1] 0.07642148

shows what a typical failure to reject the Null Hypothesis would look like - a 0.48 p-value - and an actual average error that is indistinguishable from random, typified by it being well within a nice looking bell curve distribution.

So there it is, the procedure I intend to follow to objectively test the accuracy of the Delta Phenomenon.

Friday, 27 January 2012

Testing the Delta Phenomenon

The second book I ever bought about trading was Welles Wilder's "The Delta Phenomenon," which I bought as a result of a recommendation in the first book I ever bought. Subsequently I also bought the "The Adam Theory of Markets" and a few years later I bought the "Ocean Theory" book, so one could say I own the whole trilogy!

When I bought these I was doing my charting by hand on graph paper using prices from the Wall Street Journal, but in due course I got a computer and began using various software programs; Excel, Open Office Calc, QtStalker and finally ended up where I am today, using Octave, R and Gnuplot. But however proficient I became at using these last three, my programming skills weren't up to coding the Delta Phenomenon, until now that is. I had already quite easily coded the Adam Projection and the Natural Market Mirror, Natural Market River and Natural Moving Average from Ocean theory. Over the next few posts I am going to outline how I intend to test the Delta Phenomenon and show the eventual results of these tests, but before that I am going to present in this post the "breakthrough" piece of coding that finally allows me to do so. I think other users of R may find it useful.

The issue to be solved was overcoming the problem of missing days in the time series data, e.g. weekends, holidays, exchange closures, just plain missing data etc. because the Delta Phenomenon is predicated on counting days before or after specified dates, and of course any algorithmic counting over the data would be upset by such missing days. My forum query here and forum searches that turned up this led me to the R xts package, which finally resulted in me being able to write the following piece of R code
rm(list=ls(all=TRUE)) # remove all previous data from workspace
library(xts) # load the required library

# preparation for output to new file
sink(file="solution",append=FALSE,type=c("output"),split=FALSE)

# read in the raw data files
data_1 <- read.csv(file="sp",header=FALSE,sep=",")
itdmtd <- read.csv(file="itdmtd",header=FALSE,sep=",")

# create xts objects from above data:
x <- xts(data_1[c('V2','V3','V4','V5')],as.Date(data_1[,'V1']))
y <- xts(itdmtd[c('V2','V3','V4','V5','V6','V7','V8','V9')],as.Date(itdmtd[,'V1']))
z <- merge.xts(x,y) # merge x and y

# create a contiguous date vector to encompass date range of above data
d <- timeBasedSeq(paste(start(z),end(z),"d",sep="/"), retclass="Date")

# merge z with an "empty" xts object, xts(,d), filling with NA
prices <- merge(z,xts(,d),fill=NA)

# coerce prices xts object to a data frame object
prices_df <- data.frame(date=index(prices), coredata(prices))

# output to new file
write.table(prices_df,quote=FALSE,row.names=FALSE,col.names=FALSE,sep=",")
sink()
The code takes a csv file of the price time series, merges it with a csv file of the "Delta Solution," fills in any missing dates and then writes out the result to a final csv file that looks like this

1995-01-03,828.55,829.45,827.55,828.8,NA,NA,NA,NA,NA,NA,NA,NA
1995-01-04,830.2,831.25,827.6,831.1,NA,NA,NA,NA,NA,NA,NA,NA
1995-01-05,830.5,831.45,829.85,830.6,NA,NA,NA,NA,NA,NA,NA,NA
1995-01-06,830.75,833.05,829,830.35,NA,NA,NA,NA,NA,NA,NA,NA
1995-01-07,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA
1995-01-08,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA

This might not actually look like much, but using this as input to this Gnuplot script
reset
set title "Medium Term Delta Turning Points" textcolor rgb "#FFFFFF"
set object 1 rect from graph 0, 0, 0 to graph 1.1, 1.1, 0 behind lw 1.0 fc rgb "#000000" fillstyle solid 1.00
set datafile separator ","
set xdata time
set timefmt "%Y-%m-%d"
set format x
set y2range [0:1]

plot "solution" using 1:10 notitle with impulses linecolor rgb "#B0171F" axis x1y2, \
"solution" using 1:11 notitle with impulses linecolor rgb "#0000FF" axis x1y2, \
"solution" using 1:12 notitle with impulses linecolor rgb "#FFA500" axis x1y2, \
"solution" using 1:13 notitle with impulses linecolor rgb "#00EE00" axis x1y2, \
"solution" using 1:2:3:4:5 notitle with candlesticks linecolor rgb "#FFFFFF" axis x1y1, \
"solution" using 1:($$2>$$5?$$2:1/0):($$2>$$5?$$3:1/0):($$2>$$5?$$4:1/0):($$2>$$5?$$5:1/0) notitle with candlesticks lt 1 axis x1y1, \
"solution" using 1:($$2<$$5?$$2:1/0):($$2<$$5?$$3:1/0):($$2<$$5?$$4:1/0):($$2<$$5?$$5:1/0) notitle with candlesticks lt 3 axis x1y1
gives a nice plot thus

Many readers might say "So what! And...?" but to those readers who know what the Delta Phenomenon is, the coloured lines will be significant. Furthermore, using free, open source software I have created a free, as in gratis, alternative to the software that the Delta Society sells on its website. Most important of all, of course, is that I am now in a position to do computerised testing of the Delta Phenomenon. More on these tests in upcoming posts.

Friday, 29 July 2011

Update on the Trend Vigor indicator

I have now completed some preliminary Monte Carlo testing of the Trend Vigor indicator, and I thought this would be a good opportunity to share with readers the general approach I take to Monte Carlo testing on predefined, "ideal" market types.

Firstly, I create my "ideal" market types by using an Octave .oct function, the code for which is given below. In this code can also be seen my implementation of the Trend Vigor indicator, which I think is slightly different from Ehlers'. The code is commented, so no further description is required here.

.oct function code
// This function takes as arguments a single value input for a sinewave period and a single value input for a degrees increment value. A sinewave of the 
// given period is created and then trends are added to the sinewave such that 5 hypothesised "ideal" market types are created: these are

// 1) a perfectly cyclic sideways market with no trend i.e. just the sinewave component
// 2) an uptrending market with cyclic retracements (uwr) such that retracements are down to the 50% Fibonacci ratio level
// 3) an uptrending market with no retracements (unr) i.e. the uptrend completely swamps the downward cyclic component counter to the trend
// 4) a downtrending market with cyclic retracements (dwr) such that retracements are up to the 50% Fibonacci ratio level
// 5) a downtrending market with no retracements (dnr) i.e. the downtrend completely swamps the upward cyclic component counter to the trend

// The cyclic component of these markets is then extracted using the bandpass indicator. These vector lengths are 500 in order to allow time for the 
// bandpass calculations to settle down. The peak to peak amplitude is then recovered from the bandpass at the end of each market vector. The trend slope 
// is also calculated at the end of each market vector. The trend vigor indicator value is then calculated thus
// trend_slope/p_to_p. The bandpass delta is set to 0.2.

// The idea is that 
// for a sideways market the value should be about 0
// for a trending with retracement market the value should be > 0 && < 1 or < 0 && > -1, depending on whether an up trend or down trend
// for a trending with no retracement market the ratio should be > 1 or < -1, depending on whether an up trend or down trend 

// The original sinewave is then repeatedly phase shifted by the degrees increment value, the above process repeated and new values are calculated. 
// All the calculated values for each market type are the vector outputs of the function. 

#include <octave/oct.h>
#include <octave/dColVector.h>
#include <math.h>
#define PI 3.14159265

DEFUN_DLD (trend_vigor_dist, args, , "Inputs are period & degrees increment value, outputs are vectors of repeated median slope values")
{
    octave_value_list retval_list;

    if (args(0).length () < 1 | args(0).length () > 1)
    {
        error ("Invalid arguments. Inputs are single value period length and single value degrees increment value");
        return retval_list;
    }

    if (args(1).length () < 1 | args(1).length () > 1)
    {
        error ("Invalid arguments. Inputs are single value period length and single value degrees increment value");
        return retval_list;
    }

    if (error_state)
    {
        error ("Invalid arguments. Inputs are single value period length and single value degrees increment value");
        return retval_list;
    }
    // read inputs
    int period = args(0).int_value ();
    double degrees_inc = args(1).double_value ();
    double period_inc = 360.0 / double(period);
    int length = period + 1; // the length of the "lookback". Is equal to period + 1 to encompass a full period

    // vectors to hold created market values
    ColumnVector sideways_vec(500); // vector to hold sideways market values
    ColumnVector sideways_bandpass_vec(500); // vector to hold bandpass of sideways market values

    ColumnVector uwr_vec(500); // vector to hold uwr market values
    ColumnVector uwr_bandpass_vec(500); // vector to hold bandpass of uwr market values
 
    ColumnVector unr_vec(500); // vector to hold unr market values
    ColumnVector unr_bandpass_vec(500); // vector to hold bandpass of unr market values

    ColumnVector dwr_vec(500); // vector to hold dwr market values
    ColumnVector dwr_bandpass_vec(500); // vector to hold bandpass of dwr market values
 
    ColumnVector dnr_vec(500); // vector to hold dnr market values
    ColumnVector dnr_bandpass_vec(500); // vector to hold bandpass of dnr market values

    // calculate the market trend_incs
    double uwr_trend_inc = 12 / ( 5 * double(period) );
    double unr_trend_inc = 4 / double(period);
    double dwr_trend_inc = -( 12 / ( 5 * double(period) ) );
    double dnr_trend_inc = -( 4 / double(period) );

    // declare variables for bandpass and trend vigor calculations
    double delta = 0.2;
    double beta = cos( (360.0/period)*PI/180.0 );
    double gamma = 1.0 / cos( (720.0*delta/period)*PI/180.0 );
    double alpha = gamma - sqrt(gamma*gamma - 1.0); 
    double power_side;
    double power_uwr;
    double power_unr;
    double power_dwr;
    double power_dnr;
    double rms;
    double p_to_p;

    // create output vectors
    int output_vec_length = int ( 360 / degrees_inc );
    ColumnVector sideways_dist ( output_vec_length ); // create output column of correct length for sideways market
    ColumnVector uwr_dist  ( output_vec_length ); // create output column of correct length for uwr market
    ColumnVector unr_dist  ( output_vec_length ); // create output column of correct length for unr market
    ColumnVector dwr_dist  ( output_vec_length ); // create output column of correct length for dwr market
    ColumnVector dnr_dist  ( output_vec_length ); // create output column of correct length for dnr market

    for (octave_idx_type ii (0); ii < output_vec_length; ii++) 
    {

        // Create the market types and their bandpasses for this ii iteration
        for (octave_idx_type jj (0); jj < 500; jj++) 
        {

        // First create the sideways market type
        sideways_vec(jj) = sin( (degrees_inc*ii + period_inc*jj) * PI / 180 );
                
                if ( jj < 2 )
                {
                sideways_bandpass_vec(jj) = 0;
                }
                else
                {
                sideways_bandpass_vec(jj) = 0.5*(1.0 - alpha)*(sideways_vec(jj) - sideways_vec(jj-2)) + beta*(1.0 + alpha)*sideways_bandpass_vec(jj-1) - alpha*sideways_bandpass_vec(jj-2);
                }

        // next, the uwr retracement market (uwr)
        uwr_vec(jj) = sideways_vec(jj) + jj*uwr_trend_inc;

                if ( jj < 2 )
                {
                uwr_bandpass_vec(jj) = 0;
                }
                else
                {
                uwr_bandpass_vec(jj) = 0.5*(1.0 - alpha)*(uwr_vec(jj) - uwr_vec(jj-2)) + beta*(1.0 + alpha)*uwr_bandpass_vec(jj-1) - alpha*uwr_bandpass_vec(jj-2);
                }

        // next, the unr retracement market (unr)
        unr_vec(jj) = sideways_vec(jj) + jj*unr_trend_inc;

                if ( jj < 2 )
                {
                unr_bandpass_vec(jj) = 0;
                }
                else
                {
                unr_bandpass_vec(jj) = 0.5*(1.0 - alpha)*(unr_vec(jj) - unr_vec(jj-2)) + beta*(1.0 + alpha)*unr_bandpass_vec(jj-1) - alpha*unr_bandpass_vec(jj-2);
                }

        // next, the dwr retracement market (dwr)
        dwr_vec(jj) = sideways_vec(jj) + jj*dwr_trend_inc;

                if ( jj < 2 )
                {
                dwr_bandpass_vec(jj) = 0;
                }
                else
                {
                dwr_bandpass_vec(jj) = 0.5*(1.0 - alpha)*(dwr_vec(jj) - dwr_vec(jj-2)) + beta*(1.0 + alpha)*dwr_bandpass_vec(jj-1) - alpha*dwr_bandpass_vec(jj-2);
                }

        // next, the dnr retracement market (dnr)
        dnr_vec(jj) = sideways_vec(jj) + jj*dnr_trend_inc;

                if ( jj < 2 )
                {
                dnr_bandpass_vec(jj) = 0;
                }
                else
                {
                dnr_bandpass_vec(jj) = 0.5*(1.0 - alpha)*(dnr_vec(jj) - dnr_vec(jj-2)) + beta*(1.0 + alpha)*dnr_bandpass_vec(jj-1) - alpha*dnr_bandpass_vec(jj-2);
                }

        } // end of jj loop to create the different markets and their bandpasses

    // now loop over end of each market vector to create the distributions

     power_side = 0.0;
     power_uwr = 0.0;
     power_unr = 0.0;
     power_dwr = 0.0;
     power_dnr = 0.0;

     for (octave_idx_type jj (0); jj < length; jj++)
         {
         power_side = power_side + sideways_bandpass_vec(499-jj)*sideways_bandpass_vec(499-jj) + sideways_bandpass_vec(499-jj-int(period/4.0))*sideways_bandpass_vec(499-jj-int(period/4.0)) ;
         power_uwr = power_uwr + uwr_bandpass_vec(499-jj)*uwr_bandpass_vec(499-jj) + uwr_bandpass_vec(499-jj-int(period/4.0))*uwr_bandpass_vec(499-jj-int(period/4.0)) ;
         power_unr = power_unr + unr_bandpass_vec(499-jj)*unr_bandpass_vec(499-jj) + unr_bandpass_vec(499-jj-int(period/4.0))*unr_bandpass_vec(499-jj-int(period/4.0)) ;
         power_dwr = power_dwr + dwr_bandpass_vec(499-jj)*dwr_bandpass_vec(499-jj) + dwr_bandpass_vec(499-jj-int(period/4.0))*dwr_bandpass_vec(499-jj-int(period/4.0)) ;
         power_dnr = power_dnr + dnr_bandpass_vec(499-jj)*dnr_bandpass_vec(499-jj) + dnr_bandpass_vec(499-jj-int(period/4.0))*dnr_bandpass_vec(499-jj-int(period/4.0)) ;
         }

     // fill the distribution vectors
     rms = sqrt( power_side / (period+1) ) ;
     p_to_p = 2.0 * 1.414 * rms ;
     sideways_dist(ii) = ( sideways_vec(499) - sideways_vec(499-period) ) / p_to_p ;

     rms = sqrt( power_uwr / (period+1) ) ;
     p_to_p = 2.0 * 1.414 * rms ;
     uwr_dist(ii) = ( uwr_vec(499) - uwr_vec(499-period) ) / p_to_p ;

     rms = sqrt( power_unr / (period+1) ) ;
     p_to_p = 2.0 * 1.414 * rms ;
     unr_dist(ii) = ( unr_vec(499) - unr_vec(499-period) ) / p_to_p ;

     rms = sqrt( power_dwr / (period+1) ) ;
     p_to_p = 2.0 * 1.414 * rms ;
     dwr_dist(ii) = ( dwr_vec(499) - dwr_vec(499-period) ) / p_to_p ;

     rms = sqrt( power_dnr / (period+1) ) ;
     p_to_p = 2.0 * 1.414 * rms ;
     dnr_dist(ii) = ( dnr_vec(499) - dnr_vec(499-period) ) / p_to_p ;

    } // end of main ii loop

    retval_list(4) = dnr_dist;
    retval_list(3) = dwr_dist;
    retval_list(2) = unr_dist;
    retval_list(1) = uwr_dist;
    retval_list(0) = sideways_dist;

    return retval_list;
}
 
This function is called by this simple Octave script
clear all

inc = input( "Enter phase increment: ");

for ii = 6:50
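% each iteration writes the five market type distributions for period ii to a file named like "20_period_dist"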

[sideways_dist,uwr_dist,unr_dist,dwr_dist,dnr_dist] = trend_vigor_dist(ii,inc);
A=[sideways_dist,uwr_dist,unr_dist,dwr_dist,dnr_dist];
file = strcat( int2str(ii),"_period_dist" );
dlmwrite(file,A)

endfor 
which writes the output of the tests to named files for further analysis in R.
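
Before moving on to that analysis, the core calculation is perhaps easier to see stripped of the looping. The following is a minimal R sketch of the same Trend Vigor computation for a single uptrending-with-retracements series; the variable names are my own, while the bandpass recursion, the delta value and the trend_slope / peak-to-peak ratio follow the .oct code above.
period <- 20
delta <- 0.2
n <- 500
jj <- 0:(n-1)

# synthetic "uwr" market: unit sinewave plus an uptrend with cyclic retracements
price <- sin(jj*(360/period)*pi/180) + jj*(12/(5*period))

# bandpass filter coefficients as in the .oct code
beta <- cos((360/period)*pi/180)
gamma <- 1/cos((720*delta/period)*pi/180)
alpha <- gamma - sqrt(gamma^2 - 1)

bp <- numeric(n)
for (i in 3:n) {
bp[i] <- 0.5*(1-alpha)*(price[i]-price[i-2]) + beta*(1+alpha)*bp[i-1] - alpha*bp[i-2]
}

# peak to peak cycle amplitude from the bandpass and its quarter period lag over the last full period
idx <- (n-period):n
power <- sum(bp[idx]^2 + bp[idx-floor(period/4)]^2)
p_to_p <- 2*1.414*sqrt(power/(period+1))

# trend vigor: trend slope over one period divided by peak to peak cycle amplitude
trend_vigor <- (price[n]-price[n-period])/p_to_p
trend_vigor # a trending-with-retracement reading, i.e. between 0 and 1
Running the same sketch on a no-retracement series (trend increment 4/period) should give a reading greater than 1, in line with the classification logic set out in the code comments.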

Firstly, using R, I wanted to see what the distribution of the results looks like, so this R script
rm(list=ls())
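# kernel density estimates of the trend vigor values for each market type, for the 20 bar period file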
data <- as.matrix(read.csv(file="20_period_dist",head=FALSE,sep=,))
side <- density(data[,1])
uwr <- density(data[,2])
max_uwr_y <- max(uwr$y)
max_uwr_x <- max(uwr$x)
min_uwr_x <- min(uwr$x)
unr <- density(data[,3])
max_unr_y <- max(unr$y)
max_unr_x <- max(unr$x)
min_unr_x <- min(unr$x)
dwr <- density(data[,4])
max_dwr_y <- max(dwr$y)
max_dwr_x <- max(dwr$x)
min_dwr_x <- min(dwr$x)
dnr <- density(data[,5])
max_dnr_y <- max(dnr$y)
max_dnr_x <- max(dnr$x)
min_dnr_x <- min(dnr$x)
plot_max_y <- max(max_uwr_y,max_unr_y,max_dwr_y,max_dnr_y)
plot_max_x <- max(max_uwr_x,max_unr_x,max_dwr_x,max_dnr_x)
plot_min_x <- min(min_uwr_x,min_unr_x,min_dwr_x,min_dnr_x)
par(mfrow=c(2,1))
plot(uwr,xlim=c(plot_min_x,plot_max_x),ylim=c(0,plot_max_y),col="red")
par(new=TRUE)
plot(unr,xlim=c(plot_min_x,plot_max_x),ylim=c(0,plot_max_y),col="blue")
par(new=TRUE)
plot(dwr,xlim=c(plot_min_x,plot_max_x),ylim=c(0,plot_max_y),col="green")
par(new=TRUE)
plot(dnr,xlim=c(plot_min_x,plot_max_x),ylim=c(0,plot_max_y),col="black")
plot(side)
 
produces plots such as this output
where the top plot shows the distributions of the uwr, unr, dwr and dnr markets, and the lower plot the sideways market. In this particular case the spread of each distribution is so narrow (differences of the order of thousandths) that for practical purposes the distributions can be treated as single values. This simple R bootstrap script gets the average value of each distribution to be used as that single point value.
rm(list=ls())

data <- as.matrix(read.csv(file="trend_vigor_dist_results",head=FALSE,sep=,))
side <- data[,1]
uwr <- data[,2]
unr <- data[,3]
dwr <- data[,4]
dnr <- data[,5]
side_samples <- matrix(0,50000)
uwr_samples <- matrix(0,50000)
unr_samples <- matrix(0,50000)
dwr_samples <- matrix(0,50000)
dnr_samples <- matrix(0,50000)

for(ii in 1:50000) {
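# resample each column with replacement and record the mean of the resample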
side_samples[ii] <- mean(sample(side, replace=T))
uwr_samples[ii] <- mean(sample(uwr, replace=T))
unr_samples[ii] <- mean(sample(unr, replace=T))
dwr_samples[ii] <- mean(sample(dwr, replace=T))
dnr_samples[ii] <- mean(sample(dnr, replace=T))
}

side_mean <- mean(side_samples)
uwr_mean <- mean(uwr_samples)
unr_mean <- mean(unr_samples)
dwr_mean <- mean(dwr_samples)
dnr_mean <- mean(dnr_samples) 
For readers' interest, the actual values are 0.829 (uwr), 1.329 (unr), -0.829 (dwr) and -1.329 (dnr), with 0 for the sideways market.

This final plot is the same Natural Gas plot as in my previous post, but with the above values substituted for Ehlers' default values of 1 and -1.
What I intend to do now is use these values as the means of normal distributions, with varying standard deviations, as inputs to my Naive Bayes classifier. Further Monte Carlo testing will be done to find standard deviation values at which the classifier's rate of false classifications, when tested on the "ideal" markets code above, stays within acceptable limits, most probably a 5% classification error rate.
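
As a rough illustration of the intended classification step (this is not the final classifier, and the 0.25 standard deviation below is just a placeholder pending that Monte Carlo work), a Gaussian likelihood over the five values above could be used in R like this:
means <- c(sideways=0, uwr=0.829, unr=1.329, dwr=-0.829, dnr=-1.329)
sd_assumed <- 0.25 # placeholder standard deviation, to be set by the Monte Carlo testing

classify_tv <- function(tv, sd=sd_assumed) {
likes <- dnorm(tv, mean=means, sd=sd) # likelihood of the reading under each market type
probs <- likes/sum(likes) # posterior with a flat prior over the five types
sort(probs, decreasing=TRUE)
}

classify_tv(0.9) # a reading near 0.83 should favour the uwr market type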

Saturday, 14 May 2011

Bayesian Analysis Revisited

After abandoning my work on Bayesian analysis late last year I have been working on it again for a few weeks now. I resumed this work because a response to a question I asked on a forum here led me to the mixdist R package, which has enabled me to model kernel density estimates for a naive Bayes classifier. For more on using a kernel density estimate in a naive Bayes classifier, see the link in the answer here.
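
For readers unfamiliar with the idea, here is a minimal base-R sketch (not the mixdist based code I am actually writing, and the data are made up for illustration) of using kernel density estimates as the class-conditional likelihoods in a two class naive Bayes setting:
set.seed(1)
feature_up <- rnorm(500, mean=1, sd=0.4) # indicator values seen in "up" periods
feature_down <- rnorm(500, mean=-1, sd=0.4) # indicator values seen in "down" periods

kde_up <- density(feature_up)
kde_down <- density(feature_down)

# evaluate each kernel density estimate at a new observation to get its likelihood
x_new <- 0.7
lik_up <- approx(kde_up$x, kde_up$y, xout=x_new, rule=2)$y
lik_down <- approx(kde_down$x, kde_down$y, xout=x_new, rule=2)$y

# posterior probability of the "up" class with a flat prior over the two classes
lik_up/(lik_up + lik_down)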

I expect that it will be a few weeks yet before all the coding and testing of this is complete.