Thursday, 21 September 2017

Downloading Historical Data Using Oanda's API and R

It has been about 5 months since my last blog post and in this time I have been working away from home, been on summer holiday and spent some time mucking about on boats, so I have not been able to devote as much time to my blog as I would have liked. However, that has now changed, and this blog post is about obtaining historical data.

Many moons ago I used to download free, EOD data from tradingblox, but they stopped updating this in early 2013. I then started concentrating more on forex data because free sources of this data are more readily available. However, this still meant ( for me at least ) a laborious process of manually downloading .txt/.csv files, with the attendant problem of the data not being fully up to date and then resulting in me doing historical testing on data that I would not be able to trade in real time. With my present focus on machine learning derived trading algorithms this was becoming an untenable position.

My solution to this has been to personalize the ROandaAPI code that is freely available from this github, courtesy of IF.FranciscoME. I have stripped out some if statements, hard coded some variables particular to my own Oanda account, added some extra comments for my own enlightenment and broken the API code up into separate .R functions and .R scripts, which I use via RStudio.

The first such R script is called to initialise the variables and load the various functions into the R environment, shown below.
# Required Packages in order to use the R API functions
library("downloader")
library("RCurl")
library("jsonlite")
library("httr")

# -- ---------------------------------------------------------------------------------- #
# -- Specific Values of Parameters in API for my primary account ---------------------- #
# -- ---------------------------------------------------------------------------------- #

# -- numeric ------ # My Account ID 
AccountID = 123456     
# -- character ---- # My Oanda Token
AccountToken = "blah-de-blah-de-blah"

# load the various function files
source("~/path/to/InstrumentsList.R")
source("~/path/to/R_Oanda_API_functions/ActualPrice.R")
source("~/path/to/HisPricesCount.R")
source("~/path/to/HisPricesDates.R")

# get list of all tradeable instruments
trade_instruments = InstrumentsList( AccountToken , AccountID )
View( trade_instruments )

# load some default values
# -- character ---- # Granularity of the prices
# granularity: The time range represented by each candlestick. The value specified will determine the alignment 
# of the first candlestick.
# choose from S% S10 S15 S30 M1 M2 M3 M4 M5 M10 M15 M30 H1 H2 H3 H4 H6 H8 H12 D W M ( secs mins hours day week month )
Granularity = "D"

# -- numeric ------ # Hour of the "End of the Day"
# dailyAlignment: The hour of day used to align candles with hourly, daily, weekly, or monthly granularity. 
# The value specified is interpretted as an hour in the timezone set through the alignmentTimezone parameter and must be 
# an integer between 0 and 23.
DayAlign = 0 # original R code and Oanda default is 17 at "America%2FNew_York"

# -- character ---- # Time Zone in format "Continent/Zone
# alignmentTimezone: The timezone to be used for the dailyAlignment parameter. This parameter does NOT affect the 
# returned timestamp, the start or end parameters, these will always be in UTC. The timezone format used is defined by 
# the IANA Time Zone Database, a full list of the timezones supported by the REST API can be found at 
# http://developer.oanda.com/docs/timezones.txt
# "America%2FMexico_City" was the originallly provided, but could use, for example,  "Europe%2FLondon" or "Europe%2FWarsaw"
TimeAlign = "Europe%2FLondon"

################################# IMPORTANT NOTE #####################################################################
# By setting DayAlign = 0 and TimeAlign = "Europe%2FLondon" the end of bar is midnight in London. Doing this ensures
# that the bar OHLC in data downloads matches the bars seen in the Oanda FXTrade software, which for my account is 
# Europe Division, e.g. London time. The timestamps on downloads are, however, at GMT times, which means during summer
# daylight saving time the times shown on the Oanda software seem to be one hour ahead of GMT.
######################################################################################################################

Start = Sys.time() # Current system time
End = Sys.time() # Current system time
Count = 500 # Oanda default

# now cd to the working directory
setwd("~/path/to/oanda_data")
The code is liberally commented to describe reasons for my default choices. The InstrumentsList.R function called in the above script is shown next.
InstrumentsList = function( AccountToken , AccountID )
{
  httpaccount = "https://api-fxtrade.oanda.com"
  auth        = c(Authorization = paste("Bearer",AccountToken,sep=" "))
  Queryhttp   = paste(httpaccount,"/v1/instruments?accountId=",sep="")
  QueryInst   = paste(Queryhttp,AccountID,sep="")
  QueryInst1  = getURL(QueryInst,cainfo=system.file("CurlSSL","cacert.pem",package="RCurl"),httpheader=auth)
  InstJson    = fromJSON(QueryInst1, simplifyDataFrame = TRUE)
  FinalData   = data.frame(InstJson)
  colnames(FinalData) = c("Instrument","DisplayName","PipSize","MaxTradeUnits")
  FinalData$MaxTradeUnits = as.numeric(FinalData$MaxTradeUnits)
  return(FinalData)
}
This downloads a list of all the available trading instruments for the associated Oanda account. The following R script actually downloads the historical data for all the trading instruments listed in the above mentioned list and writes the data to separate files; one file per instrument. It also keeps track of the all instruments and the date of the last complete OHLC bar in the historical record and writes this to file also.
# cd to the working directory
setwd("~/path/to/oanda_data")

# dataframe to keep track of updates
Instrument_update_file = data.frame( Instrument = character() , Date = as.Date( character() ) , stringsAsFactors = FALSE )

for( ii in 1 : nrow( trade_instruments ) ) {
  
  instrument = trade_instruments[ ii , 1 ]
  
  # write details of instrument to Instrument_update_file
  Instrument_update_file[ ii , 1 ] = instrument
  
  historical_prices = HisPricesCount( Granularity = "D", DayAlign , TimeAlign , AccountToken ,instrument , Count = 5000 )
  last_row_ix = nrow( historical_prices )
  
  if ( historical_prices[ last_row_ix , 7 ] == FALSE ){ # last obtained OHLC bar values are incomplete
    # and do not want to save incomplete OHLC values, so add date of previous line of complete OHLC data
    # to Instrument_update_file
    Instrument_update_file[ ii , 2 ] = as.Date( historical_prices[ last_row_ix - 1 , 1 ] )
    
    # and delete the row with these incomplete values
    historical_prices = historical_prices[ 1 : last_row_ix - 1 , ]
    
  } # end of if statement
  
  # Write historical_prices to file
  write.table( historical_prices , file = paste( instrument , "raw_OHLC_daily" , sep = "_" ) , row.names = FALSE , na = "" , 
             col.names = FALSE , sep = "," )

  } # end of for loop

  # Write Instrument_update_file to file
  write.table( Instrument_update_file , file = "Instrument_update_file" , row.names = FALSE , na = "" , col.names = TRUE , sep = "," )
This script repeatedly calls the actual download function, HisPricesCount.R, which does all the heavy lifting in a loop, and the code for this download function is
HisPricesCount = function( Granularity, DayAlign, TimeAlign, AccountToken, Instrument, Count ){
  
  httpaccount      = "https://api-fxtrade.oanda.com"
  auth             = c(Authorization = paste("Bearer",AccountToken,sep=" "))
  QueryHistPrec    = paste(httpaccount,"/v1/candles?instrument=",sep="")
  QueryHistPrec1   = paste(QueryHistPrec,Instrument,sep="")
  qcount           = paste("count=",Count,sep="")
  qcandleFormat    = "candleFormat=midpoint"
  qgranularity     = paste("granularity=",Granularity,sep="")
  qdailyalignment  = paste("dailyAlignment=",DayAlign,sep="")
  QueryHistPrec2   = paste(QueryHistPrec1,qcandleFormat,qgranularity,qdailyalignment,qcount,sep="&")
  InstHistP        = getURL(QueryHistPrec2,cainfo=system.file("CurlSSL","cacert.pem",package="RCurl"),httpheader=auth)
  InstHistPjson    = fromJSON(InstHistP, simplifyDataFrame = TRUE)
  Prices           = data.frame(InstHistPjson[[3]])
  Prices$time      = paste(substr(Prices$time,1,10),substr(Prices$time,12,19), sep=" ")
  colnames(Prices) = c("TimeStamp","Open","High","Low","Close","TickVolume","Complete")
  Prices$TimeStamp = as.POSIXct(strptime(Prices$TimeStamp, "%Y-%m-%d %H:%M:%OS"),origin="1970-01-01",tz = "UTC")
  attributes(Prices$TimeStamp)$tzone = TimeAlign
  return(Prices)
  
}
One of the input variables for this function is Count ( default = 5000 ), which means that the function downloads the last 5000 OHLC bar records up to and including the most recent, which may still be forming and hence is incomplete. The calling script ensures that any incomplete bar is stripped from the record so that only complete bars are printed to file.

All in all this is a vast improvement over my previous data collection regime, and kudos to IF.FranciscoME for making the base code available on his github.