Thursday 5 September 2019

Preliminary Test Results of Time Series Embedding

Following on from my post yesterday, this post presents some preliminary results from the test I was running while writing yesterday's post. However, before I get to these results I would like to talk a bit about the hypothesis being tested.

I had an inkling that the dominant cycle period might having some bearing on tau, the time delay for the time series embedding implied by Taken's theorem, so I set up the test (described below) and after writing the post yesterday, and while the test was still running, I did some online research to see if there is any theoretical justification available.

One of the papers I found online is A Novel Method for Topological Embedding of Time-Series Data. From the abstract

"...we propose a novel method for embedding one-dimensional, periodic time-series data into higher-dimensional topological spaces to support robust recovery of signal features via topological data analysis under noisy sampling conditions...To provide evidence for the viability of this method, we analyze the simple case of sinusoidal data..."

One of the conclusions drawn is that for sinusoidal data the ideal embedding length is a quarter of the cycle period, i.e. Π/2.

Another paper I found was Topological time-series analysis with delay-variant embedding. This paper investigates "delay-variant embedding," essentially making the embedding length tau variable. From the concluding remarks

"We have demonstrated that the topological features that are constructed using delay-variant embedding can capture the topological variation in a time series when the time-delay value changes... These results indicate that the topological features that are deduced with delay-variant embedding can be used to reveal the representative features of the original time series."

My inkling was that tau should be adaptive to the measured dominant cycle, and both the above linked papers do provide some justification. Anyway, on to the test.

Reusing the code from my earlier post here I created synthetic price series with known but variable dominant cycle periods and based on real price trends, which results in synthetic series that are highly correlated with real price series but with underlying characteristics being exactly known. I then used my version of mdembedding code to calculate tau for numerous Monte Carlo replications of time series of prices based on a range of real forex pairs prices, gold and silver prices and different indices created by my currency strength indicator methodology. I then created a ratio of tau/average_measured_period measured as a global value across the whole of each individual time series replication as the test statistic, with the period being measured by an autocorrelation periodogram function: a ratio value of 0.5, for example, meaning that the tau for that time series is half the average period over the whole series.

Below is a plot of the histogram of these ratios:
which shows the ratios being centred approximately around 0.4 but with a long right tail. The following histogram is of the means of the bootstrap with replacement of the above distribution,
which is centred around a mean value of 0.436. Using this bootstrapped mean ratio of tau/period implies, for example, that the ideal tau for an average period of 20 is 8.72, which when rounded up to 9, is almost twice the theoretical ideal of a 5 day tau for 20 day cyclic period prices.

Personally I think this is an encouraging set of preliminary test results, with some caveats. More in due course.

No comments: