Following on from my post yesterday, this post presents some preliminary results from the test I was running while writing yesterday's post. However, before I get to these results I would like to talk a bit about the hypothesis being tested.
I had an inkling that the dominant cycle period might having some bearing on tau, the time delay for the time series embedding implied by Taken's theorem, so I set up the test (described below) and after writing the post yesterday, and while the test was still running, I did some online research to see if there is any theoretical justification available.
One of the papers I found online is A Novel Method for Topological Embedding of Time-Series Data. From the abstract
"...we propose a novel method for embedding one-dimensional, periodic time-series data into higher-dimensional topological spaces to support robust recovery of signal features via topological data analysis under noisy sampling conditions...To provide evidence for the viability of this method, we analyze the simple case of sinusoidal data..."
One of the conclusions drawn is that for sinusoidal data the ideal embedding length is a quarter of the cycle period, i.e. Π/2.
Another paper I found was Topological time-series analysis with delay-variant embedding. This paper investigates "delay-variant embedding," essentially making the embedding length tau variable. From the concluding remarks
"We have demonstrated that the topological features that are constructed using delay-variant embedding can capture the topological variation in a time series when the time-delay value changes... These results indicate that the topological features that are deduced with delay-variant embedding can be used to reveal the representative features of the original time series."
My inkling was that tau should be adaptive to the measured dominant cycle, and both the above linked papers do provide some justification. Anyway, on to the test.
Reusing the code from my earlier post here I created synthetic price series with known but variable dominant cycle periods and based on real price trends, which results in synthetic series that are highly correlated with real price series but with underlying characteristics being exactly known. I then used my version of mdembedding code to calculate tau for numerous Monte Carlo replications of time series of prices based on a range of real forex pairs prices, gold and silver prices and different indices created by my currency strength indicator methodology. I then created a ratio of tau/average_measured_period measured as a global value across the whole of each individual time series replication as the test statistic, with the period being measured by an autocorrelation periodogram function: a ratio value of 0.5, for example, meaning that the tau for that time series is half the average period over the whole series.
Below is a plot of the histogram of these ratios:
which shows the ratios being centred approximately around 0.4 but with a long right tail. The following histogram is of the means of the bootstrap with replacement of the above distribution,
which is centred around a mean value of 0.436. Using this bootstrapped mean ratio of tau/period implies, for example, that the ideal tau for an average period of 20 is 8.72, which when rounded up to 9, is almost twice the theoretical ideal of a 5 day tau for 20 day cyclic period prices.
Personally I think this is an encouraging set of preliminary test results, with some caveats. More in due course.
"Trading is statistics and time series analysis." This blog details my progress in developing a systematic trading system for use on the futures and forex markets, with discussion of the various indicators and other inputs used in the creation of the system. Also discussed are some of the issues/problems encountered during this development process. Within the blog posts there are links to other web pages that are/have been useful to me.
Showing posts with label Monte Carlo.. Show all posts
Showing posts with label Monte Carlo.. Show all posts
Thursday, 5 September 2019
Wednesday, 4 September 2019
Taken's Theorem and Time Series Embedding
I am now back from my summer break and am currently looking at using Taken's theorem and am using an adapted version of this mdembedding code ( adapted to run smoothly in Octave ). At the moment of writing this post I have a Monte Carlo test running in the background on my computer, the results of which shall be the subject of my next blog post within the next few days.
Thursday, 6 June 2019
Determining the Noise Covariance Matrix R for a Kalman Filter
An important part of getting a Kalman filter to work well is tuning the process noise covariance matrix Q and the measurement noise covariance matrix R. This post is about obtaining the R matrix, with a post about the Q matrix to come in due course.
In my last post about the alternative version extended kalman filter to model a sine wave I explained that the 4 measurements are the sine wave value itself, and the phase, angular frequency and amplitude of the sine wave. The phase and angular frequencies are "measured" using outputs of the sinewave indicator and the amplitude is calculated as an RMS value over a measured period. The "problem" with this is that the accuracies of such measurements on real data are unknown, and therefore so is the R matrix, because the underlying cycle properties are unknown. My solution to this is offline Monte Carlo simulation.
As a first step the Octave code in the box immediately below
This next box shows code that uses this cell array, plus earlier hidden markov modelling code for sine wave modelling, to produce synthetic price data
and the underlying cycles. For this particular run the correlation between the synthetic price series and the real price series is 0.99331, whilst the correlation between the relevant cycles is only 0.19988, over 1500 data points ( not all are shown in the above charts. )
Because the synthetic cycle is definitely a sine wave ( by construction ) with a known phase, angular frequency and amplitude I can run the above sinewave_indicator and RMS measurements on the synthetic prices, compare with the known synthetic cycle, and estimate the measurement noise covariance matrix R.
More in due course.
In my last post about the alternative version extended kalman filter to model a sine wave I explained that the 4 measurements are the sine wave value itself, and the phase, angular frequency and amplitude of the sine wave. The phase and angular frequencies are "measured" using outputs of the sinewave indicator and the amplitude is calculated as an RMS value over a measured period. The "problem" with this is that the accuracies of such measurements on real data are unknown, and therefore so is the R matrix, because the underlying cycle properties are unknown. My solution to this is offline Monte Carlo simulation.
As a first step the Octave code in the box immediately below
clear all ;
pkg load statistics ;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% load data
raw_data = dlmread( 'raw_data_for_indices_and_strengths' ) ;
delete_hkd_crosses = [ 3 9 12 18 26 31 34 39 43 51 61 ] .+ 1 ; % +1 to account for date column
raw_data( : , delete_hkd_crosses ) = [] ;
raw_data( : , 1 ) = [] ; % delete data vector
all_g_c = dlmread( "all_g_mults_c" ) ; % the currency g mults
all_g_c( : , 7 ) = [] ; % delete hkd index
all_g_s = dlmread( "all_g_sv" ) ; % the gold and silver g mults
all_g_c = [ all_g_c all_g_s(:,2:3) ] ; % a combination of the above 2
all_g_c( : , 2 : end ) = cumprod( all_g_c( : , 2 : end) , 1 ) ;
all_g_c( : , 1 ) = [] ; % delete date vector
all_raw_data = [ raw_data all_g_c ] ;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% set up recording vectors
amplitude = zeros( size( all_raw_data , 1 ) , 1 ) ;
all_amplitudes = zeros( size( all_raw_data ) ) ;
all_periods = all_amplitudes ;
for ii = 1 : size( all_raw_data , 2 )
price = all_raw_data( : , ii ) ;
period = autocorrelation_periodogram( price ) ;
all_periods( : , ii ) = period ;
[ hp , ~ ] = highpass_filter_basic( price ) ; hp( 1 : 99 ) = 0 ;
for jj = 100 : size( all_raw_data , 1 )
amplitude( jj ) = sqrt( 2 ) * sqrt( mean( hp( round( jj - period( jj ) / 2 ) : jj , 1 ).^2 ) ) ;
endfor
% store information about amplitude changes as a multiplicative constant
all_amplitudes( : , ii ) = amplitude ./ shift( amplitude , 1 ) ;
endfor
% truncate
all_amplitudes( 1 : 101 , : ) = [] ;
all_periods( 1 : 101 , : ) = [] ;
max_period = max( max( all_periods ) ) ;
min_period = min( min( all_periods ) ) ;
% unroll
all_amplitudes = all_amplitudes( : ) ; all_periods = all_periods( : ) ;
amplitude_changes = cell( max_period , 1 ) ;
for ii = min_period : max_period
ix = find( all_periods == ii ) ;
amplitude_changes{ ii , 1 } = all_amplitudes( ix , 1 ) ;
endfor
save all_amplitude_changes amplitude_changes ;
is used to get information about cycle amplitudes from real data and store it in a cell array.This next box shows code that uses this cell array, plus earlier hidden markov modelling code for sine wave modelling, to produce synthetic price data
clear all ;
pkg load signal ;
pkg load statistics ;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% load data
raw_data = dlmread( 'raw_data_for_indices_and_strengths' ) ;
delete_hkd_crosses = [ 3 9 12 18 26 31 34 39 43 51 61 ] .+ 1 ; % +1 to account for date column
raw_data( : , delete_hkd_crosses ) = [] ;
raw_data( : , 1 ) = [] ; % delete date vector
all_g_c = dlmread( "all_g_mults_c" ) ; % the currency g mults
all_g_c( : , 7 ) = [] ; % delete hkd index
all_g_s = dlmread( "all_g_sv" ) ; % the gold and silver g mults
all_g_c = [ all_g_c all_g_s(:,2:3) ] ; % a combination of the above 2
all_g_c( : , 2 : end ) = cumprod( all_g_c( : , 2 : end) , 1 ) ;
all_g_c( : , 1 ) = [] ; % delete date vector
all_raw_data = [ raw_data all_g_c ] ; clear -x all_raw_data ;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
load all_hmm_periods_daily ;
load all_amplitude_changes ;
% choose an ix for all_raw_data
ix = input( "ix? " ) ;
price = all_raw_data( : , ix ) ;
period = autocorrelation_periodogram( price ) ;
[ hp , trend ] = highpass_filter_basic( price ) ;
% estimate variance of cycle noise
hp_filt = sgolayfilt( hp , 2 , 7 ) ;
noise = hp( 101 : end ) .- hp_filt( 101 : end ) ; noise_var = var( noise ) ;
% zero pad beginning of noise vector
noise = [ zeros( 100 , 1 ) ; noise ] ;
% set for plotting purposes
hp( 1 : 50 ) = 0 ; trend( 1 : 50 ) = price( 1 : 50 ) ;
% create sine wave with known phase and period/angular frequency
[ gen_period , ~ ] = hmmgenerate( length( price ) , transprobest , outprobest ) ;
gen_period = gen_period .+ hmm_min_period_add ;
phase = cumsum( ( 2 * pi ) ./ gen_period ) ; phase = mod( phase , 2 * pi ) ;
gen_sine = sin( phase ) ;
% amplitude vector
amp = ones( size( hp ) ) ;
% RMS initial amplitude
amp( 100 ) = sqrt( 2 ) * sqrt( mean( hp( 100 - period( 100 ) : 100 ).^2 ) ) ;
% alter sine wave with known amplitude
for ii = 101 : length( hp )
% get multiple for this period
random_amp_mults = amplitude_changes{ gen_period( ii ) } ;
rand_mult = random_amp_mults( randi( size( random_amp_mults , 1 ) , 1 ) ) ;
amp( ii ) = rand_mult * sqrt( 2 ) * sqrt( mean( hp( ii - 1 - period( ii - 1 ) : ii - 1 ).^2 ) ) ; ;
gen_sine( ii ) = amp( ii ) * gen_sine( ii ) ;
endfor
% estimate variance of generated cycle noise
gen_filt = sgolayfilt( gen_sine , 2 , 7 ) ;
gen_noise = gen_sine( 101 : end ) .- gen_filt( 101 : end ) ; gen_noise_var = var( gen_noise ) ;
% a crude adjustment of generated cycle noise to match "real" noise by adding
% noise
gen_sine = gen_sine' .+ noise ;
% set beginning of gen_sine to real cycle for plotting purposes
gen_sine( 1 : 100 ) = hp( 1 : 100 ) ;
new_price = trend .+ gen_sine ;
% a simple correlation check
correlation_cycle_gen_sine = corr( hp , gen_sine ) ;
correlation_price_gen_price = corr( price , new_price ) ;
figure(1) ; plot( gen_sine , 'k' , 'linewidth' , 2 ) ; grid minor on ; title( 'The Generated Cycle' ) ;
figure(2) ; plot( price , 'b' , 'linewidth' , 2 , trend , 'k' , 'linewidth' , 1 , new_price , 'r' , 'linewidth', 2 ) ;
title( 'Comparison of Real and Generated Prices' ) ; legend( 'Real Price' , 'Real Trend' , 'Generated Price' ) ; grid minor on ;
figure(3) ; plot( hp , 'k' , 'linewidth' , 2 , gen_sine , 'r' , 'linewidth' , 2 ) ; title( 'Generated Cycle compared with Real Cycle' ) ;
legend( 'Real Cycle' , 'Generated Cycle' ) ; grid minor on ;
Some typical chart output of this code is the synthetic price series in red compared with the real blue price seriesand the underlying cycles. For this particular run the correlation between the synthetic price series and the real price series is 0.99331, whilst the correlation between the relevant cycles is only 0.19988, over 1500 data points ( not all are shown in the above charts. )
Because the synthetic cycle is definitely a sine wave ( by construction ) with a known phase, angular frequency and amplitude I can run the above sinewave_indicator and RMS measurements on the synthetic prices, compare with the known synthetic cycle, and estimate the measurement noise covariance matrix R.
More in due course.
Subscribe to:
Posts (Atom)