Friday 21 June 2019

Process Noise Covariance Matrix Q for a Kalman Filter

Since my last post I have been working on the process noise covariance matrix Q, with a view to optimising both the Q and R matrices for an Extended Kalman filter to model the cyclic component of price action as a Sine wave. However, my work to date has produced unsatisfactory results and I have decided to give up trying to make it work.

The reasons for this failure are unclear to me, and I don't intend to spend any more time investigating, but some educated guesses are that the underlying model of sinusoid is mismatched and my estimation of the process and/or measurement noise variances is lacking; either way, the end result is that the EKF is diverging and my earlier leading signal 1, leading signal 2 and leading signal 3 posts outline what I think will be a more promising line of investigation in the future.

Nevertheless, below I provide the rough working code that I have been using in my EKF work, and maybe readers will find something of value in it. There is a lot of commenting and some blocked out code and I'm afraid readers will have to wade through this as best they can.
clear all ;
1 ;

pkg load signal ;

% function declarations
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function Y = ekf_sine_h( x )
% Measurement model function for the sine signal.
% Takes the state input vector x of sine, phase, angular frequency and amplitude and 
% calculates the current value of the sine given the state vector values.

f = x( 2 , : ) ;    % phase value in radians
a = x( 4 , : ) ;    % amplitude
Y = a .* sin( f ) ; % the sine value

endfunction
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [ cost ] = sine_ekf_Q_optim( Q_mult , data , targets , Q , R )
  
% initial state from noisy data input measurements
s = data( : , 1 ) ;

n = size( data , 1 ) ;

% initial state covariance 
P = eye( n ) ;

% allocate memory
xV = zeros( size( data ) ) ;  % record estmates of states

% basic Jacobian of state transition matrix 
A = [ 0 0 0 0 ;   % sine value
      0 1 1 0 ;   % phase
      0 0 1 0 ;   % angular frequency
      0 0 0 1 ] ; % amplitude
      
H = eye( n ) ; % measurement matrix

Q = Q_mult .* Q ;

  for k = 2 : size( data , 2 )

  % do ekf
  % nonlinear update x and linearisation at current state s
  x = s ;
  x( 2 ) = x( 2 ) + x( 3 ) ;        % advance phase value by angular frequency
  x( 2 ) = mod( x( 2 ) , 2 * pi ) ; % limit phase values to range 0 --- 2 * pi
  x( 1 ) = x( 4 ) * sin( x( 2 ) ) ; % sine value calculation

  % update the 1st row of the jacobian matrix at state vector s values
  A( 1 , : ) = [ 0 s( 4 ) * cos( s( 2 ) ) 0 sin( s( 2 ) ) ] ;

  P = A * P * A' + Q ; % state transition model update of covariance matrix P

  measurement_residual = [ data( 1 , k ) - x( 1 ) ;   % sine residual
                           data( 2 , k ) - x( 2 ) ;   % phase residual
                           data( 3 , k ) - x( 3 ) ;   % angular frequency residual
                           data( 4 , k ) - x( 4 ) ] ; % amplitude residual
                           
  innovation_residual_covariance = H * P * H' + R ;
  kalman_gain = P * H' / innovation_residual_covariance ;

  % update the state vector s with kalman_gain 
  s = x + kalman_gain * measurement_residual ;

  % some reality based post hoc adjustments
  s( 2 ) = abs( s( 2 ) ) ; % no negative phase values
  s( 3 ) = abs( s( 3 ) ) ; % no negative angular frequencies
  s( 4 ) = abs( s( 4 ) ) ; % no negative amplitudes

  % update the state covariance matrix P
  % NOTE
  % The Joseph formula is given by P+ = ( I − KH ) P− ( I − KH )' + KRK', where I is the identity matrix,
  % K is the gain, H is the measurement mapping matrix, R is the measurement noise covariance matrix, 
  % and P−, P+ are the pre and post measurement update estimation error covariance matrices, respectively.  
  % The optimal linear unbiased estimator (equivalently the optimal linear minimum mean square error estimator)  
  % or Kalman filter often utilizes simplified covariance update equations such as P+ = (I−KH)P− and P+ = P− −K(HP−H'+R)K'.  
  % While these alternative formulations require fewer computations than the Joseph formula, they are only valid 
  % when K is chosen as the optimal Kalman gain. In engineering applications, situations arise where the optimal 
  % Kalman gain is not utilized and the Joseph formula must be employed to update the estimation error covariance.  
  % Two examples of such a scenario are underweighting measurements and considering states. 
  % Even when the optimal gain is used, the Joseph formulation is still preferable because it possesses 
  % greater numerical accuracy than the simplified equations.
  P = ( eye( n ) - kalman_gain * H ) * P * ( eye( n ) - kalman_gain * H )' + kalman_gain * R * kalman_gain' ;
   
  xV( : , k ) = s ; % save estimated updated states
   
  endfor
  
Y = ekf_sine_h( xV ) ;
cost = mean( ( Y( 4 : end - 3 ) .- targets( 4 : end - 3 ) ).^2 ) ;
%cost = mean( ( Y .- targets ).^2 ) ;
%output = xV( 1 , : ) ;

endfunction
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function output = run_sine_ekf_Q_optim( data , Q , R )
  
% initial state from noisy data input measurements
s = data( : , 1 ) ;

n = size( data , 1 ) ;

% initial state covariance 
P = eye( n ) ;

% allocate memory
xV = zeros( size( data ) ) ;  % record estmates of states

% basic Jacobian of state transition matrix 
A = [ 0 0 0 0 ;   % sine value
      0 1 1 0 ;   % phase
      0 0 1 0 ;   % angular frequency
      0 0 0 1 ] ; % amplitude
      
H = eye( n ) ; % measurement matrix

  for k = 2 : size( data , 2 )

  % do ekf
  % nonlinear update x and linearisation at current state s
  x = s ;
  x( 2 ) = x( 2 ) + x( 3 ) ;        % advance phase value by angular frequency
  x( 2 ) = mod( x( 2 ) , 2 * pi ) ; % limit phase values to range 0 --- 2 * pi
  x( 1 ) = x( 4 ) * sin( x( 2 ) ) ; % sine value calculation

  % update the 1st row of the jacobian matrix at state vector s values
  A( 1 , : ) = [ 0 s( 4 ) * cos( s( 2 ) ) 0 sin( s( 2 ) ) ] ;

  P = A * P * A' + Q ; % state transition model update of covariance matrix P

  measurement_residual = [ data( 1 , k ) - x( 1 ) ;   % sine residual
                           data( 2 , k ) - x( 2 ) ;   % phase residual
                           data( 3 , k ) - x( 3 ) ;   % angular frequency residual
                           data( 4 , k ) - x( 4 ) ] ; % amplitude residual
                           
  innovation_residual_covariance = H * P * H' + R ;
  kalman_gain = P * H' / innovation_residual_covariance ;

  % update the state vector s with kalman_gain 
  s = x + kalman_gain * measurement_residual ;

  % some reality based post hoc adjustments
  s( 2 ) = abs( s( 2 ) ) ; % no negative phase values
  s( 3 ) = abs( s( 3 ) ) ; % no negative angular frequencies
  s( 4 ) = abs( s( 4 ) ) ; % no negative amplitudes

  % update the state covariance matrix P
  % NOTE
  % The Joseph formula is given by P+ = ( I − KH ) P− ( I − KH )' + KRK', where I is the identity matrix,
  % K is the gain, H is the measurement mapping matrix, R is the measurement noise covariance matrix, 
  % and P−, P+ are the pre and post measurement update estimation error covariance matrices, respectively.  
  % The optimal linear unbiased estimator (equivalently the optimal linear minimum mean square error estimator)  
  % or Kalman filter often utilizes simplified covariance update equations such as P+ = (I−KH)P− and P+ = P− −K(HP−H'+R)K'.  
  % While these alternative formulations require fewer computations than the Joseph formula, they are only valid 
  % when K is chosen as the optimal Kalman gain. In engineering applications, situations arise where the optimal 
  % Kalman gain is not utilized and the Joseph formula must be employed to update the estimation error covariance.  
  % Two examples of such a scenario are underweighting measurements and considering states. 
  % Even when the optimal gain is used, the Joseph formulation is still preferable because it possesses 
  % greater numerical accuracy than the simplified equations.
  P = ( eye( n ) - kalman_gain * H ) * P * ( eye( n ) - kalman_gain * H )' + kalman_gain * R * kalman_gain' ;
   
  xV( : , k ) = s ; % save estimated updated states
   
  endfor 

output = xV ;

endfunction
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% load data
cd /home/dekalog/Documents/octave/indices ;

raw_data = dlmread( 'raw_data_for_indices_and_strengths' ) ;
delete_hkd_crosses = [ 3 9 12 18 26 31 34 39 43 51 61 ] .+ 1 ; % +1 to account for date column
raw_data( : , delete_hkd_crosses ) = [] ;
raw_data( : , 1 ) = [] ; % delete date vector
% so now
%    1       2       3       4       5       6       7       8      9      10      11      12      13     14
% aud_cad aud_chf aud_jpy aud_nzd aud_sgd aud_usd cad_chf cad_jpy cad_sgd chf_jpy eur_aud eur_cad eur_chf eur_gbp
%
%   15      16      17      18      19      20      21      22      23      24      25      26      27      28
% eur_jpy eur_nzd eur_sgd eur_usd gbp_aud gbp_cad gbp_chf gbp_jpy gbp_nzd gbp_sgd gbp_usd nzd_cad nzd_chf nzd_jpy
%
%    29      30      31      32      33      34      35      36      37      38     39       40      41      42       
% nzd_sgd nzd_usd sgd_chf sgd_jpy usd_cad usd_chf usd_jpy usd_sgd xag_aud xag_cad xag_chf xag_eur xag_gbp xag_jpy
%
%    43      44     45      46      47      48      49      50      51      52     53       54      55
% xag_nzd xag_sgd xag_usd xau_aud xau_cad xau_chf xau_eur xau_gbp xau_jpy xau_nzd xau_sgd xau_usd xau_xag 

% aud_x = x(1) ; cad_x = x(2) ; chf_x = x(3) ; eur_x = x(4) ; gbp_x = x(5) ; hkd_x = x(6) ;
% jpy_x = x(7) ; nzd_x = x(8) ; sgd_x = x(9) ; usd_x = x(10) ; gold_x = x(11) ; silver_x = x(12) ;
all_g_c = dlmread( "all_g_mults_c" ) ; % the currency g mults
all_g_c( : , 7 ) = [] ; % delete hkd index
all_g_s = dlmread( "all_g_sv" ) ;      % the gold and silver g mults   
all_g_c = [ all_g_c all_g_s(:,2:3) ] ; % a combination of the above 2
all_g_c( : , 2 : end ) = cumprod( all_g_c( : , 2 : end) , 1 ) ;
all_g_c( : , 1 ) = [] ; % delete date vector
% so now index ix are
% aud_x = 56 ; cad_x = 57 ; chf_x = 58 ; eur_x = 59 ; gbp_x = 60 ; jpy_x = 61 ; 
% nzd_x = 62 ; sgd_x = 63 ; usd_x = 64 ; gold_x = 65 ; silver_x = 66 ;

all_raw_data = [ raw_data all_g_c ] ; clear -x all_raw_data ;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

cd /home/dekalog/Documents/octave/kalman/ekf ;

##  >> bs_phase_error = mean( all_bootstrap_means(:,1)) + 2 * std( all_bootstrap_means(:,1))
##  bs_phase_error =  0.50319
##  >> bs_period_ang_frequency_error = mean( all_bootstrap_means(:,2)) + 2 * std( all_bootstrap_means(:,2))
##  bs_period_ang_frequency_error =  0.17097
##  >> bs_amp_error = mean( all_bootstrap_means(:,3)) + 2 * std( all_bootstrap_means(:,3))
##  bs_amp_error =  0.18179

price_ix = 65 ;
price = all_raw_data( : , price_ix ) ; % plot(price) ;

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% get measurements of price action
period = autocorrelation_periodogram( price ) ;
measured_angular_frequency = ( 2 * pi ) ./ period ; % plot( measured_angular_frequency ) ;
trend = sma( price , period ) ;
cycle = price .- trend ;
%smooth_cycle = sgolayfilt( cycle , 2 , 7 ) ;
smooth_cycle = smooth_2_5( cycle ) ;
% plot( price,'k',trend,'r') ;
[ ~ , ~ , ~ , ~ , ~ , ~ , measured_phase ] = sinewave_indicator( cycle ) ; % figure(1) ; plot( deg2rad( measured_phase ) ) ;
% figure(2) ; plot( sin( deg2rad(measured_phase)) ) ;
measured_phase = mod( unwrap( deg2rad( measured_phase ) ) , 2  * pi ) ; % figure(1) ; hold on ; plot( measured_phase , 'r' ) ; hold off ;
% figure(2) ; hold on ; plot( sin( measured_phase)) ; hold off ;

measured_amplitude = cycle ;
for ii = 50 : length( cycle ) ;
measured_amplitude( ii ) = sqrt( 2 ) * sqrt( mean( cycle( ii - period( ii ) : ii ).^2 ) ) ; 
endfor % end ii loop
% plot(measured_amplitude) ;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% run the ekf optimisation
ekf_cycle = cycle ;

% complete values for R matrix for this data set
data = [ smooth_cycle( 101 : end ) measured_phase( 101 : end ) measured_angular_frequency( 101 : end ) measured_amplitude( 101 : end ) ]' ; 
%measured_sine = sgolayfilt( data( 1 , : ) , 2 , 7 ) ; % figure(1) ; plot( data( 1 , : ) , 'k' , measured_sine , 'r' ) ;
measured_sine = smooth_cycle( 101 : end )' ;
R( 1 , 1 ) = mean( ( data( 1 , : ) .- measured_sine ).^2 ) ; % variance of sine wave value measurements
R( 3 , 3 ) = mean( ( data( 3 , : ) .* 0.17 ).^2 ) ;          % variance of angular frequency measurements
R( 4 , 4 ) = mean( ( data( 4 , : ) .* 0.18 ).^2 ) ;          % variance of amplitude measurements

% get initial process covariance matrix Q
Q = analytical_shrinkage( data' ) ;
Q = 0.1 .* Q ;

lookback = 200 ;

%for ii = 301 : length( price )
  
% complete values for R matrix for this data set
data = [ smooth_cycle(ii-lookback:ii) measured_phase(ii-lookback:ii) measured_angular_frequency(ii-lookback:ii) measured_amplitude(ii-lookback:ii) ]' ; 
%measured_sine = sgolayfilt( data( 1 , : ) , 2 , 7 ) ; % figure(1) ; plot( data( 1 , : ) , 'k' , measured_sine , 'r' ) ;
measured_sine = cycle( ii - lookback : ii )' ;
R( 1 , 1 ) = mean( ( data( 1 , : ) .- measured_sine ).^2 ) ; % variance of sine wave value measurements
R( 3 , 3 ) = mean( ( data( 3 , : ) .* 0.17 ).^2 ) ;          % variance of angular frequency measurements
R( 4 , 4 ) = mean( ( data( 4 , : ) .* 0.18 ).^2 ) ;          % variance of amplitude measurements
  
%  Q = analytical_shrinkage( data' ) ;
   
% optimise the Q matrix for this data set
% declare optimisation function
f = @( Q_mult ) sine_ekf_Q_optim( Q_mult , data , smooth_cycle( ii - lookback : ii )' , Q , R ) ;
% Set options for fminunc
options = optimset( "MaxIter" , 50 ) ;
% initial value
Q_mult = 1 ;
Q_mult = fminunc( f , Q_mult , options ) ;
% adjust Q by optimised Q_mult
Q = Q_mult .* Q ;

  output = run_sine_ekf_Q_optim( data , Q , R ) ;
  %ekf_cycle( ii ) = ekf_sine_h( output( : , end ) ) ;
  ekf_cycle = ekf_sine_h( output ) ;
  
figure(1) ; subplot( 4 , 1 , 1 ) ; plot( data( 1 , : ) , 'k' , 'linewidth' , 2 , ekf_sine_h( output ) , 'r' , 'linewidth' , 2 ) ;
title( 'High Pass' ) ; legend( 'Measured' , 'EKF Estimated' ) ;
figure(1) ; subplot( 4 , 1 , 2 ) ; plot( data( 2 , : ) , 'k' , 'linewidth' , 2 , output( 2 , : ) , 'r' , 'linewidth' , 2 ) ;
title( 'High Pass Phase' ) ; legend( 'Measured' , 'EKF Estimated' ) ;
figure(1) ; subplot( 4 , 1 , 3 ) ; plot( data( 3 , : ) , 'k' , 'linewidth' , 2 , output( 3 , : ) , 'r' , 'linewidth' , 2 ) ;
title( 'High Pass Angular Frequency' ) ; legend( 'Measured' , 'EKF Estimated' ) ;
figure(1) ; subplot( 4 , 1 , 4 ) ; plot( data( 4 , : ) , 'k' , 'linewidth' , 2 , output( 4 , : ) , 'r' , 'linewidth' , 2 ) ;
title( 'High Pass Amplitude' ) ; legend( 'Measured' , 'EKF Estimated' ) ;
  
%endfor

figure(1) ; plot( cycle(101:end)' , 'k' , 'linewidth' , 2 , ekf_cycle , 'r' , 'linewidth' , 2 , output(1,:) , 'b' ) ;
figure(2) ; plot( price(101:end) , 'k' , 'linewidth' , 2 , trend(101:end).+ekf_cycle' , 'r' , 'linewidth' , 2 ) ;
One thing I will point out is the use of a function called analytical_shrinkage, which I have taken directly from a recent paper, Analytical Nonlinear Shrinkage of Large-Dimensional Covariance Matrices, the MATLAB code being provided as an appendix in the paper. Readers may find this to be particularly useful outside the use I have tried putting it to.

Next week I shall go on my customary, summer working holiday and be away from home until August: therefore there will be a hiatus in blog posts until my return.

Thursday 6 June 2019

Determining the Noise Covariance Matrix R for a Kalman Filter

An important part of getting a Kalman filter to work well is tuning the process noise covariance matrix Q and the measurement noise covariance matrix R. This post is about obtaining the R matrix, with a post about the Q matrix to come in due course.

In my last post about the alternative version extended kalman filter to model a sine wave I explained that the 4 measurements are the sine wave value itself, and the phase, angular frequency and amplitude of the sine wave. The phase and angular frequencies are "measured" using outputs of the sinewave indicator and the amplitude is calculated as an RMS value over a measured period. The "problem" with this is that the accuracies of such measurements on real data are unknown, and therefore so is the R matrix, because the underlying cycle properties are unknown. My solution to this is offline Monte Carlo simulation.

As a first step the Octave code in the box immediately below
clear all ;
pkg load statistics ;

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% load data

raw_data = dlmread( 'raw_data_for_indices_and_strengths' ) ;
delete_hkd_crosses = [ 3 9 12 18 26 31 34 39 43 51 61 ] .+ 1 ; % +1 to account for date column
raw_data( : , delete_hkd_crosses ) = [] ;
raw_data( : , 1 ) = [] ; % delete data vector 

all_g_c = dlmread( "all_g_mults_c" ) ; % the currency g mults
all_g_c( : , 7 ) = [] ; % delete hkd index

all_g_s = dlmread( "all_g_sv" ) ;      % the gold and silver g mults   
all_g_c = [ all_g_c all_g_s(:,2:3) ] ; % a combination of the above 2
all_g_c( : , 2 : end ) = cumprod( all_g_c( : , 2 : end) , 1 ) ;
all_g_c( : , 1 ) = [] ; % delete date vector

all_raw_data = [ raw_data all_g_c ] ;

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% set up recording vectors
amplitude = zeros( size( all_raw_data , 1 ) , 1 ) ;
all_amplitudes = zeros( size( all_raw_data ) ) ;
all_periods = all_amplitudes ;

for ii = 1 : size( all_raw_data , 2 )
  
  price = all_raw_data( : , ii ) ;
  period = autocorrelation_periodogram( price ) ;
  all_periods( : , ii ) = period ;
  [ hp , ~ ] = highpass_filter_basic( price ) ; hp( 1 : 99 ) = 0 ;

  for jj = 100 : size( all_raw_data , 1 )
    amplitude( jj ) = sqrt( 2 ) * sqrt( mean( hp( round( jj - period( jj ) / 2 ) : jj , 1 ).^2 ) ) ;
  endfor
  
  % store information about amplitude changes as a multiplicative constant
  all_amplitudes( : , ii ) = amplitude ./ shift( amplitude , 1 ) ;

endfor

% truncate
all_amplitudes( 1 : 101 , : ) = [] ;
all_periods( 1 : 101 , : ) = [] ;

max_period = max( max( all_periods ) ) ;
min_period = min( min( all_periods ) ) ;

% unroll
all_amplitudes = all_amplitudes( : ) ; all_periods = all_periods( : ) ;

amplitude_changes = cell( max_period , 1 ) ;

for ii = min_period : max_period
  ix = find( all_periods == ii ) ;
  amplitude_changes{ ii , 1 } = all_amplitudes( ix , 1 ) ;
endfor

save all_amplitude_changes amplitude_changes ;
is used to get information about cycle amplitudes from real data and store it in a cell array.

This next box shows code that uses this cell array, plus earlier hidden markov modelling code for sine wave modelling, to produce synthetic price data
clear all ;
pkg load signal ;
pkg load statistics ;

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% load data

raw_data = dlmread( 'raw_data_for_indices_and_strengths' ) ;
delete_hkd_crosses = [ 3 9 12 18 26 31 34 39 43 51 61 ] .+ 1 ; % +1 to account for date column
raw_data( : , delete_hkd_crosses ) = [] ;
raw_data( : , 1 ) = [] ; % delete date vector
all_g_c = dlmread( "all_g_mults_c" ) ; % the currency g mults
all_g_c( : , 7 ) = [] ; % delete hkd index
all_g_s = dlmread( "all_g_sv" ) ;      % the gold and silver g mults   
all_g_c = [ all_g_c all_g_s(:,2:3) ] ; % a combination of the above 2
all_g_c( : , 2 : end ) = cumprod( all_g_c( : , 2 : end) , 1 ) ;
all_g_c( : , 1 ) = [] ; % delete date vector
all_raw_data = [ raw_data all_g_c ] ; clear -x all_raw_data ;

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

load all_hmm_periods_daily ;
load all_amplitude_changes ;

% choose an ix for all_raw_data
ix = input( "ix? " ) ;
price = all_raw_data( : , ix ) ;
period = autocorrelation_periodogram( price ) ;
[ hp , trend ] = highpass_filter_basic( price ) ;

% estimate variance of cycle noise
hp_filt = sgolayfilt( hp , 2 , 7 ) ;
noise = hp( 101 : end ) .- hp_filt( 101 : end ) ; noise_var = var( noise ) ;
% zero pad beginning of noise vector
noise = [ zeros( 100 , 1 ) ; noise ] ;

% set for plotting purposes
hp( 1 : 50 ) = 0 ; trend( 1 : 50 ) = price( 1 : 50 ) ;

% create sine wave with known phase and period/angular frequency 
[ gen_period , ~ ] = hmmgenerate( length( price ) , transprobest , outprobest ) ;
gen_period = gen_period .+ hmm_min_period_add ;
phase = cumsum( ( 2 * pi ) ./ gen_period ) ; phase = mod( phase , 2 * pi ) ;
gen_sine = sin( phase ) ;

% amplitude vector
amp = ones( size( hp ) ) ;

% RMS initial amplitude 
amp( 100 ) = sqrt( 2 ) * sqrt( mean( hp( 100 - period( 100 ) : 100 ).^2 ) ) ;

% alter sine wave with known amplitude
for ii = 101 : length( hp )
  % get multiple for this period
  random_amp_mults = amplitude_changes{ gen_period( ii ) } ;
  rand_mult = random_amp_mults( randi( size( random_amp_mults , 1 ) , 1 ) ) ;
  amp( ii ) = rand_mult * sqrt( 2 ) * sqrt( mean( hp( ii - 1 - period( ii - 1 ) : ii - 1 ).^2 ) ) ; ;
  gen_sine( ii ) = amp( ii ) * gen_sine( ii ) ; 
endfor

% estimate variance of generated cycle noise
gen_filt = sgolayfilt( gen_sine , 2 , 7 ) ;
gen_noise = gen_sine( 101 : end ) .- gen_filt( 101 : end ) ; gen_noise_var = var( gen_noise ) ;

% a crude adjustment of generated cycle noise to match "real" noise by adding
% noise
gen_sine = gen_sine' .+ noise ;

% set beginning of gen_sine to real cycle for plotting purposes
gen_sine( 1 : 100 ) = hp( 1 : 100 ) ;
new_price = trend .+ gen_sine ;

% a simple correlation check
correlation_cycle_gen_sine = corr( hp , gen_sine ) ;
correlation_price_gen_price = corr( price , new_price ) ;

figure(1) ; plot( gen_sine , 'k' , 'linewidth' , 2 ) ; grid minor on ; title( 'The Generated Cycle' ) ; 
figure(2) ; plot( price , 'b' , 'linewidth' , 2 , trend , 'k' , 'linewidth' , 1 , new_price , 'r' , 'linewidth', 2 ) ; 
title( 'Comparison of Real and Generated Prices' ) ; legend( 'Real Price' , 'Real Trend' , 'Generated Price' ) ; grid minor on ;
figure(3) ; plot( hp , 'k' , 'linewidth' , 2 , gen_sine , 'r' , 'linewidth' , 2 ) ; title( 'Generated Cycle compared with Real Cycle' ) ; 
legend( 'Real Cycle' , 'Generated Cycle' ) ; grid minor on ;
Some typical chart output of this code is the synthetic price series in red compared with the real blue price series
and the underlying cycles.
For this particular run the correlation between the synthetic price series and the real price series is 0.99331, whilst the correlation between the relevant cycles is only 0.19988, over 1500 data points ( not all are shown in the above charts. )

Because the synthetic cycle is definitely a sine wave ( by construction ) with a known phase, angular frequency and amplitude I can run the above sinewave_indicator and RMS measurements on the synthetic prices, compare with the known synthetic cycle, and estimate the measurement noise covariance matrix R.

More in due course.