Dekalog Blog: Monte Carlo

Showing posts with label Monte Carlo. Show all posts

Thursday, 18 June 2020

More Work on RVFL Networks

Back in November last year I posted about Random Vector Functional Link (RVFL) networks here and here. Since then, along with my recent work on Oanda's API Octave functions and Market/Volume Profile visualisation, I have continued looking at RVFL networks and this post is an update on this work.

The "random" in RVFL means random initialisation of weights that are then fixed. It seems to me that it might be possible to do better than random by having some principled way of weight initialisation. To this end I have used the Penalised MATLAB Toolbox on features derived from my ideal cyclic tau embedding function to at first train a Generalized Linear Model with the Lasso penalty and then the Ridge penalty over thousands of sets of Monte Carlo generated, ideal cyclic prices and such prices with trends. The best weights for each set of prices were recorded in an array and then the mean weight (and standard deviation) taken. This set of mean weights is intended to replace the random weights in a RVFL network designed to predict the probability of "price" being at a cyclic turning point using the above cyclic tau embedding features.

Of course these weights could be considered a trained model in and of themselves, and the following screenshots show "out of sample" performance on Monte Carlo generated ideal prices that were not used in the training of the mean weights.

The black line is the underlying cyclic price and the red, blue and green lines are the mean weight model probabilities for cyclic peaks, troughs or neither respectively. Points where the peak/trough probabilities exceed the neither probabilities are marked by the red and blue vertical lines. Similarly, we have prices trending up in a cyclic fashion

and also trending down

In the cases of the last two trending markets only the swing highs and lows are indicated. The reason for this is that during training, based on my "expert knowledge" of the cyclic tau features used, it is unreasonable to expect these features to accurately capture the end of an up leg in a bull trend or the end of a down leg in a bear trend - hence these were not presented as a positive class during training.

As I said above the motivation for this is to get a more meaningful hidden layer in a RVFL network. This hidden layer will consist of seven Sigmoid functions which each give a probability of price being at or not being at a cyclic turn, conditional upon the type of market the input weights were trained on.

Friday, 29 November 2019

RVFL Network Results

Having let the tests outlined in my previous post run, I can say that the results are inconclusive as the optimised number of neurons in the hidden layer turns out to be one, the minimum possible to even have a hidden layer. This leads me to believe that the raw features, the ideal cyclic tau embedding, are sufficient in and of themselves for the purpose I have in mind.

To that end, I have indulged in a bit of feature engineering and written the following Octave function,

## Copyright (C) 2019 dekalog
## 
## This program is free software: you can redistribute it and/or modify it
## under the terms of the GNU General Public License as published by
## the Free Software Foundation, either version 3 of the License, or
## (at your option) any later version.
## 
## This program is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
## GNU General Public License for more details.
## 
## You should have received a copy of the GNU General Public License
## along with this program.  If not, see
## www.gnu.org.

## -*- texinfo -*- 
## @deftypefn {} {@var{emb1}, @var{emb2} =} cyclic_embedding (@var{price}, @var{period})
##
## Inputs are a price vector and a period vector.
##
## The function normalises the price between -1 and +1 over an adaptive period lookback.
##
## The outputs are two matrices of 3 columns each - the first column is normalised price
## and the second and third are delay embeddings of Tau and 2 x Tau, Tau being equal to
## one quarter of the adaptive period vector, the theoretical ideal Tau for sinusoidal
## waveforms.
##
## The first output matrix, EMB1, normalises the Tau and 2 x Tau columns according to the
## most recent max high/min low; EMB2 Tau and 2 x Tau are normalised according to the 
## max high/min low in force at the delay embedding time.
##
## @seealso{}
## @end deftypefn

## Author: dekalog 
## Created: 2019-09-18

function [ emb1 , emb2 , prob_matrix ] = cyclic_embedding( price , period )
  
price_smooth = price ;
emb1 = repmat( price , 1 , 3 ) ;
emb2 = emb1 ;
coeffs = generalised_sgolay_filter_coeffs( 5 , 2 , 0 ) ; coeffs = coeffs' ;

## initialising loop
price_smooth( 1 : 3 ) = coeffs( 1 : 3 , : ) * price( 1 : 5 ) ;  
for ii = 4 : 48 
  price_smooth( ii ) = coeffs( 3 , : ) * price( ii - 2 : ii + 2 ) ;
endfor
price_smooth( 49 : 50 ) = coeffs( 4 : 5 , : ) * price( 46 : 50 ) ;
## end initialising loop 

coeffs( 1 : 2 , : ) = [] ;

for ii = 51 : size( price , 1 )

  price_smooth( ii - 2 : ii ) = coeffs * price( ii - 4 : ii ) ;
  max_r = max( price_smooth( ii - period( ii ) : ii ) ) ;
  min_r = min( price_smooth( ii - period( ii ) : ii ) ) ;
  
  ## period is exactly divisable by 4 ( and 2 ), e.g. 8 12 16 20 24 28 32 36 40 44 48 etc?
  if ( rem( period( ii ) , 4 ) == 0 ) 
    
  emb1( ii , 1 ) = 2 * ( ( price_smooth( ii ) - min_r ) / ( max_r - min_r ) - 0.5 ) ;
  emb1( ii , 2 ) = 2 * ( ( price_smooth( ii - round( period( ii ) / 4 ) ) - min_r ) / ( max_r - min_r ) - 0.5 ) ;
  emb1( ii , 3 ) = 2 * ( ( price_smooth( ii - round( period( ii ) / 2 ) ) - min_r ) / ( max_r - min_r ) - 0.5 ) ;
  emb2( ii , 1 ) = emb1( ii , 1 ) ;
  emb2( ii , 2 ) = emb2( ii - round( period( ii ) / 4 ) , 1 ) ;
  emb2( ii , 3 ) = emb2( ii - round( period( ii ) / 2 ) , 1 ) ;
  
  ## periods 10 14 18 22 26 30 34 38 42 46 50
  elseif ( rem( period( ii ) , 2 ) == 0 && rem( period( ii ) , 4 ) == 2 )

  emb1( ii , 1 ) = 2 * ( ( price_smooth( ii ) - min_r ) / ( max_r - min_r ) - 0.5 ) ;
  emb1( ii , 2 ) = 2 * ( ( ( 0.5*price_smooth( ii - round( period( ii ) / 4 ) ) + 0.5*price_smooth( ii - round( period( ii ) / 4 ) + 1 ) ) - min_r )...
                       / ( max_r - min_r ) - 0.5 ) ;
  emb1( ii , 3 ) = 2 * ( ( price_smooth( ii - round( period( ii ) / 2 ) ) - min_r ) / ( max_r - min_r ) - 0.5 ) ;
  emb2( ii , 1 ) = emb1( ii , 1 ) ;
  emb2( ii , 2 ) = 0.5*emb2( ii - round( period( ii ) / 4 ) , 1 ) + 0.5*emb2( ii - round( period( ii ) / 4 ) + 1 , 1 ) ;
  emb2( ii , 3 ) = emb2( ii - round( period( ii ) / 2 ) , 1 ) ;

  ## periods 9 13 17 21 25 29 33 37 41 45 49
  elseif ( rem( period( ii ) , 2 ) == 1 && rem( period( ii ) , 4 ) == 1 )

  emb1( ii , 1 ) = 2 * ( ( price_smooth( ii ) - min_r ) / ( max_r - min_r ) - 0.5 ) ;
  emb1( ii , 2 ) = 2 * ( ( ( 0.75*price_smooth( ii - round( period( ii ) / 4 ) ) + 0.25*price_smooth( ii - round( period( ii ) / 4 ) - 1 ) ) - min_r )...
                       / ( max_r - min_r ) - 0.5 ) ;
  emb1( ii , 3 ) = 2 * ( ( ( 0.5*price_smooth( ii - round( period( ii ) / 2 ) ) + 0.5*price_smooth( ii - round( period( ii ) / 2 ) + 1 ) ) - min_r )...
                       / ( max_r - min_r ) - 0.5 ) ;
  emb2( ii , 1 ) = emb1( ii , 1 ) ;
  emb2( ii , 2 ) = 0.75*emb2( ii - round( period( ii ) / 4 ) , 1 ) + 0.25*emb2( ii - round( period( ii ) / 4 ) - 1 , 1 ) ;
  emb2( ii , 3 ) = 0.5*emb2( ii - round( period( ii ) / 2 ) , 1 ) + 0.5*emb2( ii - round( period( ii ) / 2 ) + 1 , 1 ) ;
  
  ## periods 11 15 19 23 27 31 35 39 43 47
  elseif ( rem( period( ii ) , 2 ) == 1 && rem( period( ii ) , 4 ) == 3 )

  emb1( ii , 1 ) = 2 * ( ( price_smooth( ii ) - min_r ) / ( max_r - min_r ) - 0.5 ) ;
  emb1( ii , 2 ) = 2 * ( ( ( 0.75*price_smooth( ii - round( period( ii ) / 4 ) ) + 0.25*price_smooth( ii - round( period( ii ) / 4 ) + 1 ) ) - min_r )...
                       / ( max_r - min_r ) - 0.5 ) ;
  emb1( ii , 3 ) = 2 * ( ( ( 0.5*price_smooth( ii - round( period( ii ) / 2 ) ) + 0.5*price_smooth( ii - round( period( ii ) / 2 ) + 1 ) ) - min_r )...
                       / ( max_r - min_r ) - 0.5 ) ;
  emb2( ii , 1 ) = emb1( ii , 1 ) ;
  emb2( ii , 2 ) = 0.75*emb2( ii - round( period( ii ) / 4 ) , 1 ) + 0.25*emb2( ii - round( period( ii ) / 4 ) + 1 , 1 ) ;
  emb2( ii , 3 ) = 0.5*emb2( ii - round( period( ii ) / 2 ) , 1 ) + 0.5*emb2( ii - round( period( ii ) / 2 ) + 1 , 1 ) ;
  
  endif
  
endfor ## end of embedding features creation

feature_peak = emb2 * [ 1 0 ; -1 1 ; 0 -1 ] * [ 1 ; 1 ] ;
feature_trough = zeros( size( feature_peak ) ) ;
ix = find( feature_peak < 0 ) ; 
feature_trough( ix ) = abs( feature_peak( ix ) ) ;
feature_peak( feature_peak <= 0 ) = 0 ;

## https://stats.stackexchange.com/questions/132652/how-to-determine-which-distribution-fits-my-data-best
## Weibull distribution with shape = 219.68 and scale = 1.94 for turn +/- 1 bar
## Weibull distribution with shape = 85.88 and scale = 1.84 for turn +/- 2 bar
## This comes from bayesian testing of cutoff value of feature_peak/feature_trough for highs/lows +/- 1 bar.
## The function used to get these values is "bayes_train_cyclic_turn_prob_of_embedding.m" which calls
## "bayes_optim_of_cyclic_embedding_conv_function" in /home/dekalog/Documents/octave/turning_points
scale = 1.84 ; shape = 85.88 ;
prob_matrix = zeros( size( feature_peak , 1 ) , 3 ) ;
prob_matrix( : , 1 ) = wblcdf( feature_peak , scale , shape ) ;
prob_matrix( : , 2 ) = wblcdf( feature_trough , scale , shape ) ;
prob_matrix( : , 3 ) = 1 .- sum( prob_matrix( : , 1 : 2 ) , 2 ) ;

endfunction

which again is a work in progress.

This takes as input a price and a period vector and outputs features as per the plots in the above linked ideal cyclic tau post plus a matrix of probabilities for price action being at a cyclic high/low. This matrix is the result of Monte Carlo Bayesian Optimisation over ideal sine wave prices to get a cutoff value for the derived feature in the above function. The probability distribution used for this probability matrix is the Weibull distribution, which has been determined by following the routine outlined in this "how to determine which distribution fits my data best" forum post and using the R statistical software platform.

The hard coded values for the cutoff in the above code are from the results of optimisation on pure sine waves. As I write this post there are optimisation routines running on sine waves with 20 db noise added.

More in due course.

Wednesday, 30 October 2019

Weight Agnostic Neural Net Training

I have recently come across the idea of weight agnostic neural net training and have implemented a crude version of this combined with the recent work I have been doing on Taken's Theorem ( see my posts here, here and here ) and using the statistical mechanics approach to creating synthetic data.

Using the simple Octave function below with the Akaike Information Criterion as the minimisation objective

## Copyright (C) 2019 dekalog
## 
## This program is free software: you can redistribute it and/or modify it
## under the terms of the GNU General Public License as published by
## the Free Software Foundation, either version 3 of the License, or
## (at your option) any later version.
## 
## This program is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
## GNU General Public License for more details.
## 
## You should have received a copy of the GNU General Public License
## along with this program.  If not, see
## .

## -*- texinfo -*- 
## @deftypefn {} {@var{J} =} wann_training_of_cyclic_embedding()
##
## @seealso{}
## @end deftypefn

## Author: dekalog 
## Created: 2019-10-26

function J = wann_training_of_cyclic_embedding( x )
global sample_features ; global sample_targets ;
epsilon = 1e-15 ; ## to ensure log() does not give out a nan

## get the parameters from input x
activation_funcs = floor( x( 1 : 5 ) ) ; ## get the activations, 1 == sigmoid, 2 == tanh, 3 == Lecun sigmoid
layer_size = floor( x( 6 : 10 ) ) ;

[ min_layer_size , ix_min ] = min( layer_size ) ;
  if( min_layer_size > 0 ) ## to be expected most of the time
    nn_depth = length( layer_size ) ;
  elseif( min_layer_size == 0 ) ## one layer has no nodes, hence limits depth of nn
    nn_depth = ix_min - 1 ;
  endif 

length_jj_loop = 25 ;
all_aic_values = zeros( length_jj_loop , 1 ) ;  
  
for jj = 1 : length_jj_loop 
  
  previous_layer_out = sample_features ;
  sum_of_k = 0 ;
    
  for ii = 1 : nn_depth
    
    new_weight_matrix = ones( size( previous_layer_out , 2 ) , layer_size( ii ) ) ./ sqrt( size( previous_layer_out , 2 ) ) ;
    sum_of_k = sum_of_k + numel( new_weight_matrix ) ;
    prior_to_activation_input = previous_layer_out * new_weight_matrix ;

    ## select the activation function 
    if( activation_funcs( ii ) == 1 ) ## sigmoid activation
      previous_layer_out = 1.0 ./ ( 1.0 .+ exp( -prior_to_activation_input ) ) ;
    elseif( activation_funcs( ii ) == 2 ) ## tanh activation
      previous_layer_out = tanh( prior_to_activation_input ) ;
    elseif( activation_funcs( ii ) == 3 ) ## lecun sigmoid activation
      previous_layer_out = sigmoid_lecun_m( prior_to_activation_input ) ;
    endif 
    
  endfor

  ## the final logistic output
  new_weight_matrix = ones( size( previous_layer_out , 2 ) , 1 ) ./ sqrt( size( previous_layer_out , 2 ) ) ;
  sum_of_k = sum_of_k + numel( new_weight_matrix ) ;
  final_output = previous_layer_out * new_weight_matrix ;
  final_output = 1.0 ./ ( 1.0 .+ exp( -final_output ) ) ;

  max_likelihood = sum( log( final_output .+ epsilon ) .* sample_targets + log( 1 .- final_output .+ epsilon ) .* ( 1 .- sample_targets ) ) ;
  
  ## get Akaike Information criteria
  all_aic_values( jj ) = 2 * sum_of_k - 2 * max_likelihood ;

endfor ## end of jj loop

J = mean( all_aic_values ) ;

endfunction

and the Octave interface of the Bayesopt Library I am currently iterating over different architectures ( up to 5 hidden layers deep with a max of 100 nodes per layer and using a choice of 3 hidden activations ) for a simple Logistic Regression model to predict turning points in different sets of statistical mechanics synthetic data using just features based on Taken's embedding.

More in due course.

Monday, 16 September 2019

The Ideal Tau for Time Series Embedding?

In my Preliminary Test Results of Time Series Embedding post I got a bit ahead of myself and mistakenly quoted the ideal embedding length (Tau) to be half a period for cyclic prices. This should actually have been a quarter of a period and I have now corrected my earlier post.

This post is about my further investigations, which uncovered the above, and is written in a logical progression rather than the order in which I actually did things.

It was important to look at the Tau value for ideal sinusoidal data to confirm the quarter period, which checked out because I was able to easily recreate the 2D Lissajous curves in the Novel Method for Topological Embedding of Time-Series Data paper. I then extended this approach to 3D to produce plots such as this one, which may require some explaining.

This is a 3D Phase space plot of adaptive Max Min Normalised "sinusoidal prices" normalised to fall in the range [ -1 +1 ] and adaptive to the known, underlying period of the sine wave. This was done for prices with and without a "trend." The circular orbits are for the pure sine wave, representing cyclic price action plotted in 3D as current price, price delayed by a quarter of a cycle (Tau 1) and by half a cycle (Tau 2). The mini, blue and red "constellations" outside the circular orbits are the same pure sine wave(s) with an uptrend and downtrend added. The utility of this representation for market classification etc. is obvious, but is not the focus of this post.

Looking at Tau values for ideal prices with various trends taken from real market data, normalised as described above, across a range of known periods (because they were synthetically generated) the following results were obtained.

This is a plot of periods 10 to 47 (left to right along the x-axis) vs the global tau value (y-axis) measured using my Octave version of the mdDelay.m function from the mdembedding github. Each individual, coloured line plot is based on the underlying trend of one of 66 different, real time series. Each point per period per tradeable is the median value of 20 Monte Carlo runs over that tradeable's trend + synthetic, ideal sine wave price. The thick, black line is the median of the median values per period. Although noisy, it can be seen that the median line(s) straddle the theoretical, quarter cycle Tau (0.25) quite nicely. It should be remembered that this is on normalised prices. For comparison, the below plot shows the same on the unnormalised prices.

where the Tau value starts at over 0.5 at a cyclic period of 10 at the left and then slowly decreases to just over 0.1 at period 47.

More in due course.

Pages