Dekalog Blog: October 2019

Wednesday 30 October 2019

Weight Agnostic Neural Net Training

I have recently come across the idea of weight agnostic neural net training and have implemented a crude version of this combined with the recent work I have been doing on Taken's Theorem ( see my posts here, here and here ) and using the statistical mechanics approach to creating synthetic data.

Using the simple Octave function below with the Akaike Information Criterion as the minimisation objective

## Copyright (C) 2019 dekalog
## 
## This program is free software: you can redistribute it and/or modify it
## under the terms of the GNU General Public License as published by
## the Free Software Foundation, either version 3 of the License, or
## (at your option) any later version.
## 
## This program is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
## GNU General Public License for more details.
## 
## You should have received a copy of the GNU General Public License
## along with this program.  If not, see
## .

## -*- texinfo -*- 
## @deftypefn {} {@var{J} =} wann_training_of_cyclic_embedding()
##
## @seealso{}
## @end deftypefn

## Author: dekalog 
## Created: 2019-10-26

function J = wann_training_of_cyclic_embedding( x )
global sample_features ; global sample_targets ;
epsilon = 1e-15 ; ## to ensure log() does not give out a nan

## get the parameters from input x
activation_funcs = floor( x( 1 : 5 ) ) ; ## get the activations, 1 == sigmoid, 2 == tanh, 3 == Lecun sigmoid
layer_size = floor( x( 6 : 10 ) ) ;

[ min_layer_size , ix_min ] = min( layer_size ) ;
  if( min_layer_size > 0 ) ## to be expected most of the time
    nn_depth = length( layer_size ) ;
  elseif( min_layer_size == 0 ) ## one layer has no nodes, hence limits depth of nn
    nn_depth = ix_min - 1 ;
  endif 

length_jj_loop = 25 ;
all_aic_values = zeros( length_jj_loop , 1 ) ;  
  
for jj = 1 : length_jj_loop 
  
  previous_layer_out = sample_features ;
  sum_of_k = 0 ;
    
  for ii = 1 : nn_depth
    
    new_weight_matrix = ones( size( previous_layer_out , 2 ) , layer_size( ii ) ) ./ sqrt( size( previous_layer_out , 2 ) ) ;
    sum_of_k = sum_of_k + numel( new_weight_matrix ) ;
    prior_to_activation_input = previous_layer_out * new_weight_matrix ;

    ## select the activation function 
    if( activation_funcs( ii ) == 1 ) ## sigmoid activation
      previous_layer_out = 1.0 ./ ( 1.0 .+ exp( -prior_to_activation_input ) ) ;
    elseif( activation_funcs( ii ) == 2 ) ## tanh activation
      previous_layer_out = tanh( prior_to_activation_input ) ;
    elseif( activation_funcs( ii ) == 3 ) ## lecun sigmoid activation
      previous_layer_out = sigmoid_lecun_m( prior_to_activation_input ) ;
    endif 
    
  endfor

  ## the final logistic output
  new_weight_matrix = ones( size( previous_layer_out , 2 ) , 1 ) ./ sqrt( size( previous_layer_out , 2 ) ) ;
  sum_of_k = sum_of_k + numel( new_weight_matrix ) ;
  final_output = previous_layer_out * new_weight_matrix ;
  final_output = 1.0 ./ ( 1.0 .+ exp( -final_output ) ) ;

  max_likelihood = sum( log( final_output .+ epsilon ) .* sample_targets + log( 1 .- final_output .+ epsilon ) .* ( 1 .- sample_targets ) ) ;
  
  ## get Akaike Information criteria
  all_aic_values( jj ) = 2 * sum_of_k - 2 * max_likelihood ;

endfor ## end of jj loop

J = mean( all_aic_values ) ;

endfunction

and the Octave interface of the Bayesopt Library I am currently iterating over different architectures ( up to 5 hidden layers deep with a max of 100 nodes per layer and using a choice of 3 hidden activations ) for a simple Logistic Regression model to predict turning points in different sets of statistical mechanics synthetic data using just features based on Taken's embedding.

More in due course.

Saturday 12 October 2019

Another Method of Creating Synthetic Data

Over the years I have posted about several different methodologies for creating synthetic data and I have recently come across yet another one which readers may find useful.

One of my first posts was Creation of Synthetic Data, which essentially is a random scrambling of historic data for a single time series with an attempt to preserve some of the bar to bar dependencies based upon a bar's position in relation to upper and lower price envelopes, a la Bollinger Bands, although the code provided in this post doesn't actually use Bollinger Bands. Another post, Creating Synthetic Data Using the Fast Fourier Transform, randomly scrambles data in the frequency domain.

Rather than random scrambling of existing data, another approach is to take measurements from existing data and then use these measurements to recreate new data series with similar characteristics. My Hidden Markov Modelling of Synthetic Periodic Time Series Data post utilises this approach and can be used to superimpose known sinusoidal waveforms onto historical trends. The resultant synthetic data can be used, as I have used it, in a form of controlled experiment whereby indicators etc. can be measured against known cyclic prices.

All of the above share the fact that they are applied to univariate time series only, although I have no doubt that they could probably be extended to the multivariate case. However, the new methodology I have come across is Statistical Mechanics of Time Series and its matlab central file exchange "toolbox." My use of this is to produce ensembles of my Currency Strength Indicator, from which I am now able to produce 50/60+ separate, synthetic time series representing, for example, all the Forex pairs, and preserving their inter-relationships whilst only suffering the computational burden of applying this methodology to a dozen or so underlying "fundamental" time series.

This may very well become my "go to" methodology for generating unlimited amounts of training data.

Tuesday 1 October 2019

Ideal Cyclic Tau Embedding as Times Series Features

Continuing on from my Ideal Tau for Time Series Embedding post, I have now written an Octave function based on these ideas to produce features for time series modelling. The function outputs are two slightly different versions of features, examples of which are shown in the following two plots, which show up and down trends in black, following a sinusoidal sideways market partially visible to the left:

and

The function outputs are normalised price and price delayed by a quarter and a half of the cycle period (in this case 20.) The trend slopes have been chosen to exemplify the difference in the features between up and down trends. The function outputs are identical for the unseen cyclic prices to the left.

When the sets of function outputs are plotted as 3D phase space plots typical results are

with the green, blue and red phase trajectories corresponding to the cyclic, up trending and down trending portions of the above synthetic price series. The markers in this plot correspond to points in the phase trajectories at which turns in the underlying price series occur. The following plot is the above plot rotated in phase space such that the green cyclic price phase trajectory is horizontally orientated.

It is clearer in this second phase space plot that there is a qualitative difference in the phase space trajectories of the different, underlying market types.

As a final check of feature relevance I used the Boruta R package, with the above turning point markers as classification targets (a turning point vs. not a turning point,) to assess the utility of this approach in general. These tests were conducted on real price series and also on indices created by my currency strength methodology. In all such Boruta tests the features are deemed "relevant" up to a delay of five bars on daily data (I ceased the tests at the five bar mark because it was becoming tedious to carry on.)

In summary, therefore, it can be concluded that features derived using Taken's theorem are useful on financial time series provided that:

the underlying time series are normalised (detrended) and,
the embedding delay (Tau) is set to the theoretical optimum of a quarter (and multiples thereof) of the measured cyclic period of the time series.

Pages