Tuesday, 31 March 2015

Conditional Restricted Boltzmann Machine

I have recently been looking at using a Conditional Restricted Boltzmann Machine (CRBM), drawing in particular on Graham Taylor's thesis, Composable Distributed-State Models for High-Dimensional Time Series, and the associated code available from here. My adaptation of this is very much a work in progress, but I think it holds great promise.

Instead of using the "label" of my MFE/MAE indicator as determined by the matches found by my Cauchy-Schwarz matching algorithm, I'm thinking of using the above CRBM to directly model the price action that immediately follows those matches: use the generative ability of the CRBM to create a set of modelled price bars, and then apply the MFE/MAE indicator to these modelled bars. In this way I should be able to generate a distribution of future MFE/MAE values as input for making trading decisions.
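As a rough illustration of the idea, the sketch below repeatedly generates a set of modelled bars and applies the MFE/MAE calculation to each set to build up an empirical distribution. It is a minimal sketch only: generate_modelled_bars() and mfe_mae() are hypothetical placeholders standing in for the CRBM generation code shown later in this post and for my MFE/MAE indicator respectively.

%  a minimal sketch only - generate_modelled_bars() and mfe_mae() are
%  hypothetical placeholders for the CRBM generation code and the MFE/MAE indicator
n_samples = 100 ;
mfe_values = zeros( n_samples , 1 ) ;
mae_values = zeros( n_samples , 1 ) ;

for ii = 1 : n_samples
  %  generate one set of modelled future price bars from the trained CRBM
  [ new_highs , new_lows , new_closes ] = generate_modelled_bars() ;
  %  apply the MFE/MAE indicator to this set of modelled bars
  [ mfe_values(ii) , mae_values(ii) ] = mfe_mae( new_highs , new_lows , new_closes ) ;
end

%  summarise the resulting distributions, e.g. by quantiles, for trading decisions
mfe_quantiles = quantile( mfe_values , [ 0.05 0.5 0.95 ] ) ;
mae_quantiles = quantile( mae_values , [ 0.05 0.5 0.95 ] ) ;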

In my adaptation of the above linked code I have replaced the somewhat complicated motion capture input data with a very simple set of 5 features derived from the log changes in price between consecutive bars. I have also made the code easier to read by spacing it out, using underscores in compound variable names, adding some extra comments and renaming some variables to be more relevant to the intended use, and I have split the code into separately callable functions with an eye on future conversion to .oct C++ functions. In essence, however, the code is much the same as that downloadable from the above code link. The code blocks below show my code adaptations so far:
first, the basic training file
clear all

% load price file of interest
filename = input( 'Enter filename for prices, e.g. es or esmatrix: ' , 's' ) ;
data = load( "-ascii" , filename ) ;

% get tick size
switch filename

case { "cc" }
tick = 1 ;

case { "gc" "lb" "pl" "sm" "sp" }
tick = 0.1 ;

case { "ausyen" "bo" "cl" "ct" "dx" "euryen" "gbpyen" "sb" "usdyen" }
tick = 0.01 ;

case { "ng" }
tick = 0.001 ;

case { "auscad" "aususd" "euraus" "eurcad" "eurchf" "eurgbp" "eurusd" "gbpchf" "gbpusd" "ho" "rb" "usdcad" "usdchf" }
tick = 0.0001 ;

case { "c" "o" "s" "es" "nd" "w" }
tick = 0.25 ;

case { "fc" "lc" "lh" "pb" }
tick = 0.025 ;

case { "ed" }
tick = 0.0025 ;

case { "si" }
tick = 0.5 ;

case { "hg" "kc" "oj" "pa" }
tick = 0.05 ;

case { "ty" "us" }
tick = 0.015625 ;

case { "ccmatrix" }
tick = 1 ;

case { "gcmatrix" "lbmatrix" "plmatrix" "smmatrix" "spmatrix" }
tick = 0.1 ;

case { "ausyenmatrix" "bomatrix" "clmatrix" "ctmatrix" "dxmatrix" "euryenmatrix" "gbpyenmatrix" "sbmatrix" "usdyenmatrix" }
tick = 0.01 ;

case { "ngmatrix" }
tick = 0.001 ;

case { "auscadmatrix" "aususdmatrix" "eurausmatrix" "eurcadmatrix" "eurchfmatrix" "eurgbpmatrix" "eurusdmatrix" "gbpchfmatrix" "gbpusdmatrix" "homatrix" "rbmatrix" "usdcadmatrix" "usdchfmatrix" }
tick = 0.0001 ;

case { "cmatrix" "omatrix" "smatrix" "esmatrix" "ndmatrix" "wmatrix" }
tick = 0.25 ;

case { "fcmatrix" "lcmatrix" "lhmatrix" "pbmatrix" }
tick = 0.025 ;

case { "edmatrix" }
tick = 0.0025 ;

case { "simatrix" }
tick = 0.5 ;

case { "hgmatrix" "kcmatrix" "ojmatrix" "pamatrix" }
tick = 0.05 ;

case { "tymatrix" "usmatrix" }
tick = 0.015625 ;

endswitch

%  get data
open = data( : , 4 ) ;
high = data( : , 5 ) ;
low = data( : , 6 ) ;
close = data( : , 7 ) ;

clear data

%  create log series for crbm training
vwap = vwap_all_period_detrend( open , high , low , close , tick ) ;
open_from_vwap = log( open ./ shift( vwap , 1 ) ) ; open_from_vwap(1) = 0 ;
open_from_previous_close = log( open ./ shift( close , 1 ) ) ; open_from_previous_close(1) = 0 ;
high_from_vwap = log( high ./ vwap ) ; high_from_vwap(1) = 0 ;
low_from_vwap = log( low ./ vwap ) ; low_from_vwap(1) = 0 ;
close_from_vwap = log( close ./ vwap ) ; close_from_vwap(1) = 0 ;

%  assemble into a features matrix
features = [ open_from_vwap open_from_previous_close high_from_vwap low_from_vwap close_from_vwap ] ;

%  now prepare data for the NN training
%  visually inspect chart to choose a training target
pkg load financial
clf () ;
figure(1) ; highlow( high , low , close , open ) ;
training_point_index = input( 'Choose a training point for NN training by entering an index from figure 1: ' ) ;
bar_number = 0 ;

%  load the pre-computed top 500 Cauchy Schwarz Cross validation index matches
load eurusd_top_500_cv_matches

%  select how many of the above matches to use - an optimisable parameter via pso?
number_of_training_examples_to_use = 450 ;

for ii = 0 : 2
%  bar_number = input( 'Enter a 1, 2 or 3 to identify the bar sequence order' ) ;
bar_number = bar_number + 1 ;

% using training_point_index get the top number_of_training_examples_to_use matches
features_index = eurusd_top_500_cv_matches( training_point_index+ii , 1 : number_of_training_examples_to_use )' ;

%  use the above features_index to create batchdata for training, assembled into form suitable for crbm training
%  how-many timesteps do we look back for directed connections - this is what we call the "order" of the model 
n1 = 3 ; % first layer
n2 = 3 ; % second layer

%  create batchdataindex to index bars that include bars at timesteps t-2, t-1, t and t+1 where t is the bar at features_index
batchdataindex = [ features_index ; features_index .- 2 ; features_index .- 1 ; features_index .+ 1 ] ;
%  sorted_batchdata_index = sort( batchdataindex ) ;

%  shuffle training set by shuffling the index values
batchdataindex = batchdataindex( randperm( size( batchdataindex , 1 ) ) , : ) ;

%  normalise the features matrix by the mean and std of just the batchdataindex values in features matrix
data_mean = mean( features( batchdataindex , : ) , 1 ) ; % index the rows, not linear elements
data_std = std( features( batchdataindex , : ) ) ;
batchdata = ( features - repmat( data_mean , size( features , 1 ) , 1 ) ) ./ repmat( data_std , size( features , 1 ) , 1 ) ;

if ( bar_number == 1 )
data_mean_1 = data_mean ; data_std_1 = data_std ;
save bar_1_mean_std.mat data_mean_1 data_std_1 ;
elseif ( bar_number == 2 )
data_mean_2 = data_mean ; data_std_2 = data_std ;
save bar_2_mean_std.mat data_mean_2 data_std_2 ;
elseif ( bar_number == 3 )
data_mean_3 = data_mean ; data_std_3 = data_std ;
save bar_3_mean_std.mat data_mean_3 data_std_3 ;
else
fprintf( 'Have entered wrong value for bar number.\n' ) ;
end

%  now create minibatches from the batchdataindex in the form of a cell array
batchsize = 100 ;
minibatchindex = reshape( batchdataindex , size( batchdataindex , 1 ) / batchsize , batchsize ) ;
minibatch = num2cell( minibatchindex , 2 ) ;

%  Set network properties
numdims = size( features , 2 ) ;
%  numhid1 = 150 ; numhid2 = 150 ; numepochs = 2000 ; % original
numhid1 = 100 ; numhid2 = 100 ; numepochs = 2000 ;
gsd = 1 ;          % fixed standard deviation for Gaussian units
nt = n1 ;          % crbm "order"
numhid = numhid1 ;
%  fprintf( 1 , 'Training Layer 1 CRBM, order %d: %d-%d \n', nt , numdims , numhid ) ;
restart = 1 ;      % initialize weights
tic() ;
[ w1 , bj1 , bi1 , A1 , B1 ] = gaussian_crbm( batchdata , minibatch , nt , gsd , numepochs , numhid , restart ) ;
gaussian_crbm_training_time = toc()

if ( bar_number == 1 )
w11 = w1 ; bj11 = bj1 ; bi11 = bi1 ; A11 = A1 ; B11 = B1 ;
save layer1.mat w11 bj11 bi11 A11 B11 ;
elseif ( bar_number == 2 )
w12 = w1 ; bj12 = bj1 ; bi12 = bi1 ; A12 = A1 ; B12 = B1 ;
save layer1_2.mat w12 bj12 bi12 A12 B12 ;
elseif ( bar_number == 3 )
w13 = w1 ; bj13 = bj1 ; bi13 = bi1 ; A13 = A1 ; B13 = B1 ;
save layer1_3.mat w13 bj13 bi13 A13 B13 ;
else
fprintf( 'Have entered wrong value for bar number.\n' ) ;
end

tic() ;
[ filtering_distance ] = get_filtering_distance( batchdata , unique( batchdataindex ) , numhid , w1 , bj1 , n1 , n2 , B1 , gsd ) ;
get_filtering_distance_timing = toc()

%  now use filtering_distance as new features for training of second layer
%  first create a new batchdata set
batchdata_2 = zeros( size( features , 1 ) , size( filtering_distance , 2 ) ) ;
%  and fill with the relevant, indexed values from filtering_distance
batchdata_2( unique( batchdataindex ) , : ) = filtering_distance ;

%  create new minibatch to index batchdata_2
batchsize = 25 ;
minibatchindex = reshape( features_index .+ 1 , size( features_index , 1 ) / batchsize , batchsize ) ;
minibatch = num2cell( minibatchindex , 2 ) ;

%  set network properties for the 2nd layer training
numhid = numhid2 ; nt = n2 ;
numdims = size( batchdata_2 , 2 ) ; % data (visible) dimension
%  fprintf( 1 , 'Training Layer 2 CRBM, order %d: %d-%d \n' , nt , numdims , numhid ) ;
restart = 1 ; % initialize weights
tic() ;
[ w2 , bj2 , bi2 , A2 , B2 ] = binary_crbm( batchdata_2 , minibatch , nt , numepochs , numhid , restart ) ;
binary_crbm_training_time = toc()

if ( bar_number == 1 )
w21 = w2 ; bj21 = bj2 ; bi21 = bi2 ; A21 = A2 ; B21 = B2 ;
save layer2.mat w21 bj21 bi21 A21 B21 ;
elseif ( bar_number == 2 )
w22 = w2 ; bj22 = bj2 ; bi22 = bi2 ; A22 = A2 ; B22 = B2 ;
save layer2_2.mat w22 bj22 bi22 A22 B22 ;
elseif ( bar_number == 3 )
w23 = w2 ; bj23 = bj2 ; bi23 = bi2 ; A23 = A2 ; B23 = B2 ;
save layer2_3.mat w23 bj23 bi23 A23 B23 ;
else
fprintf( 'Have entered wrong value for bar number.\n' ) ;
end

end % end of training_point_index loop
which in turn calls gaussian_crbm.m
function [ w , bj , bi , A , B ] = gaussian_crbm( batchdata , minibatch , nt , gsd , num_epochs , num_hid , restart )

%  Version 1.01 
%  %
%  Code provided by Graham Taylor, Geoff Hinton and Sam Roweis 
%  %
%  For more information, see:
%     http://www.cs.toronto.edu/~gwtaylor/publications/nips2006mhmublv
%  %
%  Permission is granted for anyone to copy, use, modify, or distribute this
%  program and accompanying programs and documents for any purpose, provided
%  this copyright notice is retained and prominently displayed, along with
%  a note saying that the original programs are available from our
%  web page.
%  The programs and documents are distributed without any warranty, express or
%  implied.  As the programs were written for research purposes only, they have
%  not been tested to the degree that would be advisable in any important
%  application.  All use of these programs is entirely at the user's own risk.
%  
%  This program trains a Conditional Restricted Boltzmann Machine in which
%  visible, Gaussian-distributed inputs are connected to
%  hidden, binary, stochastic feature detectors using symmetrically
%  weighted connections. Learning is done with 1-step Contrastive Divergence.
%  Directed connections are present, from the past nt configurations of the
%  visible units to the current visible units (A), and the past nt
%  configurations of the visible units to the current hidden units (B)
%  
%  The program assumes that the following variables are set externally:
%  nt        -- order of the model
%  gsd       -- fixed standard deviation of Gaussian visible units
%  num_epochs -- maximum number of epochs
%  num_hid    --  number of hidden units 
%  batchdata --  a matrix of data (num_cases,num_dims) 
%  minibatch -- a cell array of dimension batchsize, indexing the valid
%  frames in batchdata
%  restart   -- set to 1 if learning starts from beginning 
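%
%  Note added for reference: the positive and negative phases below implement
%  the CRBM conditionals
%    p( h(j)=1 | v_t , v_(t-1)...v_(t-nt) ) = sigmoid( sum_i w(j,i)*v_t(i)/gsd + bj(j) + bj_star(j) )
%  where the dynamic hidden bias is bj_star(j) = sum_k sum_i B(j,i,k)*v_(t-k)(i) ,
%  and the visible reconstruction uses the Gaussian mean
%    E[ v_t(i) | h , history ] = gsd * sum_j h(j)*w(j,i) + bi(i) + bi_star(i)
%  with dynamic visible bias bi_star(i) = sum_k sum_l A(i,l,k)*v_(t-k)(l)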

%  batchdata is a big matrix of all the frames
%  we index it with "minibatch", a cell array of mini-batch indices
num_batches = length( minibatch ) ; 

num_dims = size( batchdata , 2 ) ; % visible dimension

%  Setting learning rates
epsilon_w = 1e-3 ; % undirected
epsilon_bi = 1e-3 ; % visibles
epsilon_bj = 1e-3 ; % hidden units
epsilon_A = 1e-3 ; % autoregressive
epsilon_B = 1e-3 ; % prev visibles to hidden

w_decay = 0.0002 ; % currently we use the same weight decay for w, A, B
mom = 0.9 ;       % momentum used only after 5 epochs of training

if ( restart == 1 )  
    restart = 0 ;
    epoch = 1 ;

    % Randomly initialize weights
    w = 0.01 * randn( num_hid , num_dims ) ;
    bi = 0.01 * randn( num_dims , 1 ) ;
    bj = -1 + 0.01 * randn( num_hid , 1 ) ; % set to favour units being "off"

    % The autoregressive weights; A(:,:,j) is the weight from t-j to the vis
    A = 0.01 * randn( num_dims ,num_dims ,nt ) ;

    % The weights from previous time-steps to the hiddens; B(:,:,j) is the
    % weight from t-j to the hidden layer
    B = 0.01 * randn( num_hid , num_dims , nt ) ;

    clear w_grad bi_grad bj_grad A_grad B_grad
    clear neg_w_grad neg_bi_grad neg_bj_grad neg_A_grad neg_B_grad

    % keep previous updates around for momentum
    w_update = zeros( size( w ) ) ;
    bi_update = zeros( size( bi ) ) ;
    bj_update = zeros( size( bj ) ) ;
    A_update = zeros( size( A ) ) ;
    B_update = zeros( size( B ) ) ;
    
end % end of restart if

% Main loop
for epoch = epoch : num_epochs

  errsum = 0 ; % keep a running total of the difference between data and recon
  
  for batch = 1 : num_batches     
   
%%%%%%%%% START POSITIVE PHASE %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%      
    num_cases = length( minibatch{ batch } ) ;
    mb = minibatch{ batch } ; % caches the indices   

    % data is a nt+1-d array with current and delayed data
    % corresponding to this mini-batch
    data = zeros( num_cases , num_dims , nt + 1 ) ;
    data( : , : , 1 ) = batchdata( mb , : ) ;
    for hh = 1 : nt
        data( : , : , hh + 1 ) = batchdata( mb - hh , : ) ;
    end
     
    % calculate contributions from directed autoregressive connections
    bi_star = zeros( num_dims , num_cases ) ; 
    for hh = 1 : nt       
        bi_star = bi_star +  A( : , : , hh ) * data( : , : , hh + 1 )' ;
    end   
    
    % Calculate contributions from directed visible-to-hidden connections
    bj_star = zeros( num_hid , num_cases ) ;
    for hh = 1:nt
        bj_star = bj_star + B( : , : , hh ) * data( : , : , hh + 1 )' ;
    end
      
    % Calculate "posterior" probability -- hidden state being on 
    % Note that it isn't a true posterior
    eta = w * ( data( : , : , 1 ) ./ gsd )' + ... % bottom-up connections
          repmat( bj , 1 , num_cases ) + ...      % static biases on unit
          bj_star ;                               % dynamic biases
    
    h_posteriors = 1 ./ ( 1 + exp( -eta ) ) ;    % logistic
    
    % Activate the hidden units    
    hid_states = double( h_posteriors' > rand( num_cases , num_hid ) ) ; 
    
    % Calculate positive gradients (note w.r.t. neg energy)
    w_grad = hid_states' * ( data( : , : , 1 ) ./ gsd ) ;
    bi_grad = sum( data( : , : , 1 )' - repmat( bi , 1 , num_cases ) - bi_star , 2 ) ./ gsd^2 ;
    bj_grad = sum( hid_states , 1 )' ;
           
    for hh = 1 : nt      
        A_grad( : , : , hh ) = ( data( : , : , 1 )' - repmat( bi , 1 , num_cases ) - bi_star ) ./ gsd^2 * data( : , : , hh + 1 ) ;
        B_grad( : , : , hh ) = hid_states' * data( : , : , hh + 1 ) ;      
    end
    
%%%%%%%%% END OF POSITIVE PHASE %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 
    % Activate the visible units
    % Find the mean of the Gaussian
    topdown = gsd .* ( hid_states * w ) ;

    % This is the mean of the Gaussian 
    % Instead of properly sampling, negdata is just the mean
    % If we want to sample from the Gaussian, we would add in
    % gsd .* randn( num_cases , num_dims ) ;
    negdata = topdown + ...                       % top down connections
              repmat( bi' , num_cases , 1 ) + ... % static biases
              bi_star' ;                          % dynamic biases

    % Now conditional on negdata, calculate "posterior" probability
    % for hiddens
    eta = w * ( negdata ./ gsd )' + ...      % bottom-up connections
          repmat( bj , 1 , num_cases ) + ... % static biases on unit (no change)
          bj_star ;                          % dynamic biases (no change)

    h_posteriors = 1 ./ ( 1 + exp( -eta ) ) ; % logistic
    
    % Calculate negative gradients
    neg_w_grad = h_posteriors * ( negdata ./ gsd ) ; % not using activations
    neg_bi_grad = sum( negdata' - repmat( bi , 1 , num_cases ) - bi_star , 2 ) ./ gsd^2 ;
    neg_bj_grad = sum( h_posteriors , 2 ) ;
    
    for hh = 1 : nt
        neg_A_grad( : , : , hh ) = ( negdata' - repmat( bi , 1 , num_cases ) - bi_star ) ./ gsd^2 * data( : , : , hh + 1 ) ;
        neg_B_grad( : , : , hh ) = h_posteriors * data( : , : , hh + 1 ) ;        
    end

%%%%%%%%% END NEGATIVE PHASE  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%    
    
    err = sum( sum( ( data( : , : , 1 ) - negdata ) .^2 ) ) ;
    errsum = err + errsum ;
    
    if ( epoch > 5 ) % use momentum
        momentum = mom ;
    else % no momentum
        momentum = 0 ;
    end
    
%%%%%%%%% UPDATE WEIGHTS AND BIASES %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%        
    
    w_update = momentum * w_update + epsilon_w * ( ( w_grad - neg_w_grad ) / num_cases - w_decay * w ) ;
    bi_update = momentum * bi_update + ( epsilon_bi / num_cases ) * ( bi_grad - neg_bi_grad ) ;
    bj_update = momentum * bj_update + ( epsilon_bj / num_cases ) * ( bj_grad - neg_bj_grad ) ;

    for hh = 1 : nt
        A_update( : , : , hh ) = momentum * A_update( : , : , hh ) + epsilon_A * ( ( A_grad( : , : , hh ) - neg_A_grad( : , : , hh ) ) / num_cases - w_decay * A( : , : , hh ) ) ;
        B_update( : , : , hh ) = momentum * B_update( : , : , hh ) + epsilon_B * ( ( B_grad( : , : , hh ) - neg_B_grad( : , : , hh ) ) / num_cases - w_decay * B( : , : , hh ) ) ;
    end

    w = w + w_update ;
    bi = bi + bi_update ;
    bj = bj + bj_update ;

    for hh = 1 : nt
        A( : , : , hh ) = A( : , : , hh ) + A_update( : , : , hh ) ;
        B( : , : , hh ) = B( : , : , hh ) + B_update( : , : , hh ) ;
    end
    
%%%%%%%%%%%%%%%% END OF UPDATES  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
  
  end
    
%    % every 10 epochs, show output
%    if ( mod( epoch , 10 ) == 0 )
%       fprintf( 1 , 'epoch %4i error %6.1f  \n', epoch , errsum ) ; 
%       % Could see a plot of the weights every 10 epochs
%       % figure(3); weightreport
%       % drawnow;
%    end
    
end % end of epoch for loop
    
end % end of function
get_filtering_distance.m
function [ filtering_distance ] = get_filtering_distance( batchdata , batchdataindex , num_hid , w , bj , n1 , n2 , B , gsd )

% Version 1.01 
%
% Code provided by Graham Taylor, Geoff Hinton and Sam Roweis 
%
% For more information, see:
%     http://www.cs.toronto.edu/~gwtaylor/publications/nips2006mhmublv
%
% Permission is granted for anyone to copy, use, modify, or distribute this
% program and accompanying programs and documents for any purpose, provided
% this copyright notice is retained and prominently displayed, along with
% a note saying that the original programs are available from our
% web page.
% The programs and documents are distributed without any warranty, express or
% implied.  As the programs were written for research purposes only, they have
% not been tested to the degree that would be advisable in any important
% application.  All use of these programs is entirely at the user's own risk.

% This program runs the entire training set on the trained 1-hidden-layer
% network and saves the filtering distribution vectors in mini-batch format
% This is done before training a second CRBM on top of the first

% The program assumes that the following variables are set externally:
% n2    -- order of the next layer CRBM

%  batchsize = 100 ; % size of mini-batches

%  take all valid examples ( indexed by batchdataindex )
num_cases = length( batchdataindex ) ;

%  Calculate contributions from directed visible-to-hidden connections
bj_star = zeros( num_hid , num_cases ) ;

for hh = 1 : n1

  bj_star = bj_star + B( : , : , hh ) * batchdata( batchdataindex - hh , : )' ;
  
end % end for loop

%  Calculate "posterior" probability -- hidden state being on
%  Note that it isn't a true posterior
bottom_up = w * ( batchdata( batchdataindex , : ) ./ gsd )' ;

eta = bottom_up + ...                    % bottom-up connections
      repmat( bj , 1 , num_cases ) + ... % static biases on unit
      bj_star ;                          % dynamic biases

filtering_distance = 1 ./ ( 1 + exp( -eta' ) ) ; % logistic

end % end of function
binary_crbm.m
function [ w , bj , bi , A , B ] = binary_crbm( input_data , minibatch , nt , num_epochs , num_hid , restart )

% Version 1.01 
%
% Code provided by Graham Taylor, Geoff Hinton and Sam Roweis 
%
% For more information, see:
%     http://www.cs.toronto.edu/~gwtaylor/publications/nips2006mhmublv
%
% Permission is granted for anyone to copy, use, modify, or distribute this
% program and accompanying programs and documents for any purpose, provided
% this copyright notice is retained and prominently displayed, along with
% a note saying that the original programs are available from our
% web page.
% The programs and documents are distributed without any warranty, express or
% implied.  As the programs were written for research purposes only, they have
% not been tested to the degree that would be advisable in any important
% application.  All use of these programs is entirely at the user's own risk.

% This program trains a Conditional Restricted Boltzmann Machine in which
% visible, binary, stochastic inputs are connected to
% hidden, binary, stochastic feature detectors using symmetrically
% weighted connections. Learning is done with 1-step Contrastive Divergence.
% Directed connections are present, from the past nt configurations of the
% visible units to the current visible units (A), and the past nt
% configurations of the visible units to the current hidden units (B)

% The program assumes that the following variables are set externally:
% nt        -- order of the model
% num_epochs -- maximum number of epochs
% num_hid    --  number of hidden units 
% input_data --  a matrix of data (num_cases,num_dims) 
% minibatch -- a cell array of dimension batchsize, indexing the valid
% frames in input_data
% restart   -- set to 1 if learning starts from beginning 

% input_data is a big matrix of all the frames
% we index it with "minibatch", a cell array of mini-batch indices
num_batches = length( minibatch ) ; 

num_dims = size( input_data , 2 ) ; % visible dimension

%  Setting learning rates
epsilon_w = 1e-3 ; % undirected
epsilon_bi = 1e-3 ; % visibles
epsilon_bj = 1e-3 ; % hidden units
epsilon_A = 1e-3 ; % autoregressive
epsilon_B = 1e-3 ; % prev visibles to hidden

w_decay = 0.0002 ; % currently we use the same weight decay for w, A, B
mom = 0.9 ;        % momentum used only after 5 epochs of training

if ( restart == 1 )  
   restart = 0 ;
   epoch = 1 ;
  
  % Randomly initialize weights
  w = 0.01 * randn( num_hid , num_dims ) ;
  bi = 0.01 * randn( num_dims , 1 ) ;
  bj = -1 + 0.01 * randn( num_hid , 1 ) ; % set to favour units being "off"
  
  % The autoregressive weights; A(:,:,j) is the weight from t-j to the vis
  A = 0.01 * randn( num_dims , num_dims , nt ) ;
 
  % The weights from previous time-steps to the hiddens; B(:,:,j) is the
  % weight from t-j to the hidden layer
  B = 0.01 * randn( num_hid , num_dims , nt ) ;
  
  clear w_grad bi_grad bj_grad A_grad B_grad
  clear neg_w_grad neg_bi_grad neg_bj_grad neg_A_grad neg_B_grad
  
  % keep previous updates around for momentum
  w_update = zeros( size( w ) ) ;
  bi_update = zeros( size( bi ) ) ;
  bj_update = zeros( size( bj ) ) ;
  A_update = zeros( size( A ) ) ;
  B_update = zeros( size( B ) ) ;
    
end % end of restart if

% Main loop
for epoch = epoch : num_epochs

  errsum = 0 ; % keep a running total of the difference between data and recon
  
  for batch = 1 : num_batches     

%%%%%%%%% START POSITIVE PHASE %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    
    num_cases = length( minibatch{ batch } ) ;
    mb = minibatch{ batch } ; % caches the indices   

    % data is a nt+1-d array with current and delayed data
    % corresponding to this mini-batch
    data = zeros( num_cases , num_dims , nt + 1 ) ;
    data( : , : , 1 ) = input_data( mb , : ) ;
    for hh = 1 : nt
        data( : , : , hh + 1 ) = input_data( mb - hh , : ) ;
    end

    % Calculate contributions from directed autoregressive connections
    bi_star = zeros( num_dims , num_cases ) ; 
    for hh = 1 : nt       
        bi_star = bi_star +  A( : , : , hh ) * data( : , : , hh + 1 )' ;
    end   
    
    % Calculate contributions from directed visible-to-hidden connections
    bj_star = zeros( num_hid , num_cases ) ;
    for hh = 1 : nt
        bj_star = bj_star + B( : , : , hh ) * data( : , : , hh + 1 )' ;
    end
     
    bottom_up = w * data( : , : , 1 )' ;
    
    % Calculate "posterior" probability -- hidden state being on 
    % Note that it isn't a true posterior   
    eta = bottom_up + ...                    % bottom-up connections
          repmat( bj , 1 , num_cases ) + ... % static biases on unit
          bj_star ;                          % dynamic biases
    
    h_posteriors = 1 ./ ( 1 + exp( -eta ) ) ; % logistic
       
    % Activate the hidden units    
    hid_states = h_posteriors' > rand( num_cases , num_hid ) ; 
    
    % Calculate positive gradients (note w.r.t. neg energy)
    w_grad = hid_states' * data( : , : , 1 ) ;
    bi_grad = sum( data( : , : , 1 )' - repmat( bi , 1 , num_cases ) - bi_star , 2 ) ;
    bj_grad = sum( hid_states , 1 )' ;
           
    for hh = 1 : nt      
        A_grad( : , : , hh ) = ( data( : , : , 1 )' - repmat( bi , 1 , num_cases ) - bi_star ) * data( : , : , hh + 1 ) ;
        B_grad( : , : , hh ) = hid_states' * data( : , : , hh + 1 ) ;      
    end
    
%%%%%%%%% END OF POSITIVE PHASE %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%%%%%%% START NEGATIVE PHASE  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%    
    
    % Activate the visible units
    topdown = hid_states * w ;

    eta = topdown + ...                       % top down connections
          repmat( bi' , num_cases , 1 ) + ... % static biases
          bi_star' ;                          % dynamic biases
    
    negdata = 1 ./ ( 1 + exp( -eta ) ) ; % logistic
    
    % Now conditional on negdata, calculate "posterior" probability
    % for hiddens
    bottom_up = w * negdata' ;
    eta = bottom_up + ...                    % bottom-up connections
          repmat( bj , 1 , num_cases ) + ... % static biases on unit (no change)
          bj_star ;                          % dynamic biases (no change)

    h_posteriors = 1 ./ ( 1 + exp( -eta ) ) ; %logistic
    
    % Calculate negative gradients
    neg_w_grad = h_posteriors * negdata ; % not using activations
    neg_bi_grad = sum( negdata' - repmat( bi , 1 , num_cases ) - bi_star , 2 ) ;
    neg_bj_grad = sum( h_posteriors , 2 ) ;
    
    for hh = 1 : nt
        neg_A_grad( : , : , hh ) = ( negdata' - repmat( bi , 1 , num_cases ) - bi_star ) * data( : , : , hh + 1 ) ;
        neg_B_grad( : , : , hh ) = h_posteriors * data( : , : , hh + 1 ) ;        
    end
   
%%%%%%%%% END NEGATIVE PHASE  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    
    err = sum( sum( ( data( : , : , 1 ) - negdata ) .^2 ) ) ;
    errsum = err + errsum ;
    
    if ( epoch > 5 ) % use momentum
        momentum = mom ;
    else % no momentum
        momentum = 0 ;
    end
    
%%%%%%%%% UPDATE WEIGHTS AND BIASES %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    
    w_update = momentum * w_update + epsilon_w * ( ( w_grad - neg_w_grad ) / num_cases - w_decay * w ) ;
    bi_update = momentum * bi_update + ( epsilon_bi / num_cases ) * ( bi_grad - neg_bi_grad ) ;
    bj_update = momentum * bj_update + ( epsilon_bj / num_cases ) * ( bj_grad - neg_bj_grad ) ;

    for hh = 1 : nt
        A_update( : , : , hh ) = momentum * A_update( : , : , hh ) + epsilon_A * ( ( A_grad( : , : , hh ) - neg_A_grad( : , : , hh ) ) / num_cases - w_decay * A( : , : , hh ) ) ;
        B_update( : , : , hh ) = momentum * B_update( : , : , hh ) + epsilon_B * ( ( B_grad( : , : , hh ) - neg_B_grad( : , : , hh ) ) / num_cases - w_decay * B( : , : , hh ) ) ;
    end

    w = w +  w_update ;
    bi = bi + bi_update ;
    bj = bj + bj_update ;

    for hh = 1 : nt
        A( : , : , hh ) = A( : , : , hh ) + A_update( : , : , hh ) ;
        B( : , : , hh ) = B( : , : , hh ) + B_update( : , : , hh ) ;
    end
    
%%%%%%%%%%%%%%%% END OF UPDATES  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    
  end
    
%    % every 10 epochs, show output
%    if ( mod( epoch , 10 ) == 0 )
%        fprintf( 1 , 'epoch %4i error %6.1f  \n' , epoch , errsum ) ; 
%        % Could see a plot of the weights every 10 epochs
%        % figure(3); weightreport
%        % drawnow;
%    end
    
end % end main loop
    
end % end of function
New data is then generated with generate_visible_data.m
function [ visible_data  ] = generate_visible_data( init_data , hidden_1 , num_hid_1 , num_hid_2 , w1 , w2 , bj1 , bi1 , bj2 , bi2 , A1 , A2 , B1 , B2 , n1 , n2 , gsd )

% Version 1.01 
%
% Code provided by Graham Taylor, Geoff Hinton and Sam Roweis 
%
% For more information, see:
%     http://www.cs.toronto.edu/~gwtaylor/publications/nips2006mhmublv
%
% Permission is granted for anyone to copy, use, modify, or distribute this
% program and accompanying programs and documents for any purpose, provided
% this copyright notice is retained and prominently displayed, along with
% a note saying that the original programs are available from our
% web page.
% The programs and documents are distributed without any warranty, express or
% implied.  As the programs were written for research purposes only, they have
% not been tested to the degree that would be advisable in any important
% application.  All use of these programs is entirely at the user's own risk.
%
% This program uses the 2-level CRBM to generate data
% More efficient version than the original

num_frames = 1 ;
max_clamped = n1 + n2 ;
A2_flat = reshape( A2 , num_hid_1 , n2 * num_hid_1 ) ;
B2_flat = reshape( B2 , num_hid_2 , n2 * num_hid_1 ) ;

num_Gibbs = 30 ; % number of alternating Gibbs iterations

%  initialize second layer ( first n1 + n2 frames padded )
hidden_2 = ones( num_frames , num_hid_2 ) ;

%  keep the recent past in vector form
%  for input to directed links
%  it's slightly faster to pre-compute this vector and update it ( by
%  shifting ) at each time frame, rather than to fully recompute each time
%  frame
past = reshape( hidden_1( max_clamped : -1 : max_clamped + 1 - n2 , : )' , num_hid_1 * n2 , 1 ) ;

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%  First generate a hidden sequence (top layer)
%  Then go down through the first CRBM
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

fprintf( 'Generating hidden states\n' ) ;

%  for tt = max_clamped + 1 : num_frames
  tt = size( hidden_1 , 1 ) ;

  % initialize using the last frame
  % noise is not added for binary units
  hidden_1( tt , : ) = hidden_1( tt - 1 , : ) ;
  
  % Dynamic biases aren't re-calculated during Alternating Gibbs
  bi_star = A2_flat * past ;
  bj_star = B2_flat * past ;

  % Gibbs sampling
  for gg = 1 : num_Gibbs
  
    % Calculate posterior probability -- hidden state being on (estimate)
    % add in bias
    bottomup = w2 * hidden_1( tt , : )' ;
    eta = bottomup + ... % bottom-up connections
          bj2 + ...      % static biases on unit
          bj_star ;      % dynamic biases
    
    hposteriors = 1 ./ ( 1 + exp( -eta ) ) ; % logistic
    
    hidden_2( tt , : ) = double( hposteriors' > rand( 1 , num_hid_2 ) ) ; % Activating hiddens
    
    % Downward pass; visibles are binary logistic units     
    topdown = hidden_2( tt , : ) * w2 ;
        
    eta = topdown + ... % top down connections
          bi2' + ...    % static biases
          bi_star';     % dynamic biases   
    
    hidden_1( tt , : ) = 1 ./ ( 1 + exp( -eta ) ) ; % logistic 
    
  end % end gibbs sampling

  % If we are done Gibbs sampling, then do a mean-field sample
  
  topdown = hposteriors' * w2 ; % Very noisy if we don't do mean-field here

  eta = topdown + ... % top down connections
        bi2' + ...    % static biases
        bi_star' ;    % dynamic biases

  hidden_1( tt , : ) = 1 ./ ( 1 + exp( -eta ) ) ;

  % update the past
%    past( num_hid_1 + 1 : end ) = past( 1 : end - num_hid_1 ) ; % shift older history down
%    past( 1 : num_hid_1 ) = hidden_1( tt , : ) ;                % place most recent frame at top
  
  
%    if ( mod( tt , 10 ) == 0 )
%       fprintf( 'Finished frame %d\n' , tt ) ;
%    end
  
%  end % end tt loop

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%  Now that we've decided on the "filtering distribution", generate visible
%  data through CRBM 1
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

fprintf( 'Generating visible data\n' ) ;

%  for tt = max_clamped + 1 : num_frames

    % Add contributions from autoregressive connections
%      bi_star = zeros( num_dims , 1 ) ; % original
    bi_star = zeros( size( init_data , 2 ) , 1 ) ;
    for hh = 1 : n1
%          bi_star = bi_star + A1( : , : , hh ) * visible( tt - hh , : )' ; % original
        bi_star = bi_star + A1( : , : , hh ) * init_data( tt - hh , : )' ;
    end

    % Mean-field approx; visible units are Gaussian
    % ( filtering distribution is the data we just generated )
    topdown = gsd .* ( hidden_1( tt , : ) * w1 ) ;
%      visible( tt , : ) = topdown + ... % top down connections ( original )
    visible_data = topdown + ...
                   bi1' + ...    % static biases
                   bi_star' ;    % dynamic biases
                        
%  end % end tt loop

end % end of function
and then finally this new data is plotted as price bars by crbm_visual_test.m
clear all

% load price file of interest
filename = input( 'Enter filename for prices, e.g. es or esmatrix: ' , 's' ) ;
data = load( "-ascii" , filename ) ;

% get tick size
switch filename

case { "cc" }
tick = 1 ;

case { "gc" "lb" "pl" "sm" "sp" }
tick = 0.1 ;

case { "ausyen" "bo" "cl" "ct" "dx" "euryen" "gbpyen" "sb" "usdyen" }
tick = 0.01 ;

case { "ng" }
tick = 0.001 ;

case { "auscad" "aususd" "euraus" "eurcad" "eurchf" "eurgbp" "eurusd" "gbpchf" "gbpusd" "ho" "rb" "usdcad" "usdchf" }
tick = 0.0001 ;

case { "c" "o" "s" "es" "nd" "w" }
tick = 0.25 ;

case { "fc" "lc" "lh" "pb" }
tick = 0.025 ;

case { "ed" }
tick = 0.0025 ;

case { "si" }
tick = 0.5 ;

case { "hg" "kc" "oj" "pa" }
tick = 0.05 ;

case { "ty" "us" }
tick = 0.015625 ;

case { "ccmatrix" }
tick = 1 ;

case { "gcmatrix" "lbmatrix" "plmatrix" "smmatrix" "spmatrix" }
tick = 0.1 ;

case { "ausyenmatrix" "bomatrix" "clmatrix" "ctmatrix" "dxmatrix" "euryenmatrix" "gbpyenmatrix" "sbmatrix" "usdyenmatrix" }
tick = 0.01 ;

case { "ngmatrix" }
tick = 0.001 ;

case { "auscadmatrix" "aususdmatrix" "eurausmatrix" "eurcadmatrix" "eurchfmatrix" "eurgbpmatrix" "eurusdmatrix" "gbpchfmatrix" "gbpusdmatrix" "homatrix" "rbmatrix" "usdcadmatrix" "usdchfmatrix" }
tick = 0.0001 ;

case { "cmatrix" "omatrix" "smatrix" "esmatrix" "ndmatrix" "wmatrix" }
tick = 0.25 ;

case { "fcmatrix" "lcmatrix" "lhmatrix" "pbmatrix" }
tick = 0.025 ;

case { "edmatrix" }
tick = 0.0025 ;

case { "simatrix" }
tick = 0.5 ;

case { "hgmatrix" "kcmatrix" "ojmatrix" "pamatrix" }
tick = 0.05 ;

case { "tymatrix" "usmatrix" }
tick = 0.015625 ;

endswitch

%  get data
open = data( : , 4 ) ;
high = data( : , 5 ) ;
low = data( : , 6 ) ;
close = data( : , 7 ) ;

%  clear data

%  create log series for crbm training
vwap = vwap_all_period_detrend( open , high , low , close , tick ) ;
open_from_vwap = log( open ./ shift( vwap , 1 ) ) ; open_from_vwap(1) = 0 ; % same definition as in the training file
open_from_previous_close = log( open ./ shift( close , 1 ) ) ; open_from_previous_close(1) = 0 ;
high_from_vwap = log( high ./ vwap ) ; high_from_vwap(1) = 0 ;
low_from_vwap = log( low ./ vwap ) ; low_from_vwap(1) = 0 ;
close_from_vwap = log( close ./ vwap ) ; close_from_vwap(1) = 0 ;

%  assemble into a features matrix
features = [ open_from_vwap open_from_previous_close high_from_vwap low_from_vwap close_from_vwap ] ;

training_point_index = input( 'The training point index for NN training from figure 1: ' ) ;

%  load the weights etc.
load layer1.mat ;
load layer1_2.mat ;
load layer1_3.mat ;
load layer2.mat ;
load layer2_2.mat ;
load layer2_3.mat ;
load bar_1_mean_std ;
load bar_2_mean_std ;
load bar_3_mean_std ;

%  how-many timesteps do we look back for directed connections - this is what we call the "order" of the model 
n1 = 3 ; % first layer
n2 = 3 ; % second layer

%  Set network properties
numhid1 = 100 ; numhid2 = 100 ; % numepochs = 2000 ;
gsd = 1 ;          % fixed standard deviation for Gaussian units

%  the first bar
init_data = features( training_point_index - 5 : training_point_index , : ) ;

%  which must be normalised using the same data_mean and data_std used to normalise the training data
init_data = ( init_data - repmat( data_mean_1 , size( init_data , 1 ) , 1 ) ) ./ repmat( data_std_1 , size( init_data , 1 ) , 1 ) ; 

%  and we also need the 3 fixed hidden units of the crbm's first hidden layer
[ hidden_1 ] = get_fixed_hidden_layer( init_data , numhid1 , w11 , bj11 , B11 , n1 , n2 , gsd ) ;

%  do Gibbs sampling and generate the visible layer from the above, clamped, init_data and hidden_1 layers
[ visible_data ] = generate_visible_data( init_data , hidden_1 , numhid1 , numhid2 , w11 , w21 , bj11 , bi11 , bj21 , bi21 , A11 , A21 , B11 , B21 , n1 , n2 , gsd ) ;

%  now post process the visible_data - scale back to original using data_mean & data_std from above
visible_data_1 = data_std_1 .* visible_data .+ data_mean_1 ;

%  repeat for 2nd bar
init_data = [ features( training_point_index - 4 : training_point_index , : ) ; visible_data_1 ] ;

%  which must be normalised using the same data_mean and data_std used to normalise the training data
init_data = ( init_data - repmat( data_mean_2 , size( init_data , 1 ) , 1 ) ) ./ repmat( data_std_2 , size( init_data , 1 ) , 1 ) ; 

%  and we also need the 3 fixed hidden units of the crbm's first hidden layer
[ hidden_1 ] = get_fixed_hidden_layer( init_data , numhid1 , w12 , bj12 , B12 , n1 , n2 , gsd ) ;

%  do Gibbs sampling and generate the visible layer from the above, clamped, init_data and hidden_1 layers
[ visible_data ] = generate_visible_data( init_data , hidden_1 , numhid1 , numhid2 , w12 , w22 , bj12 , bi12 , bj22 , bi22 , A12 , A22 , B12 , B22 , n1 , n2 , gsd ) ;

%  now post process the visible_data - scale back to original using data_mean & data_std from above
visible_data_2 = data_std_2 .* visible_data .+ data_mean_2 ;

%  repeat for 3rd bar
init_data = [ features( training_point_index - 3 : training_point_index , : ) ; visible_data_1 ; visible_data_2 ] ;

%  which must be normalised using the same data_mean and data_std used to normalise the training data
init_data = ( init_data - repmat( data_mean_3 , size( init_data , 1 ) , 1 ) ) ./ repmat( data_std_3 , size( init_data , 1 ) , 1 ) ; 

%  and we also need the 3 fixed hidden units of the crbm's first hidden layer
[ hidden_1 ] = get_fixed_hidden_layer( init_data , numhid1 , w13 , bj13 , B13 , n1 , n2 , gsd ) ;

%  do Gibbs sampling and generate the visible layer from the above, clamped, init_data and hidden_1 layers
[ visible_data ] = generate_visible_data( init_data , hidden_1 , numhid1 , numhid2 , w13 , w23 , bj13 , bi13 , bj23 , bi23 , A13 , A23 , B13 , B23 , n1 , n2 , gsd ) ;

%  now post process the visible_data - scale back to original using data_mean & data_std from above
visible_data_3 = data_std_3 .* visible_data .+ data_mean_3 ;

%  create new bars for plotting
new_opens = zeros( 3 , 1 ) ; new_highs = zeros( 3 , 1 ) ; new_lows = zeros( 3 , 1 ) ; new_closes = zeros( 3 , 1 ) ;

visible_data_1 = exp( visible_data_1 ) ; visible_data_2 = exp( visible_data_2 ) ; visible_data_3 = exp( visible_data_3 ) ;
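
%  reconstruction note: taking exp() above turns each generated log feature back into
%  a price ratio. Each new bar's high, low and close are recovered by multiplying by
%  the previous bar's vwap, the open is taken as the average of its two available
%  estimates ( one relative to the previous vwap, one relative to the previous close ),
%  and the vwap of each new bar is approximated by ( open + close + ( high + low )/2 ) / 3
%  for use in reconstructing the following bar.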

new_opens( 1 , 1 ) = ( visible_data_1( 1 , 1 ) * vwap(training_point_index) + visible_data_1( 1 , 2 ) * close(training_point_index) ) / 2 ;
new_highs( 1 , 1 ) = visible_data_1( 1 , 3 ) * vwap(training_point_index) ;
new_lows( 1 , 1 ) = visible_data_1( 1 , 4 ) * vwap(training_point_index) ;
new_closes( 1 , 1 ) = visible_data_1( 1 , 5 ) * vwap(training_point_index) ;
vwap_1 = ( new_opens(1,1) + new_closes(1,1) + ( new_highs(1,1) + new_lows(1,1) )/2 ) / 3 ;

new_opens( 2 , 1 ) = ( visible_data_2( 1 , 1 ) * vwap_1 + visible_data_2( 1 , 2 ) * new_closes(1,1) ) / 2 ;
new_highs( 2 , 1 ) = visible_data_2( 1 , 3 ) * vwap_1 ;
new_lows( 2 , 1 ) = visible_data_2( 1 , 4 ) * vwap_1 ;
new_closes( 2 , 1 ) = visible_data_2( 1 , 5 ) * vwap_1 ;
vwap_2 = ( new_opens(2,1) + new_closes(2,1) + ( new_highs(2,1) + new_lows(2,1) )/2 ) / 3 ;

new_opens( 3 , 1 ) = ( visible_data_3( 1 , 1 ) * vwap_2 + visible_data_3( 1 , 2 ) * new_closes(2,1) ) / 2 ;
new_highs( 3 , 1 ) = visible_data_3( 1 , 3 ) * vwap_2 ;
new_lows( 3 , 1 ) = visible_data_3( 1 , 4 ) * vwap_2 ;
new_closes( 3 , 1 ) = visible_data_3( 1 , 5 ) * vwap_2 ;

figure(1) ;
plot_opens = open( training_point_index-20:training_point_index+3 , : ) ;
plot_highs = high( training_point_index-20:training_point_index+3 , : ) ;
plot_lows = low( training_point_index-20:training_point_index+3 , : ) ;
plot_closes = close( training_point_index-20:training_point_index+3 , : ) ;
pkg load financial ;
clf() ;
highlow( plot_highs , plot_lows , plot_closes , plot_opens )

figure(2) ;
plot_opens = [ open( training_point_index-20:training_point_index , : ) ; new_opens ] ;
plot_highs = [ high( training_point_index-20:training_point_index , : ) ; new_highs ] ;
plot_lows = [ low( training_point_index-20:training_point_index , : ) ; new_lows ] ;
plot_closes = [ close( training_point_index-20:training_point_index , : ) ; new_closes ] ;
clf() ;
highlow( plot_highs , plot_lows , plot_closes , plot_opens )
which produces two charts as output: the actual price bars, and the same chart with the modelled bars appended, where the last 3 bars on the right of the second chart are the modelled versions of the 3 rightmost bars in the first chart. At the moment the modelling isn't very good because I'm using simplistic features - this is just proof-of-concept coding. Future work will include finding much better input features, speeding up the running time of the code by compiling it into .oct files, and then optimising parameters using my particle swarm optimisation code.

Sunday, 1 March 2015

Particle Swarm Optimisation, Part 2.

Following on from my last post, here is an Octave .oct file implementation of the one-dimensional particle swarm optimisation routine, with one slight twist: instead of using a for loop I've implemented it within a while loop, with a stopping condition that the algorithm ceases once there has been no improvement in the global_best value for 25 iterations.
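Before the full listing, here is a minimal plain Octave sketch of just that stopping rule; initialise_swarm(), evaluate_swarm() and update_swarm() are hypothetical placeholders, not functions from this post, and the real fitness evaluation and velocity/position updates appear in the C++ code below.

%  minimal sketch of the no-improvement stopping rule only
no_iterations_until_cease = 25 ;
global_best = Inf ;
while_counter = 0 ;
swarm = initialise_swarm( 100 ) ; % hypothetical placeholder

while ( while_counter < no_iterations_until_cease )

  [ candidate_best , swarm ] = evaluate_swarm( swarm ) ; % hypothetical placeholder

  if ( candidate_best < global_best )
     global_best = candidate_best ;
     while_counter = 0 ; % reset the counter whenever the global best improves
  end

  swarm = update_swarm( swarm ) ; % hypothetical placeholder for the velocity/position updates
  while_counter = while_counter + 1 ;

end

The full .oct file implementation follows: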
#include <octave/oct.h>      // Octave headers required for an .oct file
#include <limits>            // for std::numeric_limits
#include "MersenneTwister.h" // Mersenne Twister random number generator

DEFUN_DLD ( pso_conversion_code, args, nargout,
"-*- texinfo -*-\n\
@deftypefn {Function File} {} pso_conversion_code (@var{target_val,max_lambda})\n\
The output of this test function should be half the input target_val.\n\
@end deftypefn" )

{
octave_value_list retval_list ;
int nargin = args.length () ;

// check the number of input arguments
if ( nargin != 2 )
{
    error ( "Invalid number of arguments." ) ;
    return retval_list ;
}

if ( args(0).length () != 1 )
{
error ( "Invalid target_val. Should be a single value." ) ;
return retval_list ;
}

if ( args(1).length () != 1 )
{
error ( "Invalid max_lambda value. Should be a single value for the maximum 'guess'." ) ;
return retval_list ;
}

if (error_state)
{
    error ( "Invalid arguments." ) ;
    return retval_list ;
}
// end of input checking

double target_val = args(0).double_value() ;
double max_lambda = args(1).double_value() ;

double loocv_value ;

// the pso algorithm
int no_iterations_until_cease = 25 ;
int no_of_particles = 100 ;

double global_best = std::numeric_limits<double>::infinity() ;
double global_best_lambda = 0.0 ; // initially set to unregularised

ColumnVector local_best( no_of_particles , 1 ) ; local_best.fill( global_best ) ;
ColumnVector local( no_of_particles , 1 ) ;
ColumnVector local_best_so_far( no_of_particles , 1 ) ;

ColumnVector velocity( no_of_particles , 1 ) ; velocity.fill( 0.0 ) ; // particle velocity vector

// A Mersenne Twister random number generator can be declared with a single line, thanks to the #include "MersenneTwister.h" above
MTRand mtrand1 ;

// values for the random updating process
double r1 ; 
double r2 ;

// an inertial constant. Good values are usually slightly less than 1. Or it could be randomly initialized for each particle.
double w_ic = 0.9 ;

// c1 and c2 are constants that say how much the particle is directed towards good positions. They represent a "cognitive" and a "social" component, respectively, 
// in that they affect how much the particle's personal best and the global best (respectively) influence its movement. Usually we take c_1, c_2  approx = 2. 
// Or they could be randomly initialized for each particle.
double c1 = 2.0 ;
double c2 = 2.0 ; 

// fill the local vector with initial random values < max_lambda, temporarily using r1
for ( octave_idx_type ii (0) ; ii < no_of_particles ; ii++ )
    {
    r1 = mtrand1.randDblExc () ;
    local(ii) = r1 * max_lambda ;
    }

int while_counter = 0 ;
while ( while_counter < no_iterations_until_cease )
{
  
  // loop once over local_best and local vectors
  for ( octave_idx_type jj (0) ; jj < no_of_particles ; jj++ )
      {
 
          // Replace this code box as necessary

          //*************************************************************//
          //                                                             //
          // fitness function evaluation                                 //
          loocv_value = local(jj) * local(jj) - target_val * local(jj) ; //
          //                                                             //
          //*************************************************************//

   // check if the local_best has improved
   if ( loocv_value < local_best(jj) )
      {
      // update local_best and local_best_so_far vector if it has
      local_best(jj) = loocv_value ;
      local_best_so_far(jj) = local(jj) ;
      }
     
   // check if the above local_best has also improved the global_best  
   if ( local_best(jj) < global_best )
      {
      // update global_best and global_best_lambda if it has
      global_best = local_best(jj) ;
      global_best_lambda = local_best_so_far(jj) ;
      while_counter = 0 ;
      }
    
      } // end of loop once over local_best and local vectors
      
  // now update the particle velocity and position vectors
  for ( octave_idx_type jj (0) ; jj < no_of_particles ; jj++ )
      {
      r1 = mtrand1.randDblExc () ;
      r2 = mtrand1.randDblExc () ;
      velocity(jj) = w_ic * velocity(jj) + c1 * r1 * ( local_best_so_far(jj) - local(jj) ) + c2 * r2 * ( global_best_lambda - local(jj) ) ; // attraction is towards the global best position ( global_best_lambda ), not its fitness value
      local(jj) = local(jj) + velocity(jj) ;
      } // end of particle velocity and position vectors updates loop
 
while_counter += 1 ;
 
} // end of main while loop 

retval_list(1) = global_best ;
retval_list(0) = global_best_lambda ;

return retval_list ;
  
} // end of function
In the boxed "fitness function evaluation" comment section there is the same test function as in the vectorised code in my previous post. In a real application, of course, this code would be replaced - the above is just a test of my conversion of the PSO algorithm from Octave to C++ code.
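As a quick check of the conversion, the compiled function can be called from the Octave prompt along the lines shown below. The file name pso_conversion_code.cc and the availability of MersenneTwister.h on the include path are assumptions; with a target_val of 10 the returned best "guess" should be roughly 5.

%  compile the .oct file ( assumes MersenneTwister.h can be found by mkoctfile )
mkoctfile pso_conversion_code.cc

%  run the test: the first return value is the best lambda found, the second its fitness
[ best_lambda , best_value ] = pso_conversion_code( 10 , 20 ) ;
disp( best_lambda ) % should be approximately 5, i.e. half of target_val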

I've been looking at PSO because I think I can easily use it as a simple hyperparameter optimisation tool to tune the regularisation of the weights in my proposed neural net trading system. The reason I chose the fitness function I did in the above code is simply that it has a global minimum, which is what neural net training is all about - minimisation of an error function.
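As a sketch of how I expect to use it for that, the boxed fitness function above would simply be swapped for a cross-validation error evaluated at the candidate regularisation value; cross_validation_error() and pso_over_lambda() below are hypothetical placeholders for the neural net's error measure and for the compiled PSO routine respectively.

%  minimal sketch only - cross_validation_error() and pso_over_lambda() are
%  hypothetical placeholders, not functions defined anywhere in this post
fitness = @( lambda ) cross_validation_error( train_features , train_targets , lambda ) ;
[ best_lambda , best_cv_error ] = pso_over_lambda( fitness , max_lambda ) ;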