Sunday, 1 March 2015

Particle Swarm Optimisation, Part 2.

Following on from my last post, here is an Octave .oct file implementation of the one dimensional Particle swarm optimisation routine, with one slight twist: instead of using a for loop I've implemented it within a while loop with a stopping condition that the algorithm should cease once there has been no improvement in the global_best value for 25 iterations.
DEFUN_DLD ( pso_conversion_code, args, nargout,
"-*- texinfo -*-\n\
@deftypefn {Function File} {} pso_conversion_code (@var{target_val,max_lambda})\n\
The output of this test function should be half the input target_val.\n\
@end deftypefn" )

{
octave_value_list retval_list ;
int nargin = args.length () ;

// check the number of input arguments
if ( nargin != 2 )
{
error ( "Invalid number of arguments." ) ;
return retval_list ;
}

if ( args(0).length () != 1 )
{
error ( "Invalid target_val. Should be a single value." ) ;
return retval_list ;
}

if ( args(1).length () != 1 )
{
error ( "Invalid max_lambda value. Should be a single value for the maximum 'guess'." ) ;
return retval_list ;
}

if (error_state)
{
error ( "Invalid arguments." ) ;
return retval_list ;
}
// end of input checking

double target_val = args(0).double_value() ;
double max_lambda = args(1).double_value() ;

double loocv_value ;

// the the pso algorithm
int no_iterations_until_cease = 25 ;
int no_of_particles = 100 ;

double global_best = std::numeric_limits::infinity() ;
double global_best_lambda = 0.0 ; // initially set to unregularised

ColumnVector local_best( no_of_particles , 1 ) ; local_best.fill( global_best ) ;
ColumnVector local( no_of_particles , 1 ) ;
ColumnVector local_best_so_far( no_of_particles , 1 ) ;

ColumnVector velocity( no_of_particles , 1 ) ; velocity.fill( 0.0 ) ; // particle velocity vector

// A Mersenne Twister random number generator can be declared with a simple from the #include "MersenneTwister.h"
MTRand mtrand1 ;

// values for the random updating process
double r1 ;
double r2 ;

// an inertial constant. Good values are usually slightly less than 1. Or it could be randomly initialized for each particle.
double w_ic = 0.9 ;

// c1 and c2 are constants that say how much the particle is directed towards good positions. They represent a "cognitive" and a "social" component, respectively,
// in that they affect how much the particle's personal best and the global best (respectively) influence its movement. Usually we take c_1, c_2  approx = 2.
// Or they could be randomly initialized for each particle.
double c1 = 2.0 ;
double c2 = 2.0 ;

// fill the local vector with initial random values &lt; max_lambda, temporarily using r1
for ( octave_idx_type ii (0) ; ii &lt; no_of_particles ; ii++ )
{
r1 = mtrand1.randDblExc () ;
local(ii) = r1 * max_lambda ;
}

int while_counter = 0 ;
while ( while_counter &lt; no_iterations_until_cease )
{

// loop once over local_best and local vectors
for ( octave_idx_type jj (0) ; jj &lt; no_of_particles ; jj++ )
{

// Replace this code box as necessary

//*************************************************************//
//                                                             //
// fitness function evaluation                                 //
loocv_value = local(jj) * local(jj) - target_val * local(jj) ; //
//                                                             //
//*************************************************************//

// check if the local_best has improved
if ( loocv_value &lt; local_best(jj) )
{
// update local_best and local_best_so_far vector if it has
local_best(jj) = loocv_value ;
local_best_so_far(jj) = local(jj) ;
}

// check if the above local_best has also improved the global_best
if ( local_best(jj) &lt; global_best )
{
// update global_best and global_best_lambda if it has
global_best = local_best(jj) ;
global_best_lambda = local_best_so_far(jj) ;
while_counter = 0 ;
}

} // end of loop once over local_best and local vectors

// now update the particle velocity and position vectors
for ( octave_idx_type jj (0) ; jj &lt; no_of_particles ; jj++ )
{
r1 = mtrand1.randDblExc () ;
r2 = mtrand1.randDblExc () ;
velocity(jj) = w_ic * velocity(jj) + c1 * r1 * ( local_best_so_far(jj) - local(jj) ) + c2 * r2 * ( global_best - local(jj) ) ;
local(jj) = local(jj) + velocity(jj) ;
} // end of particle velocity and position vectors updates loop

while_counter += 1 ;

} // end of main while loop

retval_list(1) = global_best ;
retval_list(0) = global_best_lambda ;

return retval_list ;

} // end of function
In the "commented" function section there is the same test function as in the vectorised code in my previous post. In real life application, of course, this code would be replaced - the above is just a test of my conversion of the pso algorithm from Octave to C++ code.

I've been looking at pso because I think I can easily use it as a simple Hyperparameter optimisation tool to tune the regularisation of the weights in my proposed neural net trading system. The reason I chose the fitness function I did in the above code is simply that it has a global minimum, which is what neural net training is all about - minimisation of an error function.

Friday, 20 February 2015

Particle Swarm Optimisation

Having decided that I'm going to use my mfe_mae indicator as a target for my neural net, over the last couple of months I've been doing some research on what would make good features for this target. In the course of this I've also decided that Particle swarm optimization would be a useful tool to use.

Thanks to the pseudo-code on this wiki and the code on in this stackoverflow thread I've been able to write some Octave code to perform pso over one dimension, which is shown in the code box below:
clear all

function [out] = evaluate_fitness( R , val )
out = R .* R .- val .* R ;
end

val = input( 'Enter test val: ' ) ;

% Initialize the particle positions and their velocities
n_particles = 100 ;
upper_limit = val ;
n_iterations = 50 ;

w = 0.9 .* ones( n_particles , 1 ) ; % is an inertial constant. Good values are usually slightly less than 1. Or it could be randomly initialized for each particle.

% c1 and c2 are constants that say how much the particle is directed towards good positions. They represent a "cognitive" and a "social" component, respectively,
% in that they affect how much the particle's personal best and the global best (respectively) influence its movement. Usually we take c_1, c_2  approx = 2.
% Or they could be randomly initialized for each particle.
C1 = 2.0 ;
C2 = 2.0 ;

X = upper_limit * rand( n_particles , 1 ) ; % == particle position vector containing lambda values
V = zeros( size(X) ) ; % particle velocity vector
X_lbest = X ; % == particle position vector containing lambda values for local best

% Initialize the global and local fitness to the worst possible. Fitness == LOOCV "press" statistic
fitness_gbest = Inf ; % _gbest == global best
fitness_lbest = fitness_gbest * ones( n_particles , 1 ) ; % _lbest == local best

% Loop until convergence, in this example a finite number of iterations chosen
for ii = 1 : n_iterations

% evaluate the fitness of each particle, i.e. do the linear regression and get
% the LOOCV statistic
fitness_X = evaluate_fitness( X , val ) ;

% Update the local bests and their fitness
ix = find( fitness_X < fitness_lbest ) ; % if local LOOCV "press" statistic improves
fitness_lbest( ix ) = fitness_X( ix ) ;  % record this better LOOCV statistic value
X_lbest( ix ) = X( ix ) ;                % and the lambda value that gave rise to it

% Update the global best and its fitness
[ min_fitness min_fitness_index ] = min( fitness_X ) ;

if ( min_fitness < fitness_gbest )       % if global LOOCV "press" statistic improves
fitness_gbest = min_fitness ;        % record this better LOOCV statistic value
X_gbest = X( min_fitness_index ) ;   % and the lambda value that gave rise to it
end % end if

% Update the particle velocity and position vectors
R1 = rand( n_particles , 1 ) ;
R2 = rand( n_particles , 1 ) ;
V = w .* V + C1 .* R1 .* ( X_lbest .- X ) .+ C2 .* R2 .* ( X_gbest .* ones( n_particles , 1 ) .- X ) ;
X = X .+ V ;

end % end main ii loop
and which is vectorised as much as I can make it at the moment. The evaluate_fitness function I've chosen to use is a Quadratic function of the form
f(x) = x^2 - bx
which, when a positive value for "val" is input as the test value, ensures the function curve goes through the origin and has a minimum y-axis value at a point on the x-axis that is half the input test value. This make it easy to quickly verify that the pso code is performing as expected, with the global minimum x_axis value found by the algorithm being given by the variable X_gbest. My reasons for choosing a test function of this form, and for looking at pso in general, will be given in my next post.

Thursday, 11 December 2014

MFE/MAE Indicator Test Results

Following on from the previous post, the test I outlined in that post wasn't very satisfactory, which I put down to the fact that the Sigmoid transformation of the raw MFE/MAE indicator values is not amenable to the application of standard deviation as a meaningful measure. Instead, I changed the test to one based on the standard error of the mean, an example screen shot of which is shown below:-
The top pane shows the the long version of the indicator and the bottom pane the short version. In each there are upper and lower limits of the sample standard error of the mean above and below the population mean (mean of all values of the indicator) along with the cumulative mean value of the top N matches as shown on the x-axis. In this particular example it can be seen that around the 170-180 samples mark the cumulative mean moves inside the standard error limits, never to leave them again. The meaning I ascribe to this is that there is no value to be gained from using more than approximately 180 samples for machine learning purposes, for this example, as to use more samples would be akin to training on all available data, which makes the use of my Cauchy Schwarz matching algo superfluous. I repeated the above on all instances of sigmoid transformed and untransformed MFE/MAE indicator values to get an average of 325 samples for transformed, and an average of 446 samples for the untransformed indicator values across the 4 major forex pairs. Based on this, I have decided to use the top 450 Cauchy Schwarz matches for training purposes, which has ramifications for model complexity will be discussed shortly.

Returning to the above screen shot, the figure 2 inset shows the price bars that immediately follow the price bar for which the main screen shows the top N matches. Looking at the extreme left of the main screen it can be seen that the lower pane, short indicator has an almost maximum reading of 1 whilst the upper pane, long indicator shows a value of approx. 2.7, which is not much above the global minimum for this indicator and well below the 0.5 neutral level. This strongly suggests a short position, and looking at the inset figure it can be seen that over the 3 days following the extreme left matched bar a short position was indeed the best position to hold. This is a pattern that seems to frequently present itself during visual inspection of charts, although I am unable to quantify this in any way.

On the matter of model complexity alluded to above, I found the Learning From Data course I have recently completed on the edX platform to be very enlightening, particularly the concept of the VC dimension, which is nicely explained in the Learning From Data Video library. I'll leave it to interested readers to follow the links, but the big take away for me is that using 450 samples as described above implies that my final machine learning model must have an upper bound of approximately 45 on the VC dimension, which in turn implies a maximum of 45 weights in the neural net. This is a design constraint that I will discuss in a future post.

Wednesday, 26 November 2014

Test of the MFE/MAE Indicator

Continuing from my last post, wherein I stated I was going to conduct a more pertinent statistical test of the returns of the bars(s) immediately following the N best, Cauchy Schwarz matching algorithm matched bars in the price history, readers may recall that the basic premise behind this algorithm is that by matching current price action to the N best matches, the price action after these matches can be used to infer what will occur after the current price action. However, rather than test the price action directly I have decided to apply the test to the MFE/MAE indicator. There are several reasons for this, which are enumerated below.
1. I intend to use the indicator as the target function for future Neural net training
2. the indicator represents a reward to risk ratio, which indirectly reflects price action itself, but without the noise of said action
3. this reward to risk ratio is of much more direct concern, from a trading perspective, than accurately predicting price
4. since the indicator is now included as a feature in the matching algorithm, testing the indicator is, very indirectly, a test of the matching algorithm too
The test I have in mind is a hybrid of a hypothesis test, Cross validation and the application of Statistical process control.  Consider the chart below:
This shows two sampling distributions of the mean for Long MFE/MAE indicator values > 0.5, the upper pane for sample sizes of 20 and the lower pane for 75. For simplicity I shall only discuss the Long > 0.5 version of the indicator, but everything that follows applies equally to the Short version. As expected the upper pane shows greater variance, and for the envisioned test a whole series of these sampling distributions will be produced for different sampling rates. The way I intend it to work is as follows:
• take a single bar in the history and see what the value of the MFE/MAE indicator value is 3 bars later (assume > 0.5 for this exposition, so we compare to long sampling distributions only)
• get the top 20 matched bars for the above selected bar and the corresponding 20 indicator values for 3 bars later and take the mean of these 20 indicator values
• check if this mean falls within the sampling distribution of the mean of 20, as shown in the upper pane above by the vertical black line at 0.8 on the x axis. If it does fall with the sampling distribution, we accept the null hypothesis that the 20 best matches in history future indicator values and the value of the indicator after the bar to be matched come from the same distribution
• repeat the immediately preceding step for means of 21, 22, ... etc until such time as the null hypothesis can be rejected, shown in the lower pane above. At this point, we then then declare an upper bound on the historical number of matches for the bar to be predicted
For any single bar to be predicted we can then produce the following chart, which is completely artificial and just for illustrative purposes:
where the cyan and red lines are the +/- 2 standard deviations above/below a notional mean value for the whole distribution of approximately 0.85, and the chart can be considered to be a type of control chart. The upper and lower control lines converge towards the right, reflecting the decreasing variance of increasingly large N sample means, as shown in the first chart above. The green line represents the cumulative N sample mean of the best N historical matches' future values. I have shown it as decreasing as it is to be expected that as more N matches are included, the greater the chance that incorrect matches, unexpected price reversals etc. will be caught up in this mean calculation, resulting in the mean value moving into the left tail of the sampling distribution. This effect combines with the shrinking variance to reach a critical point (rejection of the null hypothesis) at which the green line exits below the lower control line.

The purpose of all the above is provide a principled manner to choose the number N matches from the Cauchy-Schwarz matching algorithm to supply instances of training data to the envisioned neural net training. An incidental benefit of this approach is that it is indirectly a hypothesis test of the fundamental assumption underlying the matching algorithm; namely that past price action has predictive ability for future price action, and furthermore, it is a test of the MFE/MAE indicator. Discussion of the results of these tests in a future post.

Wednesday, 12 November 2014

First Use for the MFE/MAE Indicator

This first use is as an input to my Cauchy-Schwarz matching algorithm, previous posts about which can be read herehere and here. The screen shot below shows what I would characterise as a "good" set of matches:
The top left pane shows the original section of the price series to be matched, and the panes labelled #1, #5, etc. are the best match, 5th best match and so on respectively. The last 3 rightmost bars in each pane are "future" price bars, i.e. the 4th bar in from the right is the target bar that is being matched, matched over all the bars to the left or in the past of this target bar.

I consider the above to be a set of "good" matches because, for the #1 through #25 matches for "future" bars:
• if one considers the logic of the mfe/mae indicator each pane gives indicator readings of "long," which all agree with the original "future" bars
• similarly the mae (maximum adverse excursion) occurs on the day immediately following the matched day
• the mfe (maximum favourable excursion) occurs on the 3rd "future" bar, with the slight exception of pane #10
• the marked to market returns of an entry at the open of the 1st "future" bar to the close of the 3rd "future" bar all show a profit, as does the original pane
However, it can be seen that the above noted "goodness" breaks down for panes #25 and #30, which leads me to postulate that there is an upper bound on the number of matches for which there is predictive ability for "future" returns.

In the above linked posts the test statistic used to judge the predictive efficacy of the matching algorithm was effect size. However, I think a more pertinent test statistic to use would be the average bar return over the bars immediately following a matched bar, and a discussion of this will be the subject of my next post.

Wednesday, 5 November 2014

A New MFE/MAE Indicator.

After stopping my investigation of tools for spectral analysis over the last few weeks I have been doing another mooc, this time Learning from Data, and also working on the idea of one of my earlier posts.

In the above linked post there is a video showing the idea as a "paint bar" study. However, I thought it would be a good idea to render it as an indicator, the C++ Octave .oct code for which is shown in the code box below.
DEFUN_DLD ( adjustable_mfe_mae_from_open_indicator, args, nargout,
"-*- texinfo -*-\n\
@deftypefn {Function File} {} adjustable_mfe_mae_from_open_indicator (@var{open,high,low,close,lookback_length})\n\
This function takes four input series for the OHLC and a value for lookback length. The main outputs are\n\
two indicators, long and short, that show the ratio of the MFE over the MAE from the open of the specified\n\
lookback in the past. The indicators are normalised to the range 0 to 1 by a sigmoid function and a MFE/MAE\n\
ratio of 1:1 is shifted in the sigmoid function to give a 'neutral' indicator reading of 0.5. A third output\n\
is the max high - min low range over the lookback_length normalised by the range of the daily support and\n\
resistance levels S1 and R1 calculated for the first bar of the lookback period. This is also normalised to\n\
give a reading of 0.5 in the sigmoid function if the ratio is 1:1. The point of this third output is to give\n\
some relative scale to the unitless MFE/MAE ratio and to act as a measure of strength or importance of the\n\
MFE/MAE ratio.\n\
@end deftypefn" )

{
octave_value_list retval_list ;
int nargin = args.length () ;

// check the input arguments
if ( nargin != 5 )
{
error ( "Invalid arguments. Arguments are price series for open, high, low and close and value for lookback length." ) ;
return retval_list ;
}

if ( args(4).length () != 1 )
{
error ( "Invalid argument. Argument 5 is a scalar value for the lookback length." ) ;
return retval_list ;
}

int lookback_length = args(4).int_value() ;

if ( args(0).length () &lt; lookback_length )
{
error ( "Invalid argument lengths. Argument lengths for open, high, low and close vectors should be >= lookback length." ) ;
return retval_list ;
}

if ( args(1).length () != args(0).length () )
{
error ( "Invalid argument lengths. Argument lengths for open, high, low and close vectors should be equal." ) ;
return retval_list ;
}

if ( args(2).length () != args(0).length () )
{
error ( "Invalid argument lengths. Argument lengths for open, high, low and close vectors should be equal." ) ;
return retval_list ;
}

if ( args(3).length () != args(0).length () )
{
error ( "Invalid argument lengths. Argument lengths for open, high, low and close vectors should be equal." ) ;
return retval_list ;
}

if (error_state)
{
error ( "Invalid arguments. Arguments are price series for open, high, low and close and value for lookback length." ) ;
return retval_list ;
}
// end of input checking

// inputs
ColumnVector open = args(0).column_vector_value () ;
ColumnVector high = args(1).column_vector_value () ;
ColumnVector low = args(2).column_vector_value () ;
ColumnVector close = args(3).column_vector_value () ;
// outputs
ColumnVector long_mfe_mae = args(0).column_vector_value () ;
ColumnVector short_mfe_mae = args(0).column_vector_value () ;
ColumnVector range = args(0).column_vector_value () ;

// variables
double max_high = *std::max_element( &high(0), &high( lookback_length ) ) ;
double min_low = *std::min_element( &low(0), &low( lookback_length ) ) ;
double pivot_point = ( high(0) + low(0) + close(0) ) / 3.0 ;
double s1 = 2.0 * pivot_point - high(0) ;
double r1 = 2.0 * pivot_point - low(0) ;

for ( octave_idx_type ii (0) ; ii &lt; lookback_length ; ii++ ) // initial ii loop
{

// long_mfe_mae
if ( open(0) > min_low ) // the "normal" situation
{
long_mfe_mae(ii) = 1.0 / ( 1.0 + exp( -( ( max_high - open(0) ) / ( open(0) - min_low ) - 1.0 ) ) ) ;
}
else if ( open(0) == min_low )
{
long_mfe_mae(ii) = 1.0 ;
}
else
{
long_mfe_mae(ii) = 0.5 ;
}

// short_mfe_mae
if ( open(0) &lt; max_high ) // the "normal" situation
{
short_mfe_mae(ii) = 1.0 / ( 1.0 + exp( -( ( open(0) - min_low ) / ( max_high - open(0) ) - 1.0 ) ) ) ;
}
else if ( open(0) == max_high )
{
short_mfe_mae(ii) = 1.0 ;
}
else
{
short_mfe_mae(ii) = 0.5 ;
}

range(ii) = 1.0 / ( 1.0 + exp( -( ( max_high - min_low ) / ( r1 - s1 ) - 1.0 ) ) ) ;

} // end of initial ii loop

for ( octave_idx_type ii ( lookback_length ) ; ii &lt; args(0).length() ; ii++ ) // main ii loop
{
// assign variable values
max_high = *std::max_element( &high( ii - lookback_length + 1 ), &high( ii + 1 ) ) ;
min_low = *std::min_element( &low( ii - lookback_length + 1 ), &low( ii + 1 ) ) ;
pivot_point = ( high(ii-lookback_length) + low(ii-lookback_length) + close(ii-lookback_length) ) / 3.0 ;
s1 = 2.0 * pivot_point - high(ii-lookback_length) ;
r1 = 2.0 * pivot_point - low(ii-lookback_length) ;

// long_mfe_mae
if ( open( ii - lookback_length + 1 ) > min_low && open( ii - lookback_length + 1 ) &lt; max_high ) // the "normal" situation
{
long_mfe_mae(ii) = 1.0 / ( 1.0 + exp( -( ( max_high - open( ii - lookback_length + 1 ) ) / ( open( ii - lookback_length + 1 ) - min_low ) - 1.0 ) ) ) ;
}
else if ( open( ii - lookback_length + 1 ) == min_low )
{
long_mfe_mae(ii) = 1.0 ;
}
else
{
long_mfe_mae(ii) = 0.0 ;
}

// short_mfe_mae
if ( open( ii - lookback_length + 1 ) > min_low && open( ii - lookback_length + 1 ) &lt; max_high ) // the "normal" situation
{
short_mfe_mae(ii) = 1.0 / ( 1.0 + exp( -( ( open( ii - lookback_length + 1 ) - min_low ) / ( max_high - open( ii - lookback_length + 1 ) ) - 1.0 ) ) ) ;
}
else if ( open( ii - lookback_length + 1 ) == max_high )
{
short_mfe_mae(ii) = 1.0 ;
}
else
{
short_mfe_mae(ii) = 0.0 ;
}

range(ii) = 1.0 / ( 1.0 + exp( -( ( max_high - min_low ) / ( r1 - s1 ) - 1.0 ) ) ) ;

} // end of main ii loop

retval_list(2) = range ;
retval_list(1) = short_mfe_mae ;
retval_list(0) = long_mfe_mae ;

return retval_list ;

} // end of function

The way to interpret this is as follows:
• if the "long" indicator reading is above 0.5, go long
• if the "short" is above 0.5, go short
• if both are below 0.5, go flat
An alternative, if the indicator reading is flat, is to maintain any previous non flat position. I won't show a chart of the indicator itself as it just looks like a very noisy oscillator, but the equity curve(s) of it, without the benefit of foresight, on the EURUSD forex pair are shown below.
The yellow equity curve is the cumulative, close to close, tick returns of a buy and hold strategy, the blue is the return going flat when indicated, and the red maintaining the previous position when flat is indicated. Not much to write home about. However, this second chart shows the return when one has the benefit of the "peek into the future" as discussed in my earlier post.
The colour of the curves are as before except for the addition of the green equity curve, which is the cumulative, vwap value to vwap value tick returns, a simple representation of what an equity curve with realistic slippage might look like. This second set of equity curves shows the promise of what could be achievable if a neural net to accurately predict future values of the above indicator can be trained. More in an upcoming post.

Tuesday, 23 September 2014

High Resolution Tools for Spectral Analysis - Update

Following on from my initial enthusiasm for the code on the High Resolution Tools for Spectral Analysis page, I have say that I have been unable to get the code performing as I would like it for my intended application to price time series.

My original intent was to use the zero crossing period estimation function, the subject of my last few posts, to get a rough idea of the dominant cycle period and then use the most recent data in a rolling window of this length as input to the high resolution code. This approach, however, ran into problems.

Firstly, windows of just the dominant cycle length (approximately 10 to 30 data points only) would lead to all sorts of errors being thrown from the toolkit functions as well as core Octave functions, such as divide by zero warnings and cryptic error messages that even now I don't understand. My best guess here is that the amount of data available in such short windows is simply insufficient for the algorithm to work, in much the same way as the Fast Fourier transform may fail to work if given too little data that is not a power of 2 in length. It might be possible to substantially rewrite the relevant functions, but my understanding of the algorithm and the inner workings of Octave means this is well beyond my pay grade.

My second approach was to simply extend the amount of data available by using the Octave repmat function on the windowed data so that all the above errors disappeared. This had very hit and miss results - sometimes they were very accurate, other times just fair to middling, and on occasion just way off target. I suspect here that the problem is the introduction of signal artifacts via the use of the repmat function, which results in Aliasing of the underlying signal.

As a result I shall not continue with investigating this toolbox for now. Although I only investigated the use of the me.m function (there are other functions available) I feel that my time at the moment can be used more productively.