## Friday, 10 July 2020

### Time Warp Edit Distance as a Loss Function

Some time ago I posted about the time warp edit distance (twed), and recently I've revisited it as a possible loss function. The basic idea is to use it during algorithm training to match a "perfect equity curve", rather than the usual loss measures such as mean squared error for regression or cross entropy for classification.
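The `twed` function itself isn't listed in this post. For reference, here is a minimal Octave sketch of the time warp edit distance as described by Marteau (2009), using the absolute difference as the cost between samples. `twed_sketch` is a hypothetical name, and its argument order (stiffness `nu`, then deletion penalty `lambda`) is an assumption that may not match the `twed` called in the script.

```octave
1 ; ## script marker so the function below can be defined in this file

## Hypothetical sketch of the time warp edit distance (Marteau, 2009)
## with an L1 cost. The (nu, lambda) argument order is an assumption.
function d = twed_sketch( a , ta , b , tb , nu , lambda )
  ## prepend a zero sample at time zero to both series
  a = [ 0 ; a(:) ] ; ta = [ 0 ; ta(:) ] ;
  b = [ 0 ; b(:) ] ; tb = [ 0 ; tb(:) ] ;
  n = numel( a ) ; m = numel( b ) ;
  D = inf( n , m ) ; D( 1 , 1 ) = 0 ;
  for i = 2 : n
    for j = 2 : m
      ## delete a(i): size of the step in a, a time penalty and lambda
      del_a = D( i-1 , j ) + abs( a(i) - a(i-1) ) + nu * ( ta(i) - ta(i-1) ) + lambda ;
      ## delete b(j): the same operation applied to b
      del_b = D( i , j-1 ) + abs( b(j) - b(j-1) ) + nu * ( tb(j) - tb(j-1) ) + lambda ;
      ## match a(i) with b(j): value mismatch plus time stamp mismatch
      c_match = D( i-1 , j-1 ) + abs( a(i) - b(j) ) + abs( a(i-1) - b(j-1) ) ...
                + nu * ( abs( ta(i) - tb(j) ) + abs( ta(i-1) - tb(j-1) ) ) ;
      D( i , j ) = min( [ del_a , del_b , c_match ] ) ;
    endfor
  endfor
  d = D( n , m ) ;
endfunction

## usage: identical equity curves have a distance of zero
t = ( 1 : 5 )' ;
curve = cumsum( [ 0 ; 1 ; 1 ; -1 ; 1 ] ) ;
twed_sketch( curve , t , curve , t , 1 , 0.001 )
```

The stiffness `nu` penalises matching samples that are far apart in time, and `lambda` is a fixed cost for each deletion, so a test equity curve that lags the perfect one is penalised rather than simply re-aligned as it would be with dynamic time warping.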

As a proof of concept I've been playing around with this Octave script:
```octave
clear all ;
1 ; ## script marker: allows functions to be defined in this file

## loss for bayesoptcont: twed between the perfect equity curve and the
## equity curve of an MA crossover system with lengths round( x )
function J = twed_loss( x )
global price ;
global returns ;
global perfect_equity_curve ;
fast_ma = sma( price , round( x( 1 ) ) ) ;
slow_ma = sma( price , round( x( 2 ) ) ) ;
test_position_vector = sign( fast_ma - slow_ma ) ;
## lag the positions by two bars (circular shift), then zero the first
## three bars to remove the wrapped-around values
test_position_vector = shift( test_position_vector , 2 ) ;
test_position_vector( 1 : 3 ) = 0 ;
test_equity_curve = cumsum( returns .* test_position_vector ) ;
x_axis = ( 1 : numel( perfect_equity_curve ) )' ;
J = twed( perfect_equity_curve , x_axis , test_equity_curve , x_axis , 1 , 0.001 ) ;
endfunction

## create a perfectly predictable price
global price = sinewave( 200 , 20 )' + 5 ;
global returns = [ 0 ; diff( price ) ] ;
perfect_position_vector = sign( returns ) ;
perfect_position_vector = shift( perfect_position_vector , 2 ) ;
perfect_position_vector( 1 : 3 ) = 0 ;
global perfect_equity_curve = cumsum( returns .* perfect_position_vector ) ;

## now do the Bayesian training
## set up the parameters for bayesopt
params.n_iterations = 300 ; ## [Default 190]
params.n_init_samples = 10 ; ## [Default 10]
params.n_iter_relearn = 1 ; ## Number of iterations between re-learning kernel parameters, i.e. kernel learning occurs once every n_iter_relearn iterations.
## Ideally, the best precision is obtained when the kernel parameters are learned every iteration (n_iter_relearn=1).
## However, this learning part is computationally expensive and implies a higher cost per iteration. If n_iter_relearn=0, then there is no relearning. [Default 50]
params.crit_name = 'cEI' ;
params.surr_name = 'sStudentTProcessNIG' ;
params.noise = 1e-6 ;
params.kernel_name = 'kMaternARD5' ;
params.kernel_hp_mean = [ 1 ] ;
params.kernel_hp_std = [ 10 ] ;
params.verbose_level = 0 ; ## Negative -> Error -> stdout, 0 -> Warning -> stdout, 1 -> Info -> stdout, 2 -> Debug -> stdout,
## 3 -> Warning -> log file, 4 -> Info -> log file, 5 -> Debug -> log file, >5 -> Error -> log file
params.load_save_flag = 0 ; ## 1-Load data, 2-Save data, 3-Load and append data. Other values, no file saving or restore [Default 0]
params.log_filename = '/home/dekalog/Documents/octave/twed/bayeopt.log' ; ## Name/path of the log file
## (if applicable, verbose_level>=3) [Default "bayesopt.log"]
params.save_filename = '/home/dekalog/Documents/octave/twed/bayeopt.log' ; ## only used if load_save_flag is 1, 2 or 3

## lower bounds
lb = [ 2 3 ] ;
## upper bounds
ub = [ 30 30 ] ;
nDimensions = length( lb ) ;
[ xmin , fmin ] = bayesoptcont( 'twed_loss' , nDimensions , params , lb , ub ) ;
round( xmin )
fmin

## plot the price with the optimised moving averages
fast_ma = sma( price , round( xmin(1) ) ) ;
slow_ma = sma( price , round( xmin(2) ) ) ;
figure(1) ; plot(price,'k','linewidth',2,fast_ma,'r','linewidth',2,slow_ma,'b','linewidth',2 ) ;
title( 'PRICE AND MA CROSSOVER SIGNALS' ) ; legend( 'PRICE' , 'FAST MA' , 'SLOW MA' ) ;

## rebuild the optimised equity curve and plot it against the perfect one
test_position_vector = sign( fast_ma - slow_ma ) ;
test_position_vector = shift( test_position_vector , 2 ) ;
test_position_vector( 1 : 3 ) = 0 ;
test_equity_curve = cumsum( returns .* test_position_vector ) ;
figure(2) ; plot(perfect_equity_curve,'b','linewidth',2,test_equity_curve,'r','linewidth',2) ;
title( 'EQUITY CURVES' ) ; legend( 'PERFECT EQUITY CURVE' , 'TEST EQUITY CURVE' ) ;
```
which produces the "PRICE AND MA CROSSOVER SIGNALS" and "EQUITY CURVES" plots. Both show a fast and slow moving average crossover system on a sine wave "price" of period 20, optimised via the twed loss to match the perfect equity curve.

What is interesting about this is that the moving average lengths usually converge to more or less the expected theoretical optimum, where the crossovers (the changes of sign in the signal) mark the peaks and troughs in price and hence give perfect entry and exit signals.
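The script relies on `sma`, which isn't shown in this post, and on `sinewave` from the Octave signal package. As a hedged sketch using only core Octave, and assuming that `sma` is a plain trailing simple moving average, the crossover equity curve can be reproduced as follows. `sma_sketch` and `crossover_equity` are hypothetical names, and the warm-up values of `sma_sketch` may differ from those of the real `sma`.

```octave
1 ; ## script marker so the functions below can be defined in this file

## Hypothetical stand-in for sma(): a trailing simple moving average via
## filter(). The first n-1 values include implicit zeros before the data.
function ma = sma_sketch( x , n )
  ma = filter( ones( n , 1 ) / n , 1 , x(:) ) ;
endfunction

## Crossover equity curve, following the same steps as twed_loss():
## sign of (fast - slow), lagged two bars, first three bars flat, then
## the cumulative sum of position-weighted returns.
function equity = crossover_equity( price , fast_n , slow_n )
  price = price(:) ;
  returns = [ 0 ; diff( price ) ] ;
  position = sign( sma_sketch( price , fast_n ) - sma_sketch( price , slow_n ) ) ;
  position = circshift( position , 2 ) ; ## same circular lag as shift( x , 2 )
  position( 1 : 3 ) = 0 ;                ## remove the wrapped-around values
  equity = cumsum( returns .* position ) ;
endfunction

## usage on a period-20 sine wave price built with core sin()
price = sin( 2 * pi * ( 0 : 199 )' / 20 ) + 5 ;
equity = crossover_equity( price , 5 , 10 ) ;
```

Because `sign( fast - slow )` is exactly the negative of `sign( slow - fast )`, swapping the two lengths reverses every position and negates the whole equity curve, so a "fast" average that is longer than the "slow" one turns the system into its contrarian mirror image.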

However, sometimes the solution is an 11 period "fast" moving average with a 10 period "slow" one, quite a contrarian solution compared with the theoretical optimum, but one that actually gives a lower twed loss.
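Since `twed_loss` rounds both lengths, the bounds `lb = [ 2 3 ]` and `ub = [ 30 30 ]` allow only 29 × 28 = 812 distinct integer pairs, so an exhaustive search is cheap enough to check whether a BayesOpt result, such as the 11/10 pair, really is the global minimum of the twed loss. Here is a minimal sketch; `grid_search_2d` is a hypothetical helper, and a toy quadratic loss stands in for `twed_loss` so that the block runs on its own.

```octave
1 ; ## script marker so the function below can be defined in this file

## Hypothetical helper: evaluate loss_fn at every integer pair in the box
## lb <= x <= ub and return the pair with the smallest loss.
function [ best_x , best_J ] = grid_search_2d( loss_fn , lb , ub )
  best_x = [] ;
  best_J = inf ;
  for fast_n = lb( 1 ) : ub( 1 )
    for slow_n = lb( 2 ) : ub( 2 )
      J = loss_fn( [ fast_n , slow_n ] ) ;
      if ( J < best_J ) ## strict inequality keeps the first minimiser found
        best_J = J ;
        best_x = [ fast_n , slow_n ] ;
      endif
    endfor
  endfor
endfunction

## usage with a toy loss whose minimum is at fast = 7, slow = 12
toy_loss = @( x ) ( x( 1 ) - 7 ) ^ 2 + ( x( 2 ) - 12 ) ^ 2 ;
[ best_x , best_J ] = grid_search_2d( toy_loss , [ 2 3 ] , [ 30 30 ] )
```

With the original script loaded, `grid_search_2d( @twed_loss , lb , ub )` would evaluate the twed loss for every pair, at the cost of 812 calls to `twed`.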

I quite like the idea of optimising for what we actually care about, i.e. the equity curve, whilst at the same time possibly uncovering unique solutions. It seems that the twed loss shows promise.

More in due course.