Tuesday 20 October 2020

A Temporal Clustering Function

Recently a reader contacted me with a view to collaborating on some work regarding the Delta phenomenon but after a brief exchange of e-mails this seems to have petered out. However, for my part, the work I have done has opened a few new avenues of investigation and this post is about one of them.

One of the problems I set out to solve was clustering in the time domain, or temporal clustering as I call it. Take a time series and record the time of occurance of an event by setting to 1, in an otherwise zero filled 1-dimensional vector the same length as the original time series, the value of the vector at time index tx and repeat for all occurances of the event. In my case the event(s) I am interested in are local highs and lows in the time series. This vector is then "chopped" into segments representing distinct periods of time, e.g. 1 day, 1 week etc. and stacked into a matrix where each row is one complete period and the columns represent the same time in each period, e.g. the first column is the first hour of the trading week, the second column the second hour etc. Sum down the columns to get a final 1-dimensional vector of counts of the timing of events happening within each period over the entire time series data record. A chart of such is shown below.

The coloured vertical lines show the opening and closing times of the London and New York sessions (7am to 5pm in their respective local times) for one complete week at a 10 minute bar time scale, in this case for the GBP_USD forex pair. This is what I want to cluster.

The solution I have come up with is the Octave function in the code box below

## Copyright (C) 2020 dekalog
## This program is free software: you can redistribute it and/or modify it
## under the terms of the GNU General Public License as published by
## the Free Software Foundation, either version 3 of the License, or
## (at your option) any later version.
## This program is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## GNU General Public License for more details.
## You should have received a copy of the GNU General Public License
## along with this program.  If not, see
## .

## -*- texinfo -*- 
## @deftypefn {} {@var{centre_ix} =} blurred_maxshift_1d_linear (@var{train_vec}, @var{bandwidth})
## Clusters the 1 dimensional vector TRAIN_VEC using a "centred" sliding window of length  2 * BANDWIDTH + 1.
## Based on the idea of the Blurred Meanshift Algorithm.
## The "centre ix" value of the odd length sliding window is assigned to the
## maximum value ix of the sliding window. The centre_ix, if it is not the 
## maximum value, is then set to zero. A pass through the whole length of
## TRAIN_VEC is completed before any assignments are made.
## @seealso{}
## @end deftypefn

## Author: dekalog 
## Created: 2020-10-17

function new_train_vec = blurred_maxshift_1d_linear ( train_vec , bandwidth )

if ( nargin < 2 )
 bandwidth = 1 ;

if ( numel( train_vec ) < 2 * bandwidth + 1 )
 error( 'Bandwidth too wide for length of train_vec.' ) ;

length_train_vec = numel( train_vec ) ;
assigned_cluster_centre_ix = zeros( size( train_vec ) ) ;

## initialise the while condition variable
has_converged = 0 ;

while ( has_converged < 1 )
new_train_vec = zeros( size( train_vec ) ) ;

## do the beginning and end of train_vec first
[ ~ , ix ] = max( train_vec( 1 : 2 * bandwidth + 1 ) ) ;
new_train_vec( ix ) = sum( train_vec( 1 : bandwidth ) ) ;

[ ~ , ix ] = max( train_vec( end - 2 * bandwidth : end ) ) ;
new_train_vec( end - 2 * bandwidth - 1 + ix ) = sum( train_vec( end - bandwidth + 1 : end ) ) ;

for ii = 2 * bandwidth + 1 : numel( train_vec ) - bandwidth
 [ ~ , ix ] = max( train_vec( ii - bandwidth : ii + bandwidth ) ) ;
 new_train_vec( ii - bandwidth  - 1 + ix ) += train_vec( ii ) ; 

if ( sum( ( train_vec == new_train_vec ) ) == length_train_vec )
 has_converged = 1 ;
 train_vec = new_train_vec ;



I have named the function "blurred_maxshift_1d_linear" as it is inspired by the "blurred" version of the Mean shift algorithm, operates on a 1-dimensional vector and is "linear" in that there is no periodic wrapping of the data within the function code. The two function inputs are the above type of data, obviously, and an integer parameter "bandwidth" which controls the size of a moving window over the data in which the data is shifted according to a maximum value, hence maxshift rather than meanshift. I won't discuss the code further as it is pretty straightforward.

A chart of a typical clustering solution is (bandwidth setting == 2)

where the black line is the original count data and red the clustering solution. The bandwidth setting in this case is approximately equivalent to clustering with a 50 minute moving window. 

The following heatmap chart is a stacked version of the above where the bandwidth parameter is varied from 1 to 10 inclusive upwards, with the original data being at the lowest level per pane.

The intensity reflects the counts at each time tx index per bandwidth setting. The difference between the panes is that in the upper pane the raw data is the function input per bandwidth setting, whilst the lower pane shows hierarchical clustering whereby the output of the function is used as the input to the next function call with the next higher bandwidth parameter setting.

More in due course.