Tuesday, 3 January 2012

Creating Synthetic Data Using the Fast Fourier Transform

This question was recently asked on the Quantitative Finance forum and one answerer's link to here piqued my interest. My comments to said answer were
"Interesting link! I theorise that one way it could be done is to apply a Fast Fourier Transform (e.g. FFT function in Octave/MATLAB) to the original series, then divide the resulting vector into segments and within each separate segment randomly permute the values, and finally reconstruct the time series by applying the Inverse Fast Fourier Transform (IFFT function). By randomly permuting the values you will not alter the amplitudes of the composite sine waves but will slightly alter their periodicity - this will naturally tend to restrict the range of the new series to that of the original."
"This is such an intriguing approach that I may "knock up" some Octave code on my blog and post a link to it from here."
This blog post is my attempt to "knock up" the Octave code.

The "guts" of the code are a .oct function that randomly permutes the FFT output vector that is given to it as its only input argument and the C++ code for this .oct function is given below.
Note: due to formatting issues the headers are not correctly displayed; the octave headers should be enclosed between a "<" and ">" after the #include.
#include octave/oct.h
#include octave/dColVector.h
#include octave/CNDArray.h
#include "MersenneTwister.h"
DEFUN_DLD (permute_vector, args, , "Input is a vector that is to be permuted once")
octave_value_list retval;

int nargin = args.length () ;
int vec_length = args(0).length () ;

// check the input arguments
    if ( nargin > 1 )
        error ("Invalid arguments. Input is a vector that is to be permuted once") ;
        return retval ;
    if (vec_length < 2 )
        error ("Invalid arguments. Input is a vector that is to be permuted once") ;
        return retval ;
    if (error_state)
        error ("Invalid arguments. Input is a vector that is to be permuted once") ;
        return retval ;
// end of input checking

 ComplexNDArray input_vector = args(0).complex_array_value () ;
 ComplexNDArray output_vector = args(0).complex_array_value () ;
 int k1;
 int k2;
 MTRand mtrand1; // Declare the Mersenne Twister Class - will seed from system time

       k1 = vec_length - 1; // initialise prior to shuffling the vector

       while (k1 > 0) // While at least 2 left to shuffle
             k2 = mtrand1.randInt( k1 ); // Pick an int from 0 through k1 

               if (k2 > k1) // check if random vector index no. k2 is > than max vector index - should never happen 
                  k2 = k1 - 1; // But this is cheap insurance against disaster if it does happen

             output_vector(k1) = input_vector(k2) ; // allocate random pick k2 from input_vector to the k1 vector index of output_vector
             input_vector(k2) = input_vector(k1) ; // replace random pick k2 content of input_vector with content k1 of input_vector
             k1 = k1 - 1; // count down 
             } // Shuffling is complete when this while loop exits

 retval(0) = output_vector ; 
return retval; // Return the output to Octave
The Octave script that calls the above function is
clear all

contract = input( "Enter contract symbol e.g. sp: ","s") ;
data = load("-ascii",contract) ;
n = rows(data)
index_begin = input( "Enter index_begin: ") ;
index_end = input( "Enter index_end, value not greater than n: ") ;

close = data(index_begin:index_end,7) ;

% detrend the close vector prior to applying the fft
slope = ( close(end) - close(1) ) / ( length(close) - 1 ) ;
v1 = (0:1:length(close)-1)' ;
detrended_close = close .- ( v1 .* slope ) ;
close_index_begin = close(1)
detrended_close_index_begin = detrended_close(1)
detrended_close_index_end = detrended_close(end)

% create zero padded vector for fft
L2 = 2^nextpow2( length(close) ) ; half_L2 = L2/2 ;
y2 = zeros( 1,L2 ) ; y2( 1:length(close) ) = detrended_close ;

% apply the fft
transform = fft( y2 ) ;

% permute the first half of the transform vector in "chunks" of 10
max_ii_value = floor( half_L2 / 10 ) ;
for ii = 1:max_ii_value
transform( (ii*10):(ii*10)+9 ) = permute_vector( transform( (ii*10):(ii*10)+9 ) ) ;

% take the inverse fft
ifft_vec = real( ifft( transform ) ) ;

% retrend the ifft_permuted_vec
retrended_ifft_vec = ifft_vec( 1:length(close) )' .+ ( v1 .* slope ) ;

% statistics
correl = corrcoef (close, retrended_ifft_vec)
spear = spearman (close, retrended_ifft_vec)
tau = kendall (close, retrended_ifft_vec)

plot( close,'b',retrended_ifft_vec,'r' ) ; legend( 'close','permute' ) ; 
This script could undoubtedly be made more efficient by putting the for loop on lines 28 to 30 within the .oct function itself, but this is just a proof of concept. The script allows one to extract any contiguous section of data by use of the index_begin and index_end inputs or, of course, to select the whole data range. Some simple statistics are calculated using in-built Octave functions and can be seen in the terminal windows in the screen shots. Finally the script plots the original input data along with the FFT permuted output data. Typical screen shots are shown below, with the original data in blue and the FFT permuted data in red.

All S&P data

The last few years of the above

The same last few years indexed using the index_begin and index_end method to apply the function just to these years and not the whole data set
The close_index_begin, detrended_close_index_begin and detrended_close_index_end values also seen in the terminal windows are a simple sanity check to see that the original data has been detrended correctly prior to performing the FFT.

1 comment:

Anonymous said...

Hi, Dekalog:

I came to this blog via quant.stackexchange.com. I do not think the frequency randomization as described here accomplishes what you desire. Indeed, the expected covariance of the original time series and the frequency permuted time series is zero.