Sunday 22 September 2013

Savitzky-Golay Filters in Action

As a first step in investigating SG filters I have simply plotted them on samples of my idealised time series data, three samples shown below:
where the underlying "price" is cyan, and the 3rd, 4th, 5th, 6th and 7th order SG fits are blue, red, magenta, green and yellow respectively. As can be seen, 5th order and above perfectly match the underlying price to the extent that the cyan line is completely obscured.

Repeating the above on real prices (ES-mini close) is shown in the video below:
Firstly, the thing that is immediately apparent is that real prices are more volatile than my idealised price series, a statement of the obvious perhaps, but it has led me to reconsider how I model my idealised price series. However, there are times when the patterns of the SG filters on real prices converge to almost identically resemble the patterns of SG filters on my idealised prices. This observation encourages me to continue to look at applying SG filters for my purposes. How I plan to use SG filters and modify my idealised price series will be the subject of my next post.

Sunday 15 September 2013

Savitzky-Golay Filter Convolution Coefficients

My investigation so far has had me reading General Least Squares Smoothing And Differentiation By The Convolution Method, a paper in the reference section of the Wikipedia page on Savitzky-Golay filters. In this paper there is Pascal code for calculating the convolution coefficients for a Savitzky-Golay filter, and in the code box below is my C++ translation of this Pascal code. Nb, due to formatting issues the symbol "<" appears as ampersand lt semi-colon in this code.
// Calculate Savitzky-Golay Convolution Coefficients
#include 
using namespace std;

double GenFact (int a, int b)
{ // calculates the generalised factorial (a)(a-1)...(a-b+1)

  double gf = 1.0 ;
  
  for ( int jj = (a-b+1) ; jj &lt; a+1 ; jj ++ ) 
  {
    gf = gf * jj ; 
  }  
    return (gf) ;
    
} // end of GenFact function

double GramPoly( int i , int m , int k , int s )
{ // Calculates the Gram Polynomial ( s = 0 ), or its s'th
  // derivative evaluated at i, order k, over 2m + 1 points
  
double gp_val ;  

if ( k > 0 )
{   
gp_val = (4.0*k-2.0)/(k*(2.0*m-k+1.0))*(i*GramPoly(i,m,k-1,s) + s*GramPoly(i,m,k-1.0,s-1.0)) - ((k-1.0)*(2.0*m+k))/(k*(2.0*m-k+1.0))*GramPoly(i,m,k-2.0,s) ;
}
else
{  
  if ( ( k == 0 ) && ( s == 0 ) )
  {
     gp_val = 1.0 ;
  }     
  else
  {
     gp_val = 0.0 ;
  } // end of if k = 0 & s = 0  
} // end of if k > 0

return ( gp_val ) ;

} // end of GramPoly function

double Weight( int i , int t , int m , int n , int s )
{ // calculates the weight of the i'th data point for the t'th Least-square
  // point of the s'th derivative, over 2m + 1 points, order n

double sum = 0.0 ;

for ( int k = 0 ; k &lt; n + 1 ; k++ )
{
sum += (2.0*k+1.0) * ( GenFact(2.0*m,k) / GenFact(2.0*m+k+1.0,k+1.0) ) * GramPoly(i,m,k,0) * GramPoly(t,m,k,s) ;    
} // end of for loop

return ( sum ) ;
                                               
} // end of Weight function

int main ()
{ // calculates the weight of the i'th data point (1st variable) for the t'th   Least-square point (2nd variable) of the s'th derivative (5th variable), over 2m + 1 points (3rd variable), order n (4th variable)
  
  double z ;
  
  z = Weight (4,4,4,2,0) ; // adjust input variables for required output
  
  cout &lt;&lt; "The result is " &lt;&lt; z ;
  
  return 0 ;
}
Whilst I don't claim to understand all the mathematics behind this, the use of the output of this code is conceptually not that much more difficult than a FIR filter, i.e. multiply each value in a look back window by some coefficient, except that a polynomial of a given order is being least squares fit to all points in the window rather than, for example, an average of all the points being calculated. The code gives the user the ability to calculate the value of the polynomial at each point in the window, or its derivative(s) at that point.

I would particularly draw readers' attention to the derivative(s) link above. On this page there is an animation of the slope (1st derivative) of a waveform. If this were to be applied to a price time series the possible applications become apparent: a slope of zero will pick out local highs and lows to identify local support and resistance levels; an oscillator zero crossing will give signals to go long or short; oscillator peaks will give signals to tighten stops or place profit taking orders. And why stop there? Why not use the second derivative as a measure of the acceleration of a price move? And how about the jerk and the jounce of a price series? I don't know about readers, but I have never seen a trading article, forum post or blog talk about the jerk and jounce of price series. Of course, if one were to plot all these as indicators I'm sure it would just be confusing and nigh impossible to come up with a cogent rule set to trade with. However, one doesn't have to do this oneself, and this is what I was alluding to in my earlier post when I said that the Savitzky-Golay framework might provide unique and informative inputs to my neural net trading system. Providing enough informative data is available for training, the rule set will be encapsulated within the weights of any trained neural net.

Tuesday 10 September 2013

Savitzky-Golay Filters

Recently I was contacted by a reader who asked me if I had ever looked into Savitzky-Golay filters and my reply was that I had and that, in my ignorance, I hadn't found them useful. However, thinking about it, I think that I should look into them again. The first time I looked was many years ago when I was an Octave coding novice and also my general knowledge about filters was somewhat limited.

As prelude to this new investigation I have below plotted two charts of real ES Mini closes in cyan, along with a Savitzky-Golay filter in red and the "Super Smoother" in green for comparative purposes.

The Savitzky-Golay filter is easily available in Octave by a simple call such as

filter = sgolayfilt( close , 3 , 11 ) ;

which is the actual call that the above are plots of, which is an 11 bar cubic polynomial fit. As can be seen, the Savitzky-Golay filter is generally as smooth as the super smoother but additionally seems to have less lag and doesn't overshoot at turning points like the super smoother. Given a choice between the two, I'd prefer to use the Savitzsky-Golay filter.

However, there is a serious caveat to the use of the S-G filter - it's a centred filter, which means that its calculation requires "future" values. This feature is well demonstrated by the animated figure at the top right of the above linked Wikipedia page. Although I can't accurately recall the reason(s) I abandoned the S-G filter all those years ago, I'm pretty sure this was an important factor. The Wikipedia page suggests how this might be overcome in its "Treatment of first and last points" section. Another way to approach this could be to adapt the methodology I employed in creating my perfect oscillator indicator, and this is in fact what I intend to do over the coming days and weeks. In addition to this I shall endeavour to make the length of the filter adaptive rather than having a fixed bar length polynomial calculation. 

And the reason for setting myself this task? Apart from perhaps creating an improved smoother following my recent disappointment with frequency domain smoothing, there is the possibility of creating useful inputs for my Neural Net trading algorithm: the framework of S-G filters allows for first, second, third and fourth derivatives to be easily calculated and I strongly suspect that, if my investigations and coding turn out to be successful, these will be unique and informative trading inputs.

Monday 9 September 2013

Update on FFT Smoother

Having tested the FFT smoother on real price data and having been disappointed, I have decided not to pursue this any further for the moment. I think the problem is that for such short input periods ( 10 to 50 bars ) the FFT does not have enough frequency resolution and that by eliminating most of the frequencies from this short period I'm throwing out the baby with the bath water.

Wednesday 4 September 2013

FFT Smoother Function Test

Following on from my last post I now want to show the FFT smoother function results on my usual idealised market data. The Octave script code for this test is given in the code box below.
clear all

% create the "price" vector
period = input( 'Enter period between 10 & 50 for underlying sine: ' ) ;
underlying_sine = sinewave( 500 , period ) ; % create underlying sinewave function with amplitude 2
trend_mult = input( 'Enter value for trend (a number between -4 & 4 incl.) to 4 decimal places: ' ) ;
trend = ( trend_mult / period ) .* (0:1:499) ;
noise_mult = input( 'Enter value for noise multiplier (a number between 0 & 1 incl.) to 2 decimal places: ' ) ; 
noise = noise_mult * randn(500,1)' ;
smooth_price = underlying_sine .+ trend ;
price = underlying_sine .+ trend .+ noise ;
super_smooth = super_smoother( price ) ;

smooth = price ;
mean_value = price ;
mean_smooth = price ;

for ii = period+1 : length( price )

mean_value(ii) = mean( price( ii-period : ii ) ) ;
fft_input = price( ii-(period-1) : ii ) .- mean_value(ii) ;
N = length( fft_input ) ;

data_in_freq_domain = fft( fft_input ) ;

if mod( N , 2 ) == 0 % N even
% The DC component is unique and should not be altered
% adjust value of element N / 2 to compensate for it being the sum of frequencies
data_in_freq_domain( N / 2 ) = abs( data_in_freq_domain( N / 2 ) ) / 2 ;

[ max1 , ix1 ] = max( abs( data_in_freq_domain( 2 : N / 2 ) ) ) ; % get the index of the first max value
ix1 = ix1 + 1 ;

[ max2 , ix2 ] = max( abs( data_in_freq_domain( N / 2 + 1 : end ) ) ) ; % get index of the second max value
ix2 = N / 2 + ix2 ;

non_zero_values_ix = [ 1 ix1 ix2 ] ; % indices for DC component, +ve freq max & -ve freq max

else % N odd
% The DC component is unique and should not be altered

[ max1 , ix1 ] = max( abs( data_in_freq_domain( 2 : ( N + 1 ) / 2 ) ) ) ; % get the index of the first max value
ix1 = ix1 + 1 ;

[ max2 , ix2 ] = max( abs( data_in_freq_domain( ( N + 1 ) / 2 + 1 : end ) ) ) ; % get index of the second max value
ix2 = ( N + 1 ) / 2 + ix2 ;

non_zero_values_ix = [ 1 ix1 ix2 ] ; % indices for DC component, +ve freq max & -ve freq max

end

% extract the constant term and frequencies of interest
non_zero_values = data_in_freq_domain( non_zero_values_ix ) ;

% now set all values to zero
data_in_freq_domain( 1 : end ) = 0.0 ;

% replace the non_zero_values
data_in_freq_domain( non_zero_values_ix ) = non_zero_values ;

inv = ifft( data_in_freq_domain ) ; % reverse the FFT

recovered_sig = real( inv ) ; % recover the real component of ifft for plotting

smooth( ii ) = recovered_sig( end ) + mean_value( ii ) ;

mean_smooth( ii ) = mean( smooth( ii-period : ii ) ) ;

end

smooth = smooth .+ ( mean_value .- mean_smooth ) ;

plot( smooth_price , 'r' ,price , 'c' , smooth , 'm' , super_smooth , 'g' )
legend( 'smooth price' , 'price' , 'fft smooth' , 'super smooth' )
This script prompts for terminal input for the values for an underlying sine wave period, a trend multiplier and a noise multiplier component to construct the "price" time series. Here is a plot of output with the settings
  • period = 20
  • trend multiplier = 1.3333
  • noise multiplier = 0.0
The cyan line is the "price," the magenta is the FFT smoother and the green is the "Super Smoother" filter shown for comparative purposes. As can be seen, after the "burn in" period for the FFT smoother calculations to settle down, the smoother is a perfect recovery of the price time series. However, this is without noise. With a noise multiplier value of 0.25 a typical plot is
where the underlying smooth "true" signal can now be seen in red. I think the performance of the magenta FFT smoother in recovering this "true" signal speaks for itself. My next post will show its performance on real price data.

Monday 2 September 2013

Filtering in the Frequency Domain with Octave's FFT Function

Despite my previous post in which I talked about the "Super Smoother" filter, it was bugging me that I couldn't get my attempts at FFT filtering to work. However, thanks in large part to my recent discovery of two online tutorials here and here I think I've finally cracked it. Taking these two tutorials as a template I have written an interactive Octave script, which is shown in the code box below.
clear all

fprintf( '\nThis is an interactive Octave tutorial to explain the basic principles\n' )
fprintf( 'of the Fast Fourier transform and how it can be used to smooth a time series\n' )
fprintf( 'in the frequency domain.\n\n' )
fprintf( 'A Fourier transform approximates any continuous function as the sum of periodic\n' )
fprintf( 'functions (sines and cosines). The FFT does the same thing for discrete signals,\n' )
fprintf( 'a series of data points rather than a continuously defined function.\n' ) 
fprintf( 'The FFT lets you identify periodic components in your discrete signal.\n' )
fprintf( 'You might need to identify a periodic signal buried under random noise,\n' )
fprintf( 'or analyze a signal with several different periodic underlying sources.\n' )
fprintf( 'Octave includes a built-in implementation of FFT to help you do this.\n\n' )

fprintf( 'First, take this 20 period sine wave. Since we know the period, the\n' )
fprintf( 'frequency = 1 / period = 1 / 20 = 0.05 Hz.\n\n' )

% set the period
period = 20 ; 
underlying_sine = 2.0 .* sinewave( period , period ) ; % create underlying sinewave function with amplitude 2

plot( underlying_sine , 'b' )
title( 'A Sine Wave of period 20 with an amplitude of 2' ) 
fprintf( 'After viewing chart, press enter to continue.\n\n' )
pause ;

fprintf( 'Now, let us put this sine wave through the Octave fft function thus:\n\n' )
fprintf( ' data_in_freq_domain = fft( data ) ;  and plot this.\n\n' )

N = length( underlying_sine ) ;

% If length of fft vector is even, then the magnitude of the fft will be symmetric, such that the 
% first ( 1 + length / 2 ) points are unique, and the rest are symmetrically redundant. The DC component 
% of output is fft( 1 ) and fft( 1 + length / 2 ) is the Nyquist frequency component of vector. 
% If length is odd, however, the Nyquist frequency component is not evaluated, and the number of unique 
% points is ( length + 1 ) / 2 . This can be generalized for both cases to ceil( ( length + 1 ) / 2 ) 

data_in_freq_domain = fft( underlying_sine ) ; % take the fft of our underlying_sine
stem( abs( data_in_freq_domain ) ) ; % use abs command to get the magnitude.
xlabel( 'Sample Number, i.e. position in the plotted Octave fft output vector' )
ylabel( 'Amplitude' )
title( 'Using the basic Octave fft command' )
grid
axis( [ 0 , period + 1 , 0 , max(abs( data_in_freq_domain )) + 0.1 ] )

fprintf( 'It can be seen that the x-axis gives us no information about the frequencies\n' )
fprintf( 'and the spectrum is not centered around zero. Changing the x-axis to frequency\n' )
fprintf( 'and plotting a centered frequency plot (see code in file) gives this plot.\n\n' )

fprintf( 'Press enter to view centered frequency plot.\n\n' )
pause ;

% Fs is the sampling rate

% this part of the code generates the frequency axis
if mod( N , 2 ) == 0
    k = -N/2 : N/2-1 ; % N even
else
    k = -(N-1)/2 : (N-1)/2 ; % N odd
end

T = N / 1 ; %  where the frequency sampling rate Fs in T = N / Fs ; Fs is 1.

frequencyRange = k / T ; % the frequency axis

data_in_freq_domain = fft( underlying_sine ) / N ; % apply the FFT & normalize the data
centered_data_in_freq_domain = fftshift( data_in_freq_domain ) ; % shifts the fft data so that it is centered

% remember to take the abs of data_in_freq_domain to get the magnitude!
stem( frequencyRange , abs( centered_data_in_freq_domain ) ) ;
xlabel( 'Freq (Hz)' )
ylabel( 'Amplitude' )
title( 'Using the centered FFT function with Amplitude normalisation' )
grid
axis( [ min(frequencyRange) - 0.1 , max(frequencyRange) + 0.1 , 0 , max(abs( centered_data_in_freq_domain )) + 0.1 ] )

fprintf( 'In this centered frequency chart it can be seen that the frequency\n' )
fprintf( 'with the greatest amplitude is 0.05 Hz, i.e. period = 1 / 0.05 = 20\n' )
fprintf( 'which we know to be the correct period of the underlying sine wave.\n' )
fprintf( 'The information within the centered frequency spectrum is entirely symmetric\n' )
fprintf( 'around zero and, in general, the positive side of the spectrum is used.\n\n' )
fprintf( 'Now, let us look at two sine waves together.\n' )

fprintf( 'Press enter to continue.\n\n' )
pause ;

% underlying_sine = 2.0 .* sinewave( period , period ) ; % create underlying sinewave function with amplitude 2
underlying_sine_2 = 0.75 .* sinewave( period , 7 ) ; % create underlying sinewave function with amplitude 0.75
composite_sine = underlying_sine .+ underlying_sine_2 ;

plot( underlying_sine , 'r' , underlying_sine_2 , 'g' , composite_sine , 'b' )
legend( 'Underlying sine period 20' , 'Underlying sine period 7' , 'Combined waveform' )
title( 'Two Sine Waves of periods 20 and 7, with Amplitudes of 2 and 0.75 respectively, combined' ) 
fprintf( 'After viewing chart, press enter to continue.\n\n' )
pause ;

data_in_freq_domain = fft( composite_sine ) ; % apply the FFT
centered_data_in_freq_domain = fftshift( data_in_freq_domain / N ) ; % normalises & shifts the fft data so that it is centered

% remember to take the abs of data_in_freq_domain to get the magnitude!
stem( frequencyRange , abs( centered_data_in_freq_domain ) ) ;
xlabel( 'Freq (Hz)' )
ylabel( 'Amplitude' )
title( 'Using the centered FFT function with Amplitude normalisation on 2 Sine Waves' )
grid
axis( [ min(frequencyRange) - 0.1 , max(frequencyRange) + 0.1 , 0 , max(abs( centered_data_in_freq_domain )) + 0.1 ] )

fprintf( 'x-axis is frequencyRange.\n' )
frequencyRange
fprintf( '\nComparing the centered frequency chart with the frequencyRange vector now shown it\n' )
fprintf( 'can be seen that, looking at the positive side only, the maximum amplitude\n' )
fprintf( 'frequency is for period = 1 / 0.05 = a 20 period sine wave, and the next greatest\n' )
fprintf( 'amplitude frequency is period = 1 / 0.15 = a 6.6667 period sine wave, which rounds up\n' )
fprintf( 'to a period of 7, both of which are known to be the "correct" periods in the\n' )
fprintf( 'combined waveform. However, we can also see some artifacts in this frequency plot\n' )
fprintf( 'due to "spectral leakage." Now, let us do some filtering and see the plot.\n\n' )
fprintf( 'Press enter to continue.\n\n' )
pause ;

% The actual frequency filtering will be performed on the basic FFT output, remembering that: 

% if the number of time domain samples, N, is even:

% element 1 = constant or DC amplitude
% elements 2 to N/2 = amplitudes of positive frequency components in increasing order
% elements N to N/2 = amplitudes of negative frequency components in decreasing order

% Note that element N/2 is the algebraic sum of the highest positive and highest
% negative frequencies and, for real time domain signals, the imaginary
% components cancel and N/2 is always real. The first sinusoidal component
% (fundamental) has it's positive component in element 2 and its negative
% component in element N.

% If the number of time domain samples, N, is odd:

% element 1 = constant or DC amplitude
% elements 2 to (N+1)/2 = amplitudes of positive frequency components in increasing order
% elements N to ((N+1)/2 + 1) = amplitudes of negative frequency components in decreasing order

% Note that for an odd number of samples there is no "middle" element and all
% the frequency domain amplitudes will, in general, be complex. 

if mod( N , 2 ) == 0 % N even
% The DC component is unique and should not be altered
% adjust value of element N / 2 to compensate for it being the sum of frequencies
data_in_freq_domain( N / 2 ) = abs( data_in_freq_domain( N / 2 ) ) / 2 ;

[ max1 , ix1 ] = max( abs( data_in_freq_domain( 2 : N / 2 ) ) ) ; % get the index of the first max value
ix1 = ix1 + 1 ;

[ max2 , ix2 ] = max( abs( data_in_freq_domain( N / 2 + 1 : end ) ) ) ; % get index of the second max value
ix2 = N / 2 + ix2 ;

non_zero_values_ix = [ 1 ix1 ix2 ] ; % indices for DC component, +ve freq max & -ve freq max

else % N odd
% The DC component is unique and should not be altered

[ max1 , ix1 ] = max( abs( data_in_freq_domain( 2 : ( N + 1 ) / 2 ) ) ) ; % get the index of the first max value
ix1 = ix1 + 1 ;

[ max2 , ix2 ] = max( abs( data_in_freq_domain( ( N + 1 ) / 2 + 1 : end ) ) ) ; % get index of the second max value
ix2 = ( N + 1 ) / 2 + ix2 ;

non_zero_values_ix = [ 1 ix1 ix2 ] ; % indices for DC component, +ve freq max & -ve freq max

end

% extract the constant term and frequencies of interest
non_zero_values = data_in_freq_domain( non_zero_values_ix ) ;

% now set all values to zero
data_in_freq_domain( 1 : end ) = 0.0 ;

% replace the non_zero_values
data_in_freq_domain( non_zero_values_ix ) = non_zero_values ;

composite_sine_inv = ifft( data_in_freq_domain ) ; % reverse the FFT

recovered_underlying_sine = real( composite_sine_inv ) ; % recover the real component of ifft for plotting

plot( underlying_sine , 'r' , composite_sine , 'g' , recovered_underlying_sine , 'c' )
legend( 'Underlying sine period 20' , 'Composite Sine' , 'Recovered underlying sine' )
title( 'Recovery of Underlying Dominant Sine Wave from Composite Data via FFT filtering' )

fprintf( 'Here we can see that the period 7 sine wave has been removed\n' )
fprintf( 'The trick to smoothing in the frequency domain is to eliminate\n' )
fprintf( '"unwanted frequencies" by setting them to zero in the frequency\n' )
fprintf( 'vector. For the purposes of illustration here the 7 period sine wave\n' )
fprintf( 'is considered noise. After doing this, we reverse the FFT and plot\n' )
fprintf( 'the results. The code that does all this can be seen in the code\n' )
fprintf( 'file. Now let us repeat this with noise.\n\n' )
fprintf( 'Press enter to continue.\n\n' )
pause ;

noise = randn( period , 1 )' * 0.25 ; % create random noise
noisy_composite = composite_sine .+ noise ;

plot( underlying_sine , 'r' , underlying_sine_2 , 'm' , noise, 'k' , noisy_composite , 'g' )
legend( 'Underlying sine period 20' , 'Underlying sine period 7' , 'random noise' , 'Combined waveform' )
title( 'Two Sine Waves of periods 20 and 7, with Amplitudes of 2 and 0.75 respectively, combined with random noise' ) 
fprintf( 'After viewing chart, press enter to continue.\n\n' )
pause ;

fprintf( 'Running the previous code again on this combined waveform with noise gives\n' )
fprintf( 'this centered plot.\n\n' )

data_in_freq_domain = fft( noisy_composite ) ; % apply the FFT
centered_data_in_freq_domain = fftshift( data_in_freq_domain / N ) ; % normalises & shifts the fft data so that it is centered

% remember to take the abs of data_in_freq_domain to get the magnitude!
stem( frequencyRange , abs( centered_data_in_freq_domain ) ) ;
xlabel( 'Freq (Hz)' )
ylabel( 'Amplitude' )
title( 'Using the centered FFT function with Amplitude normalisation on 2 Sine Waves plus random noise' )
grid
axis( [ min(frequencyRange) - 0.1 , max(frequencyRange) + 0.1 , 0 , max(abs( centered_data_in_freq_domain )) + 0.1 ] )

fprintf( 'Here we can see that the addition of the noise has introduced\n' )
fprintf( 'some unwanted frequencies to the frequency plot. The trick to\n' )
fprintf( 'smoothing in the frequency domain is to eliminate these "unwanted"\n' )
fprintf( 'frequencies by setting them to zero in the frequency vector that is\n' )
fprintf( 'shown in this plot. For the purposes of illustration here,\n' )
fprintf( 'all frequencies other than that for the 20 period sine wave are\n' )
fprintf( 'considered noise. After doing this, we reverse the FFT and plot\n' )
fprintf( 'the results. The code that does all this can be seen in the code\n' )
fprintf( 'file.\n\n' )
fprintf( 'Press enter to view the plotted result.\n\n' )
pause ;

% The actual frequency filtering will be performed on the basic FFT output, remembering that: 

% if the number of time domain samples, N, is even:

% element 1 = constant or DC amplitude
% elements 2 to N/2 = amplitudes of positive frequency components in increasing order
% elements N to N/2 = amplitudes of negative frequency components in decreasing order

% Note that element N/2 is the algebraic sum of the highest positive and highest
% negative frequencies and, for real time domain signals, the imaginary
% components cancel and N/2 is always real. The first sinusoidal component
% (fundamental) has it's positive component in element 2 and its negative
% component in element N.

% If the number of time domain samples, N, is odd:

% element 1 = constant or DC amplitude
% elements 2 to (N+1)/2 = amplitudes of positive frequency components in increasing order
% elements N to ((N+1)/2 + 1) = amplitudes of negative frequency components in decreasing order

% Note that for an odd number of samples there is no "middle" element and all
% the frequency domain amplitudes will, in general, be complex. 

if mod( N , 2 ) == 0 % N even
% The DC component is unique and should not be altered
% adjust value of element N / 2 to compensate for it being the sum of frequencies
data_in_freq_domain( N / 2 ) = abs( data_in_freq_domain( N / 2 ) ) / 2 ;

[ max1 , ix1 ] = max( abs( data_in_freq_domain( 2 : N / 2 ) ) ) ; % get the index of the first max value
ix1 = ix1 + 1 ;

[ max2 , ix2 ] = max( abs( data_in_freq_domain( N / 2 + 1 : end ) ) ) ; % get index of the second max value
ix2 = N / 2 + ix2 ;

non_zero_values_ix = [ 1 ix1 ix2 ] ; % indices for DC component, +ve freq max & -ve freq max

else % N odd
% The DC component is unique and should not be altered

[ max1 , ix1 ] = max( abs( data_in_freq_domain( 2 : ( N + 1 ) / 2 ) ) ) ; % get the index of the first max value
ix1 = ix1 + 1 ;

[ max2 , ix2 ] = max( abs( data_in_freq_domain( ( N + 1 ) / 2 + 1 : end ) ) ) ; % get index of the second max value
ix2 = ( N + 1 ) / 2 + ix2 ;

non_zero_values_ix = [ 1 ix1 ix2 ] ; % indices for DC component, +ve freq max & -ve freq max

end

% extract the constant term and frequencies of interest
non_zero_values = data_in_freq_domain( non_zero_values_ix ) ;

% now set all values to zero
data_in_freq_domain( 1 : end ) = 0.0 ;

% replace the non_zero_values
data_in_freq_domain( non_zero_values_ix ) = non_zero_values ;

noisy_composite_inv = ifft( data_in_freq_domain ) ; % reverse the FFT

recovered_underlying_sine = real( noisy_composite_inv ) ; % recover the real component of ifft for plotting

plot( underlying_sine , 'r' , noisy_composite , 'g' , recovered_underlying_sine , 'c' )
legend( 'Underlying sine period 20' , 'Noisy Composite' , 'Recovered underlying sine' )
title( 'Recovery of Underlying Dominant Sine Wave from Noisy Composite Data via FFT filtering' )

fprintf( 'As can be seen, the underlying sine wave of period 20 is almost completely\n' )
fprintf( 'recovered using frequency domain filtering. The fact that it is not perfectly\n' )
fprintf( 'recovered is due to artifacts (spectral leakage) intoduced into the frequency spectrum\n' )
fprintf( 'because the individual noisy composite signal components do not begin and end\n' )
fprintf( 'with zero values, or there are not integral numbers of these components present\n' )
fprintf( 'in the sampled, noisy composite signal.\n\n' )
fprintf( 'So there we have it - a basic introduction to using the FFT for signal smoothing\n' )
fprintf( 'in the frequency domain.\n\n' )
This script is liberally commented and has copious output to the terminal via fprintf statements and produces a series of plots of time series and their frequency chart plots. I would encourage readers to run this script for themselves and, if so inclined, give feedback.

The last chart output of the script will look something like this.
The red sinusoid represents an underlying periodic signal of period 20 and the green time series is a composite signal of the red line plus a 7 period sinusoid of less amplitude and random noise, these not being shown in the plot. The blue line is the resultant output of FFT filtering in the frequency domain of the green input. I must say I'm pretty pleased with this result and will be doing more work in this area shortly. Stay tuned!