Monday 2 September 2013

Filtering in the Frequency Domain with Octave's FFT Function

Despite my previous post in which I talked about the "Super Smoother" filter, it was bugging me that I couldn't get my attempts at FFT filtering to work. However, thanks in large part to my recent discovery of two online tutorials here and here I think I've finally cracked it. Taking these two tutorials as a template I have written an interactive Octave script, which is shown in the code box below.
clear all

fprintf( '\nThis is an interactive Octave tutorial to explain the basic principles\n' )
fprintf( 'of the Fast Fourier transform and how it can be used to smooth a time series\n' )
fprintf( 'in the frequency domain.\n\n' )
fprintf( 'A Fourier transform approximates any continuous function as the sum of periodic\n' )
fprintf( 'functions (sines and cosines). The FFT does the same thing for discrete signals,\n' )
fprintf( 'a series of data points rather than a continuously defined function.\n' ) 
fprintf( 'The FFT lets you identify periodic components in your discrete signal.\n' )
fprintf( 'You might need to identify a periodic signal buried under random noise,\n' )
fprintf( 'or analyze a signal with several different periodic underlying sources.\n' )
fprintf( 'Octave includes a built-in implementation of FFT to help you do this.\n\n' )

fprintf( 'First, take this 20 period sine wave. Since we know the period, the\n' )
fprintf( 'frequency = 1 / period = 1 / 20 = 0.05 Hz.\n\n' )

% set the period
period = 20 ; 
underlying_sine = 2.0 .* sinewave( period , period ) ; % create underlying sinewave function with amplitude 2

plot( underlying_sine , 'b' )
title( 'A Sine Wave of period 20 with an amplitude of 2' ) 
fprintf( 'After viewing chart, press enter to continue.\n\n' )
pause ;

fprintf( 'Now, let us put this sine wave through the Octave fft function thus:\n\n' )
fprintf( ' data_in_freq_domain = fft( data ) ;  and plot this.\n\n' )

N = length( underlying_sine ) ;

% If length of fft vector is even, then the magnitude of the fft will be symmetric, such that the 
% first ( 1 + length / 2 ) points are unique, and the rest are symmetrically redundant. The DC component 
% of output is fft( 1 ) and fft( 1 + length / 2 ) is the Nyquist frequency component of vector. 
% If length is odd, however, the Nyquist frequency component is not evaluated, and the number of unique 
% points is ( length + 1 ) / 2 . This can be generalized for both cases to ceil( ( length + 1 ) / 2 ) 

data_in_freq_domain = fft( underlying_sine ) ; % take the fft of our underlying_sine
stem( abs( data_in_freq_domain ) ) ; % use abs command to get the magnitude.
xlabel( 'Sample Number, i.e. position in the plotted Octave fft output vector' )
ylabel( 'Amplitude' )
title( 'Using the basic Octave fft command' )
grid
axis( [ 0 , period + 1 , 0 , max(abs( data_in_freq_domain )) + 0.1 ] )

fprintf( 'It can be seen that the x-axis gives us no information about the frequencies\n' )
fprintf( 'and the spectrum is not centered around zero. Changing the x-axis to frequency\n' )
fprintf( 'and plotting a centered frequency plot (see code in file) gives this plot.\n\n' )

fprintf( 'Press enter to view centered frequency plot.\n\n' )
pause ;

% Fs is the sampling rate

% this part of the code generates the frequency axis
if mod( N , 2 ) == 0
    k = -N/2 : N/2-1 ; % N even
else
    k = -(N-1)/2 : (N-1)/2 ; % N odd
end

T = N / 1 ; %  where the frequency sampling rate Fs in T = N / Fs ; Fs is 1.

frequencyRange = k / T ; % the frequency axis

data_in_freq_domain = fft( underlying_sine ) / N ; % apply the FFT & normalize the data
centered_data_in_freq_domain = fftshift( data_in_freq_domain ) ; % shifts the fft data so that it is centered

% remember to take the abs of data_in_freq_domain to get the magnitude!
stem( frequencyRange , abs( centered_data_in_freq_domain ) ) ;
xlabel( 'Freq (Hz)' )
ylabel( 'Amplitude' )
title( 'Using the centered FFT function with Amplitude normalisation' )
grid
axis( [ min(frequencyRange) - 0.1 , max(frequencyRange) + 0.1 , 0 , max(abs( centered_data_in_freq_domain )) + 0.1 ] )

fprintf( 'In this centered frequency chart it can be seen that the frequency\n' )
fprintf( 'with the greatest amplitude is 0.05 Hz, i.e. period = 1 / 0.05 = 20\n' )
fprintf( 'which we know to be the correct period of the underlying sine wave.\n' )
fprintf( 'The information within the centered frequency spectrum is entirely symmetric\n' )
fprintf( 'around zero and, in general, the positive side of the spectrum is used.\n\n' )
fprintf( 'Now, let us look at two sine waves together.\n' )

fprintf( 'Press enter to continue.\n\n' )
pause ;

% underlying_sine = 2.0 .* sinewave( period , period ) ; % create underlying sinewave function with amplitude 2
underlying_sine_2 = 0.75 .* sinewave( period , 7 ) ; % create underlying sinewave function with amplitude 0.75
composite_sine = underlying_sine .+ underlying_sine_2 ;

plot( underlying_sine , 'r' , underlying_sine_2 , 'g' , composite_sine , 'b' )
legend( 'Underlying sine period 20' , 'Underlying sine period 7' , 'Combined waveform' )
title( 'Two Sine Waves of periods 20 and 7, with Amplitudes of 2 and 0.75 respectively, combined' ) 
fprintf( 'After viewing chart, press enter to continue.\n\n' )
pause ;

data_in_freq_domain = fft( composite_sine ) ; % apply the FFT
centered_data_in_freq_domain = fftshift( data_in_freq_domain / N ) ; % normalises & shifts the fft data so that it is centered

% remember to take the abs of data_in_freq_domain to get the magnitude!
stem( frequencyRange , abs( centered_data_in_freq_domain ) ) ;
xlabel( 'Freq (Hz)' )
ylabel( 'Amplitude' )
title( 'Using the centered FFT function with Amplitude normalisation on 2 Sine Waves' )
grid
axis( [ min(frequencyRange) - 0.1 , max(frequencyRange) + 0.1 , 0 , max(abs( centered_data_in_freq_domain )) + 0.1 ] )

fprintf( 'x-axis is frequencyRange.\n' )
frequencyRange
fprintf( '\nComparing the centered frequency chart with the frequencyRange vector now shown it\n' )
fprintf( 'can be seen that, looking at the positive side only, the maximum amplitude\n' )
fprintf( 'frequency is for period = 1 / 0.05 = a 20 period sine wave, and the next greatest\n' )
fprintf( 'amplitude frequency is period = 1 / 0.15 = a 6.6667 period sine wave, which rounds up\n' )
fprintf( 'to a period of 7, both of which are known to be the "correct" periods in the\n' )
fprintf( 'combined waveform. However, we can also see some artifacts in this frequency plot\n' )
fprintf( 'due to "spectral leakage." Now, let us do some filtering and see the plot.\n\n' )
fprintf( 'Press enter to continue.\n\n' )
pause ;

% The actual frequency filtering will be performed on the basic FFT output, remembering that: 

% if the number of time domain samples, N, is even:

% element 1 = constant or DC amplitude
% elements 2 to N/2 = amplitudes of positive frequency components in increasing order
% elements N to N/2 = amplitudes of negative frequency components in decreasing order

% Note that element N/2 is the algebraic sum of the highest positive and highest
% negative frequencies and, for real time domain signals, the imaginary
% components cancel and N/2 is always real. The first sinusoidal component
% (fundamental) has it's positive component in element 2 and its negative
% component in element N.

% If the number of time domain samples, N, is odd:

% element 1 = constant or DC amplitude
% elements 2 to (N+1)/2 = amplitudes of positive frequency components in increasing order
% elements N to ((N+1)/2 + 1) = amplitudes of negative frequency components in decreasing order

% Note that for an odd number of samples there is no "middle" element and all
% the frequency domain amplitudes will, in general, be complex. 

if mod( N , 2 ) == 0 % N even
% The DC component is unique and should not be altered
% adjust value of element N / 2 to compensate for it being the sum of frequencies
data_in_freq_domain( N / 2 ) = abs( data_in_freq_domain( N / 2 ) ) / 2 ;

[ max1 , ix1 ] = max( abs( data_in_freq_domain( 2 : N / 2 ) ) ) ; % get the index of the first max value
ix1 = ix1 + 1 ;

[ max2 , ix2 ] = max( abs( data_in_freq_domain( N / 2 + 1 : end ) ) ) ; % get index of the second max value
ix2 = N / 2 + ix2 ;

non_zero_values_ix = [ 1 ix1 ix2 ] ; % indices for DC component, +ve freq max & -ve freq max

else % N odd
% The DC component is unique and should not be altered

[ max1 , ix1 ] = max( abs( data_in_freq_domain( 2 : ( N + 1 ) / 2 ) ) ) ; % get the index of the first max value
ix1 = ix1 + 1 ;

[ max2 , ix2 ] = max( abs( data_in_freq_domain( ( N + 1 ) / 2 + 1 : end ) ) ) ; % get index of the second max value
ix2 = ( N + 1 ) / 2 + ix2 ;

non_zero_values_ix = [ 1 ix1 ix2 ] ; % indices for DC component, +ve freq max & -ve freq max

end

% extract the constant term and frequencies of interest
non_zero_values = data_in_freq_domain( non_zero_values_ix ) ;

% now set all values to zero
data_in_freq_domain( 1 : end ) = 0.0 ;

% replace the non_zero_values
data_in_freq_domain( non_zero_values_ix ) = non_zero_values ;

composite_sine_inv = ifft( data_in_freq_domain ) ; % reverse the FFT

recovered_underlying_sine = real( composite_sine_inv ) ; % recover the real component of ifft for plotting

plot( underlying_sine , 'r' , composite_sine , 'g' , recovered_underlying_sine , 'c' )
legend( 'Underlying sine period 20' , 'Composite Sine' , 'Recovered underlying sine' )
title( 'Recovery of Underlying Dominant Sine Wave from Composite Data via FFT filtering' )

fprintf( 'Here we can see that the period 7 sine wave has been removed\n' )
fprintf( 'The trick to smoothing in the frequency domain is to eliminate\n' )
fprintf( '"unwanted frequencies" by setting them to zero in the frequency\n' )
fprintf( 'vector. For the purposes of illustration here the 7 period sine wave\n' )
fprintf( 'is considered noise. After doing this, we reverse the FFT and plot\n' )
fprintf( 'the results. The code that does all this can be seen in the code\n' )
fprintf( 'file. Now let us repeat this with noise.\n\n' )
fprintf( 'Press enter to continue.\n\n' )
pause ;

noise = randn( period , 1 )' * 0.25 ; % create random noise
noisy_composite = composite_sine .+ noise ;

plot( underlying_sine , 'r' , underlying_sine_2 , 'm' , noise, 'k' , noisy_composite , 'g' )
legend( 'Underlying sine period 20' , 'Underlying sine period 7' , 'random noise' , 'Combined waveform' )
title( 'Two Sine Waves of periods 20 and 7, with Amplitudes of 2 and 0.75 respectively, combined with random noise' ) 
fprintf( 'After viewing chart, press enter to continue.\n\n' )
pause ;

fprintf( 'Running the previous code again on this combined waveform with noise gives\n' )
fprintf( 'this centered plot.\n\n' )

data_in_freq_domain = fft( noisy_composite ) ; % apply the FFT
centered_data_in_freq_domain = fftshift( data_in_freq_domain / N ) ; % normalises & shifts the fft data so that it is centered

% remember to take the abs of data_in_freq_domain to get the magnitude!
stem( frequencyRange , abs( centered_data_in_freq_domain ) ) ;
xlabel( 'Freq (Hz)' )
ylabel( 'Amplitude' )
title( 'Using the centered FFT function with Amplitude normalisation on 2 Sine Waves plus random noise' )
grid
axis( [ min(frequencyRange) - 0.1 , max(frequencyRange) + 0.1 , 0 , max(abs( centered_data_in_freq_domain )) + 0.1 ] )

fprintf( 'Here we can see that the addition of the noise has introduced\n' )
fprintf( 'some unwanted frequencies to the frequency plot. The trick to\n' )
fprintf( 'smoothing in the frequency domain is to eliminate these "unwanted"\n' )
fprintf( 'frequencies by setting them to zero in the frequency vector that is\n' )
fprintf( 'shown in this plot. For the purposes of illustration here,\n' )
fprintf( 'all frequencies other than that for the 20 period sine wave are\n' )
fprintf( 'considered noise. After doing this, we reverse the FFT and plot\n' )
fprintf( 'the results. The code that does all this can be seen in the code\n' )
fprintf( 'file.\n\n' )
fprintf( 'Press enter to view the plotted result.\n\n' )
pause ;

% The actual frequency filtering will be performed on the basic FFT output, remembering that: 

% if the number of time domain samples, N, is even:

% element 1 = constant or DC amplitude
% elements 2 to N/2 = amplitudes of positive frequency components in increasing order
% elements N to N/2 = amplitudes of negative frequency components in decreasing order

% Note that element N/2 is the algebraic sum of the highest positive and highest
% negative frequencies and, for real time domain signals, the imaginary
% components cancel and N/2 is always real. The first sinusoidal component
% (fundamental) has it's positive component in element 2 and its negative
% component in element N.

% If the number of time domain samples, N, is odd:

% element 1 = constant or DC amplitude
% elements 2 to (N+1)/2 = amplitudes of positive frequency components in increasing order
% elements N to ((N+1)/2 + 1) = amplitudes of negative frequency components in decreasing order

% Note that for an odd number of samples there is no "middle" element and all
% the frequency domain amplitudes will, in general, be complex. 

if mod( N , 2 ) == 0 % N even
% The DC component is unique and should not be altered
% adjust value of element N / 2 to compensate for it being the sum of frequencies
data_in_freq_domain( N / 2 ) = abs( data_in_freq_domain( N / 2 ) ) / 2 ;

[ max1 , ix1 ] = max( abs( data_in_freq_domain( 2 : N / 2 ) ) ) ; % get the index of the first max value
ix1 = ix1 + 1 ;

[ max2 , ix2 ] = max( abs( data_in_freq_domain( N / 2 + 1 : end ) ) ) ; % get index of the second max value
ix2 = N / 2 + ix2 ;

non_zero_values_ix = [ 1 ix1 ix2 ] ; % indices for DC component, +ve freq max & -ve freq max

else % N odd
% The DC component is unique and should not be altered

[ max1 , ix1 ] = max( abs( data_in_freq_domain( 2 : ( N + 1 ) / 2 ) ) ) ; % get the index of the first max value
ix1 = ix1 + 1 ;

[ max2 , ix2 ] = max( abs( data_in_freq_domain( ( N + 1 ) / 2 + 1 : end ) ) ) ; % get index of the second max value
ix2 = ( N + 1 ) / 2 + ix2 ;

non_zero_values_ix = [ 1 ix1 ix2 ] ; % indices for DC component, +ve freq max & -ve freq max

end

% extract the constant term and frequencies of interest
non_zero_values = data_in_freq_domain( non_zero_values_ix ) ;

% now set all values to zero
data_in_freq_domain( 1 : end ) = 0.0 ;

% replace the non_zero_values
data_in_freq_domain( non_zero_values_ix ) = non_zero_values ;

noisy_composite_inv = ifft( data_in_freq_domain ) ; % reverse the FFT

recovered_underlying_sine = real( noisy_composite_inv ) ; % recover the real component of ifft for plotting

plot( underlying_sine , 'r' , noisy_composite , 'g' , recovered_underlying_sine , 'c' )
legend( 'Underlying sine period 20' , 'Noisy Composite' , 'Recovered underlying sine' )
title( 'Recovery of Underlying Dominant Sine Wave from Noisy Composite Data via FFT filtering' )

fprintf( 'As can be seen, the underlying sine wave of period 20 is almost completely\n' )
fprintf( 'recovered using frequency domain filtering. The fact that it is not perfectly\n' )
fprintf( 'recovered is due to artifacts (spectral leakage) intoduced into the frequency spectrum\n' )
fprintf( 'because the individual noisy composite signal components do not begin and end\n' )
fprintf( 'with zero values, or there are not integral numbers of these components present\n' )
fprintf( 'in the sampled, noisy composite signal.\n\n' )
fprintf( 'So there we have it - a basic introduction to using the FFT for signal smoothing\n' )
fprintf( 'in the frequency domain.\n\n' )
This script is liberally commented and has copious output to the terminal via fprintf statements and produces a series of plots of time series and their frequency chart plots. I would encourage readers to run this script for themselves and, if so inclined, give feedback.

The last chart output of the script will look something like this.
The red sinusoid represents an underlying periodic signal of period 20 and the green time series is a composite signal of the red line plus a 7 period sinusoid of less amplitude and random noise, these not being shown in the plot. The blue line is the resultant output of FFT filtering in the frequency domain of the green input. I must say I'm pretty pleased with this result and will be doing more work in this area shortly. Stay tuned!

No comments: