FFT, frequs, magnitudes, phases - R-help

Tue, Aug 30, 2005 2:35 AM

Hi,

here is some information from the first part of my "homework", for those who 
want to break down their signal (heart beat or whatever) into a collection of 
pure sine waves and analyse the magnitudes and phases of the "main" frequencies.

First some very un-mathematical "applied" theory:

If you sample a waveform signal (heart beat pressure pulses, ECG, Doppler flow 
signals, etc.) at a certain data acquisition frequency, an fft of your data 
decomposes the waveform signal into a series of pure sine waves of different 
frequencies. Each sine wave in that series has:
a) a certain "magnitude", i.e. a measure of how much that particular frequency 
contributes to your signal, and 
b) a phase, i.e. the starting point of each sine wave.

Two characteristics of an fft have to be considered:
1) the highest meaningful sine wave frequency in your fft-analysis of the 
original waveform signal is half the data acquisition frequency (R's fft 
actually returns values for frequencies up to the acquisition frequency, but 
you can only use the first half of them, see below)
2) the frequency resolution of your fft-analysis depends on the sampling time. 
The longer the sampling/analysis interval, the finer the resolution. 
The frequency resolution (Hz) is 1 divided by the sampling time (sec).

An example:
- some complicated waveform signal
- 1000 Hz data acquisition frequency (going on for hours)
- fft-analysis of data blocks of 1 sec length
Result:
- vector of frequencies from 0 Hz (the constant offset of the signal) up to 
just below 500 Hz in steps of 1 Hz, with a corresponding vector of magnitudes 
(one for each frequency) and phases (ditto).
You can now e.g. pick the frequency with the highest magnitude within that 1 
sec block and continue the fft analysis in 1 sec blocks for the complete data 
set, analysing the time course of the "main" frequency of your waveform 
signal.

If you need higher frequency resolution, increase the block length. Analysis 
of a 5 sec block will give you frequencies from 0 Hz up to just below 500 Hz 
in steps of 0.2 Hz. However, increasing the analysis-block length decreases 
temporal resolution, i.e. the "main" frequency is now calculated only every 
5 sec instead of every 1 sec.
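
As a rough sketch of the arithmetic in the example above (fft_grid is a 
made-up helper name, not something R provides), the resolution and the highest 
meaningful frequency follow directly from the acquisition frequency and the 
block length:

```r
# Hypothetical helper (not part of base R): describe the frequency grid of
# one fft block, given the acquisition frequency (Hz) and block length (s).
fft_grid <- function(acq.freq, block.sec) {
  n <- round(acq.freq * block.sec)   # number of samples in one block
  list(n          = n,
       resolution = acq.freq / n,    # = 1 / block length, in Hz
       nyquist    = acq.freq / 2)    # highest meaningful frequency, in Hz
}

g1 <- fft_grid(1000, 1)   # 1 sec blocks
g5 <- fft_grid(1000, 5)   # 5 sec blocks
print(c(g1$resolution, g5$resolution))
```

Longer blocks change only the resolution; the upper limit depends on the 
acquisition frequency alone.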

What does R's fft() deliver?

fft() is called with a single one-dimensional vector. Information on data 
acquisition frequency and block length (in sec or whatever) cannot be 
passed to the fft() call.

R returns a single one-dimensional vector of the same length as the data 
vector, containing complex numbers.
To extract the "magnitudes" use Mod(fft()).
The magnitudes can also be calculated using the formula:
magnitude = square root (real * real + imaginary * imaginary)
real: Re(fft()), imaginary: Im(fft())
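
A quick check of that formula on made-up data (the random vector and the seed 
are arbitrary choices for illustration only):

```r
set.seed(1)                  # arbitrary seed, only for reproducibility
x <- rnorm(16)               # made-up test data
X <- fft(x)                  # complex vector of the same length as x

magn.mod    <- Mod(X)                    # built-in magnitude
magn.manual <- sqrt(Re(X)^2 + Im(X)^2)   # formula from the text

same <- isTRUE(all.equal(magn.mod, magn.manual))
print(same)
```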

Confusingly, if you calculate fft() on a sample vector consisting of 2 pure 
sine waves, you get 4 peaks, not 2.

As stated above, an fft gives "meaningful" frequencies only up to half the 
sampling frequency. R, however, returns values for frequencies up to the 
sampling frequency. The point is that sampling a signal at discrete time 
intervals causes aliasing. E.g. when a 50 Hz sine wave and a 950 Hz sine wave 
are each sampled at 1000 Hz, the sampled values are identical apart from their 
sign. An fft cannot distinguish between the two frequencies. Therefore, the 
sampling frequency should always be more than twice as high as the highest 
expected signal frequency.
So for each actual frequency in a real-valued signal, fft() will give 2 peaks 
(one at the "actual" frequency and one at sampling frequency minus "actual" 
frequency), making the second half of the magnitude vector a mirror image of 
the first half.
As long as the sampling frequency is more than twice as high as the highest 
signal frequency, all "meaningful" information is contained in the first 
half of the magnitude vector. A peak in the low frequency range might 
nevertheless still be caused by a high "noise" frequency that has been aliased.
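
The 50 Hz / 950 Hz case can be tried out directly. This is only a sketch with 
a 1 sec block and sample times starting at 0 (both are my assumptions for the 
illustration):

```r
acq.freq <- 1000                          # sampling frequency (Hz)
smpl.t <- (0:(acq.freq - 1)) / acq.freq   # 1 sec of sample times (s)
s50  <- sin(2 * pi * 50  * smpl.t)
s950 <- sin(2 * pi * 950 * smpl.t)

# the 950 Hz samples are the 50 Hz samples with the sign flipped,
# so their sum is zero up to rounding error
max.diff <- max(abs(s950 + s50))

# the magnitudes are therefore the same, with one peak at index 51 (50 Hz)
# and its mirror image at index 951 (1000 Hz - 50 Hz)
m50  <- Mod(fft(s50))
m950 <- Mod(fft(s950))
peaks <- which(m50 > max(m50) / 2)
print(peaks)
```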

The vector of magnitudes extracted so far only has an index and no associated 
frequencies.

To calculate the frequencies, simply take (or generate) the index vector (1 
to length(magnitude vector)), subtract 1 (the first element belongs to 0 Hz, 
the constant offset of the signal) and divide by the length of the data block 
(in sec).
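
A minimal sketch of the frequency calculation, taking index 1 as 0 Hz. The 
test signal, the 2 sec block length and all variable names are made up for 
the illustration:

```r
acq.freq  <- 1000                    # sampling frequency (Hz)
block.sec <- 2                       # block length (s)
n <- acq.freq * block.sec            # samples per block
smpl.t <- (0:(n - 1)) / acq.freq     # sample times (s)
sig <- sin(2 * pi * 50 * smpl.t) + 0.5 * sin(2 * pi * 130 * smpl.t)

magn <- Mod(fft(sig))
half <- 1:(n / 2)                    # first half of the magnitude vector
freq <- (half - 1) / block.sec       # index 1 is 0 Hz, step is 1 / block.sec

# "main" frequency of this block: the one with the largest magnitude
main.freq <- freq[which.max(magn[half])]
print(main.freq)
```

Running this over consecutive blocks of a long recording would give the time 
course of the "main" frequency.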


That's it for now. The second half of my "homework" will be delivered as soon 
as I understand what to make of the phases given by R.
Again, I would expect a vector of the same length as the magnitude vector with 
the phases (0 to 2*pi or -pi to +pi) of each frequency. However, I do not 
know yet what R calculates.
I would be most obliged for any comments and help.


Wolfgang

---------------------------------------------------------------------------
# R-script
acq.freq <- 4000       # data acquisition frequency (Hz)
sig1.freq <- 50           # frequency of 1st signal component (Hz)
sig2.freq <- 130        # frequency of 2nd signal component (Hz)
time <- 5                    # measuring time interval (s)

# vector of sampling time-points (s)
smpl.int <- (1:(time*acq.freq))/acq.freq  

# data vector containing two frequencies (2nd frequ with phase shift)
data <- sin(sig1.freq*smpl.int*2*pi)+sin(sig2.freq*smpl.int*2*pi+pi/2)

plot(data,type="l")

# calculate fft of data
test <- fft(data)

# extract magnitudes and phases
magn <- Mod(test) # sqrt(Re(test)*Re(test)+Im(test)*Im(test))
phase <- Arg(test) # atan2(Im(test), Re(test)), in the range -pi to +pi

# select only first half of vectors
magn.1 <- magn[1:(length(magn)/2)]
#phase.1 <- Arg(test)[1:(length(test)/2)]

# plot various vectors

# plot magnitudes as returned by R
x11()
plot(magn,type="l")

# plot first half of magnitude vector
x11()
plot(magn.1,type="l")

# generate x-axis with frequencies (index 1 corresponds to 0 Hz)
x.axis <- (0:(length(magn.1)-1))/time

# plot magnitudes against frequencies
x11()
plot(x=x.axis,y=magn.1,type="l")

Dr. Wolfgang Waser
Humboldt-Universität zu Berlin
Institute of Biology
Department of Animal Physiology
Philippstrasse 13, Abderhaldenhaus
10115 Berlin
Germany
Tel: +49 (0)30 2093 6173
Fax: +49 (0)30 2093 6375

Michael A. Miller

Tue, Aug 30, 2005 10:35 AM

> I would be most obliged for any comments and help.

Wolfgang,

I've used R's fft to filter ECG signals and will comment on your
commentary based on my experience.  First, as an easily
accessible reference, I suggest "The Scientist and Engineer's
Guide to Digital Signal Processing," which is available in pdf
form at http://www.dspguide.com.  It includes several chapters on
the discrete Fourier transform and the fast Fourier transform
algorithm (which is what R's fft implements) and a chapter on
applications that contains info on spectral analysis.

    > What does R's fft() deliver?

    > fft() is calculated with a single one-dimensional
    > vector. Information on data acquisition frequency and block
    > length (in sec or whatever) can not be included into the
    > fft()-call.

    > Confusingly, if you calculate fft() on a sample vector
    > consisting of 2 pure sin frequencies, you get 4 peaks, not
    > 2.

That is the nature of the fft algorithm.  It returns values of
the discrete Fourier transform for both positive and negative
frequencies.
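
For real-valued input the negative-frequency values carry no extra 
information: each one is the complex conjugate of the value at the matching 
positive frequency, which is why the magnitudes look mirrored.  A small check 
on made-up data (the random vector and seed are arbitrary):

```r
set.seed(42)            # arbitrary seed, only for reproducibility
n <- 8
x <- rnorm(n)           # made-up real-valued data
X <- fft(x)

# X[1] is the 0 Hz term.  Elements 2..n hold frequency steps 1..n-1, and
# step n-k (i.e. frequency -k) is the complex conjugate of step k.
sym <- isTRUE(all.equal(X[2:n], Conj(rev(X[2:n]))))
print(sym)
```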

    > As stated above, fft() gives only "meaningful" frequency up
    > to half the sampling frequency. R, however, gives you
    > frequencies up to the sampling frequency. 

It is important to remember that the fft algorithm doesn't return
any frequency data at all.  It returns values of the fft that
correspond to frequencies from -f_Nyquist to +f_Nyquist.  It is
up to the user to calculate the frequency values.

Here's an example:

## Read some sample ecg data
ecg <- read.table('http://www.indyrad.iupui.edu/public/mmiller3/sample-ecg-1kHz.txt')
names(ecg) <- c('t','ecg')

ecg$t <- ecg$t/1000  # convert from ms to s

par(mfrow=c(2,2))

## Plot the ecg:
plot(ecg ~ t, data=ecg, type='l', main='ECG data sampled at 1 kHz', xlab='Time [s]')

## Calculate fft(ecg):
ecg$fft <- fft(ecg$ecg)

## Plot fft(ecg):
#plot(ecg$fft, type='l')

## Plot Mod(fft(ecg)):
plot(Mod(ecg$fft), type='l', log='y', main='FFT of ecg vs index')

## Find the sample period:
delta <- ecg$t[2] - ecg$t[1]

## Calculate the Nyquist frequency:
f.Nyquist <- 1 / 2 / delta

## Calculate the frequencies.  (Since ecg$t is in seconds, delta
## is in seconds, f.Nyquist is in Hz and ecg$freq is in Hz)
## (Note: the first element is 0 Hz and the negative frequencies fill
## the second half.  This assumes an even number of samples.)
ecg$freq <- f.Nyquist*c(seq(0, nrow(ecg)/2 - 1), seq(-nrow(ecg)/2, -1))/(nrow(ecg)/2)

## Plot fft vs frequency
plot(Mod(fft) ~ freq, data=ecg, type='l', log='y', main='FFT of ECG vs frequency', xlab='Frequency [Hz]')

## Now let's look at some artificial data:
x <- seq(100000)/1000  # pretend we're sampling at 1 kHz

## We'll put in two frequency components, plus a dc offset
f1 <- 5  # Hz
f2 <- 2  # Hz
y <- 0.1*sin(2*pi*f1*x) + sin(2*pi*f2*x) + 50
fft.y <- fft(y)
delta <- x[2] - x[1]
f.Nyquist <- 1 / 2 / delta
f <- f.Nyquist*c(seq(0, length(x)/2 - 1), seq(-length(x)/2, -1))/(length(x)/2)

par(mfrow=c(2,2))
plot(x,y, type='l', xlim=c(0,20))
plot(f, Mod(fft.y), type='l', log='y')

## Now let's zoom in and mark the points where I expect to see peaks:
plot(f, Mod(fft.y), type='l', log='y', xlim=c(-10,10))
rug(c(-f1, -f2, 0, f1, f2), col='red', side=3)




Hope this is helpful, Mike

Michael A. Miller                               mmiller3 at iupui.edu
  Imaging Sciences, Department of Radiology, IU School of Medicine