"Fast" correlation algorithm

3 messages · jastar, Greg Snow, Joshua Stults

#
Hi,
Is there a "fast" algorithm for correlation in R?
What I mean is:
I have a very large dataset (microarray) with 55000 rows and 100 columns. I
want to compute the correlation (p-value and correlation coefficient) between
each row of the dataset and a given vector (whose length is, of course, equal
to the number of columns of the dataset).
In short:
For t-test we have:
"normal" algorithm - t.test
"fast" algorithm - rowttests
For correlation:
"normal" algorithm - cor.test
"fast" algorithm - ???

Thanks for the help
#
Well, if the rows of your matrix and your vector are centered and properly scaled (each to unit length after centering), and there are no missing values, then the correlations are just a crossproduct, and matrix arithmetic is already fairly fast (assuming you have enough memory).
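
A minimal sketch of the crossproduct idea, not a definitive implementation. The names expr and v are hypothetical stand-ins for the microarray matrix and the vector, filled with simulated data and shrunk to 200 rows so it runs quickly. The p-value step is an addition that is not part of the reply above: it assumes Pearson correlation and uses the usual t statistic with n - 2 degrees of freedom, which is the test cor.test applies by default.

```r
# Hypothetical stand-in data: 'expr' is rows-by-samples, 'v' has one value per column.
set.seed(1)
expr <- matrix(rnorm(200 * 100), nrow = 200, ncol = 100)
v <- rnorm(100)
n <- ncol(expr)

# Center each row and scale it to unit length (recycling works row-wise here
# because the subtracted/divided vectors have length nrow(expr)).
row_centered <- expr - rowMeans(expr)
row_scaled <- row_centered / sqrt(rowSums(row_centered^2))

# Same treatment for the vector.
v_centered <- v - mean(v)
v_scaled <- v_centered / sqrt(sum(v_centered^2))

# One matrix-vector product gives the Pearson correlation for every row.
r <- as.vector(row_scaled %*% v_scaled)

# Assumption: two-sided p-values from the t statistic with n - 2 df.
t_stat <- r * sqrt((n - 2) / (1 - r^2))
p <- 2 * pt(-abs(t_stat), df = n - 2)
```

For complete data, r should agree with cor(t(expr), v) and p[i] with cor.test(expr[i, ], v)$p.value. Rows with zero variance would give NaN and need separate handling.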
#
If you need auto- or cross-correlations in O(n*log(n)) rather than O(n^2)
time, you can use an FFT.  Here's a good short write-up on using the FFT for
this (a Numerical Recipes chapter):

http://hebb.mit.edu/courses/9.29/2002/readings/c13-2.pdf

It won't get you p-values, but it is faster than a normal matrix-vector
multiply.  If I understand your post correctly, though, you are working with
many vectors of dimension ~100, so the standard method is probably plenty
fast.  You may not see a speed-up from an FFT for vectors this small (the
transform -> operations -> inverse transform sequence has more overhead).
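
A small sketch of the FFT approach in base R, under a few assumptions: x and y are simulated vectors, the correlation is circular (indices wrap around), and the result is the raw sum of products at each lag rather than a normalized correlation coefficient. The direct loop is included only to check the FFT result.

```r
set.seed(1)
n <- 1024
x <- rnorm(n)
y <- rnorm(n)

# Circular cross-correlation at all n lags in O(n log n).
# R's inverse fft() is unnormalized, hence the division by n.
cc_fft <- Re(fft(fft(x) * Conj(fft(y)), inverse = TRUE)) / n

# Direct O(n^2) version: for lag k, sum over t of x[t + k] * y[t], wrapping around.
cc_direct <- sapply(0:(n - 1), function(k) {
  idx <- ((seq_len(n) - 1 + k) %% n) + 1
  sum(x[idx] * y)
})

# The two should agree up to floating-point error.
max_abs_diff <- max(abs(cc_fft - cc_direct))
```

The first element is lag 0, i.e. sum(x * y). For a non-circular correlation the vectors would need zero padding before the transform.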
On Thu, May 14, 2009 at 5:02 PM, Greg Snow <Greg.Snow at imail.org> wrote: