
[ExternalEmail] Pearson Correlation Speed

On Mon, 15 Dec 2008, Nathan S. Watson-Haigh wrote:

I think you are on the wrong track, Nathan.

The matrix you are starting with is 18563 x 18563 and the result of 
finding the correlations amongst the columns of that matrix is also 18563 
x 18563. It will require more than 5 Gigabytes of memory to store the 
result and the original matrix.
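
As a rough check of that figure, here is a minimal sketch of the 
arithmetic. It assumes both matrices are dense and stored as ordinary R 
numerics (8 bytes per element); the variable names are just for 
illustration.

```r
# Back-of-the-envelope memory estimate for two dense double matrices.
n <- 18563                       # rows and columns, as stated above
bytes_per_double <- 8            # R numeric values are 8-byte doubles
one_matrix_gb <- n^2 * bytes_per_double / 1e9
both_gb <- 2 * one_matrix_gb     # original matrix plus its correlations
round(c(one = one_matrix_gb, both = both_gb), 2)
```

That comes to about 2.76 GB per matrix, before counting any temporary 
copies made during the calculation.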

The time needed to do the calculation is likely inflated by caching 
issues and, if your machine does not have enough memory to store the 
result and all the intermediate pieces, by swapping as well.

You can finesse these problems by breaking the job into smaller pieces, 
say by computing the correlations between each pair of 19 blocks of 
columns (columns 1:977, 977+1:977, ..., 18*977+1:977), then assembling 
the results.
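
A minimal sketch of the blockwise idea might look like the following. 
The function name block_cor and its arguments are hypothetical, and it 
assumes the number of columns divides evenly into the blocks (as 18563 = 
19 * 977 does). For simplicity it assembles the full result in memory; 
on a machine that is short of memory you would more likely process or 
save each block separately.

```r
# Sketch: compute cor(x) block by block.
# Assumes ncol(x) is an exact multiple of n_blocks.
block_cor <- function(x, n_blocks) {
  p <- ncol(x)
  stopifnot(p %% n_blocks == 0)
  size <- p %/% n_blocks
  # Column indices per block: 1:size, size + 1:size, 2*size + 1:size, ...
  idx <- lapply(seq_len(n_blocks) - 1, function(b) b * size + seq_len(size))
  out <- matrix(NA_real_, p, p)
  for (i in seq_len(n_blocks)) {
    for (j in i:n_blocks) {
      # Correlations between the columns of block i and those of block j
      r <- cor(x[, idx[[i]], drop = FALSE], x[, idx[[j]], drop = FALSE])
      out[idx[[i]], idx[[j]]] <- r
      out[idx[[j]], idx[[i]]] <- t(r)  # fill the mirror block by symmetry
    }
  }
  out
}

# Small demonstration on random data (sizes chosen only for illustration).
set.seed(1)
x <- matrix(rnorm(50 * 12), nrow = 50, ncol = 12)
all.equal(block_cor(x, n_blocks = 4), cor(x))
```

Only the blocks on and above the diagonal are computed; the rest follow 
from symmetry.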

---

BTW, R already has the necessary machinery to calculate the crossproduct 
matrix (etc) needed to find the correlations. You can access the low level 
linear algebra that R uses. You can marry R to an optimized BLAS if you 
like.
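
To illustrate the crossproduct route, here is a sketch under the 
assumption that the data are a complete numeric matrix with no missing 
values and no constant columns. The function name cor_crossprod is 
hypothetical; crossprod(), sweep(), colMeans() and colSums() are base R. 
This is not necessarily how cor() is implemented internally, only a way 
to reach the same numbers with the linear algebra routines.

```r
# Sketch: Pearson correlations of the columns of x via one crossproduct.
# Assumes no NAs and no zero-variance columns.
cor_crossprod <- function(x) {
  xc <- sweep(x, 2, colMeans(x))   # subtract each column's mean
  ss <- sqrt(colSums(xc^2))        # length of each centered column
  xs <- sweep(xc, 2, ss, "/")      # scale every column to unit length
  crossprod(xs)                    # t(xs) %*% xs, handled by the BLAS
}

# Check against cor() on a small random matrix (illustrative sizes only).
set.seed(2)
x <- matrix(rnorm(200), nrow = 40, ncol = 5)
all.equal(cor_crossprod(x), cor(x))
```

Because the heavy lifting is a single matrix product, this is the step 
that stands to gain from an optimized BLAS.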

So pulling in some other code to do this will not save you anything. If 
you ever do decide to import C[++] code there is excellent documentation 
in the Writing R Extensions manual, which you should review before 
attempting to import C++ code into R.

HTH,

Chuck
[snip]


Charles C. Berry                            (858) 534-2098
                                             Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901