
processing a large matrix

One approach is to split up the work of doing the correlations: if you
give the 'cor' function 2 matrices, it gives you the correlations
between all pairs of columns (one from each matrix).  Since you said it
works fine with 10,000 columns but not 30,000, you could split into 3
pieces and do something like (untested):

 out <- rbind(
     cbind( cor(mymatrix[,1:10000])^2,
            cor(mymatrix[,1:10000], mymatrix[,10001:20000])^2,
            cor(mymatrix[,1:10000], mymatrix[,20001:30000])^2 ),
     cbind( matrix(NA,10000,10000),
            cor(mymatrix[,10001:20000])^2,
            cor(mymatrix[,10001:20000], mymatrix[,20001:30000])^2 ),
     cbind( matrix(NA,10000,10000),
            matrix(NA,10000,10000),
            cor(mymatrix[,20001:30000])^2 )
     )

out[ lower.tri(out) ] <- t(out)[ lower.tri(out) ]  # mirror the upper triangle into the NA blocks
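To see that the split-then-mirror pattern reproduces the full result, here is the same construction at toy scale (a sketch; the matrix 'x' and the block size of 2 columns are made up for illustration -- the final 'out' should equal cor(x)^2):

```r
set.seed(42)
x <- matrix(rnorm(20 * 6), ncol = 6)   # 6 columns, split into three blocks of 2

# Same structure as above: compute the diagonal and upper off-diagonal
# blocks, leave the lower blocks as NA placeholders.
out <- rbind(
     cbind( cor(x[, 1:2])^2,
            cor(x[, 1:2], x[, 3:4])^2,
            cor(x[, 1:2], x[, 5:6])^2 ),
     cbind( matrix(NA, 2, 2),
            cor(x[, 3:4])^2,
            cor(x[, 3:4], x[, 5:6])^2 ),
     cbind( matrix(NA, 2, 2),
            matrix(NA, 2, 2),
            cor(x[, 5:6])^2 )
     )

# Fill the lower triangle by symmetry.
out[lower.tri(out)] <- t(out)[lower.tri(out)]
```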

For breaking into 3 pieces, this is probably easier/quicker than trying
to find an alternative.  If you need to break it into even more pieces
(say, blocks of 1,000 when there are 30,000 columns), then there are
probably better alternatives: you could do a loop over blocks, which
would still be much faster than a loop over individual columns.
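The loop-over-blocks idea could be sketched like this (untested at full scale; the function name 'block_cor2' and the default block size are my own, and only the upper triangle is computed before mirroring, as in the 3-piece version above):

```r
# Squared correlations of all column pairs, computed block by block so
# that each call to cor() only sees a manageable number of columns.
block_cor2 <- function(m, block = 1000) {
  p <- ncol(m)
  starts <- seq(1, p, by = block)
  out <- matrix(NA_real_, p, p)
  for (i in starts) {
    ii <- i:min(i + block - 1, p)
    for (j in starts) {
      if (j < i) next                  # fill the upper triangle only
      jj <- j:min(j + block - 1, p)
      out[ii, jj] <- cor(m[, ii, drop = FALSE],
                         m[, jj, drop = FALSE])^2
    }
  }
  # Mirror the upper triangle into the lower one.
  out[lower.tri(out)] <- t(out)[lower.tri(out)]
  out
}
```

On a small matrix this agrees with cor(m)^2, e.g. block_cor2(m, 4) for a 9-column m, while keeping the per-call work bounded for wide matrices.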

Hope this helps,