Date: Wed, 26 Nov 2008 09:33:59 -0600
From: h.wickham at gmail.com
To: jholtman at gmail.com
Subject: Re: [R] Very slow: using double apply and cor.test to compute correlation p.values for 2 matrices
CC: daren76 at hotmail.com; r-help at stat.math.ethz.ch
On Wed, Nov 26, 2008 at 8:14 AM, jim holtman wrote:
Your time is being taken up in cor.test because you are calling it
100,000 times. So grin and bear it with the amount of work you are
asking it to do.
Here I am only calling it on 100 x 100 matrices:
m1 <- matrix(rnorm(10000), ncol=100)
m2 <- matrix(rnorm(10000), ncol=100)
Rprof('/tempxx.txt')
system.time(cor.pvalues <- apply(m1, 1, function(x) { apply(m2, 1, function(y) { cor.test(x,y)$p.value }) }))
user system elapsed
8.86 0.00 8.89
so my guess is that calling it 100,000 times will take: 100,000 *
0.0886 seconds or about 3 hours.
You can make it ~3 times faster by vectorising the testing:
m1 <- matrix(rnorm(10000), ncol=100)
m2 <- matrix(rnorm(10000), ncol=100)
system.time(cor.pvalues <- apply(m1, 1, function(x) { apply(m2, 1,
function(y) { cor.test(x,y)$p.value })}))
system.time({
r <- apply(m1, 1, function(x) { apply(m2, 1, function(y) { cor(x,y) })})
df <- ncol(m1) - 2  # each row holds ncol(m1) observations; nrow(m1) only matches here because m1 is square
t <- sqrt(df) * r / sqrt(1 - r ^ 2)
p <- pt(t, df)
p <- 2 * pmin(p, 1 - p)
})
all.equal(cor.pvalues, p)
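To make the formula above easier to check, here is a minimal single-pair sketch. It assumes the default Pearson, two-sided test; the names x, y, tstat, p_manual and p_ref are only for illustration and are not from the original code.

```r
set.seed(1)
x <- rnorm(100)
y <- rnorm(100)

n  <- length(x)      # number of paired observations
df <- n - 2          # degrees of freedom for the Pearson test
r  <- cor(x, y)

# t statistic for H0: rho = 0
tstat <- sqrt(df) * r / sqrt(1 - r^2)

# two-sided p-value: twice the tail area beyond |t|
p_manual <- 2 * pt(-abs(tstat), df)

# reference value from cor.test
p_ref <- cor.test(x, y)$p.value
all.equal(p_manual, p_ref)
```

Using 2 * pt(-abs(tstat), df) is just another way of writing the pmin() step above; both take the smaller tail and double it.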
You can make cor much faster by stripping away all the error-checking
code and calling the internal C function directly (suggested by the
Rprof output):
system.time({
r <- apply(m1, 1, function(x) { apply(m2, 1, function(y) { cor(x,y) })})
})
system.time({
r2 <- apply(m1, 1, function(x) { apply(m2, 1, function(y) {
.Internal(cor(x, y, 4L, FALSE)) })})
})
1.5 s vs 0.2 s on my computer. Combining both changes gives me a
~25-fold speed-up. I suspect you can do even better if you think about
what calculations are being duplicated in the computation of the
correlations.
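One possible way to remove the duplicated work, sketched here as an assumption rather than something timed in this thread: cor() accepts two matrices and correlates their columns, so transposing lets it compute every row-by-row correlation in a single call. The name tstat is illustrative, and the test is again assumed to be Pearson and two-sided.

```r
set.seed(1)
m1 <- matrix(rnorm(10000), ncol = 100)
m2 <- matrix(rnorm(10000), ncol = 100)

# cor() works on columns, so transpose to correlate rows.
# r[j, i] is the correlation of row j of m2 with row i of m1,
# which matches the layout of the nested apply() result.
r  <- cor(t(m2), t(m1))

df    <- ncol(m1) - 2                  # observations per row, minus 2
tstat <- sqrt(df) * r / sqrt(1 - r^2)
p     <- 2 * pt(-abs(tstat), df)       # matrix of two-sided p-values

# spot-check one entry against cor.test
p_ref <- cor.test(m1[3, ], m2[7, ])$p.value
all.equal(p[7, 3], p_ref)
```

This avoids the per-pair function call overhead entirely, since the row means and standard deviations are no longer recomputed for every pair.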
Hadley
--
http://had.co.nz/