Speeding up "accumulation" code in large matrix calc?
On Fri, Feb 24, 2012 at 09:06:28PM +0100, Berend Hasselman wrote:
[...]
Summary of results
Nrow <- 100 Ncol <- 1000 A <- matrix((runif(Ncol*Nrow)<0.2)+0, nrow=Nrow) print(rbind(z1,z2,z3,z4,z5,z6))
user.self sys.self elapsed user.child sys.child z1 0.543 0.005 0.551 0 0 z2 0.299 0.004 0.304 0 0 z3 0.070 0.012 0.082 0 0 z4 0.112 0.002 0.114 0 0 z5 0.263 0.007 0.271 0 0 z6 0.062 0.009 0.070 0 0
[...]
The original function (t1) benefits a lot from compiling to byte code. The function that operates on columns (t3) seems to be always the quickest in this experiment.
Hi.
Let me include one more solution operating on rows to the test.
t4 <- function(A, outcome) {
oneRow <- function(x, outcome)
{
n <- length(x)
y <- 1:n
z <- y
z[c(FALSE, x[-n] != outcome)] <- 0
y - cummax(z)
}
t(apply(A, 1, oneRow, outcome=outcome))
}
library(compiler)
t1.c <- cmpfun(t1)
t3.c <- cmpfun(t3)
t4.c <- cmpfun(t4)
Nrow <- 100
Ncol <- 1000
A <- matrix((runif(Ncol*Nrow)<0.2)+0, nrow=Nrow)
library(rbenchmark)
benchmark(t1(A,outcome=1),
t3(A,outcome=1),
t4(A,outcome=1),
t1.c(A,outcome=1),
t3.c(A,outcome=1),
t4.c(A,outcome=1),
columns=c("test", "user.self", "sys.self", "relative"),
replications=1)
test user.self sys.self relative
1 t1(A, outcome = 1) 0.744 0 46.5000
4 t1.c(A, outcome = 1) 0.284 0 17.6875
2 t3(A, outcome = 1) 0.076 0 4.7500
5 t3.c(A, outcome = 1) 0.072 0 4.6250
3 t4(A, outcome = 1) 0.016 0 1.0000
6 t4.c(A, outcome = 1) 0.020 0 1.1250
Here, t4(), t4.c() is faster than t3(), t3.c().
With
Nrow <- 20000
Ncol <- 50
A <- matrix((runif(Ncol*Nrow)<0.2)+0, nrow=Nrow)
i get
test user.self sys.self relative
1 t1(A, outcome = 1) 7.444 0 20.233696
4 t1.c(A, outcome = 1) 2.740 0 7.442935
2 t3(A, outcome = 1) 0.368 0 1.000000
5 t3.c(A, outcome = 1) 0.368 0 1.002717
3 t4(A, outcome = 1) 0.592 0 1.605978
6 t4.c(A, outcome = 1) 0.536 0 1.456522
Here, t3() and t3.c() are faster than t4(), t4.c().
Petr.