speed up in R apply
On Wed, Jan 5, 2011 at 10:49 PM, Young Cho <young.stat at gmail.com> wrote:
When I was introduced to R, I learned to use *apply wherever I could to avoid for-loops. Having gotten into the habit, I somehow picked up the misconception that it is a magic sauce, always the optimal way of coding in R.
See [1] for an article on vectorisation and loops in R.
Liviu

[1] http://www.r-project.org/doc/Rnews/Rnews_2008-1.pdf
Thanks a lot for all of your helpful advice and comments!
Young

On Wed, Jan 5, 2011 at 3:09 PM, David Winsemius <dwinsemius at comcast.net> wrote:
On Jan 5, 2011, at 2:40 PM, Douglas Bates wrote:
On Wed, Jan 5, 2011 at 1:22 PM, David Winsemius <dwinsemius at comcast.net> wrote:
On Jan 5, 2011, at 10:03 AM, Young Cho wrote:
Hi,
I am doing some simulations and found a bottleneck in my R script. I made an example:
a = matrix(rnorm(5000000),1000000,5)
tt = Sys.time(); sum(a[,1]*a[,2]*a[,3]*a[,4]*a[,5]); Sys.time() - tt
[1] -1291.026
Time difference of 0.2354031 secs
tt = Sys.time(); sum(apply(a,1,prod)); Sys.time() - tt
[1] -1291.026
Time difference of 20.23150 secs

Is there a faster way of calculating the sum of products (of columns, or of rows)?
You should look at crossprod and tcrossprod.
Hmm. Not sure that would help, David. You could use a matrix multiplication of a %*% rep(1, ncol(a)) if you wanted the row sums but of course you could also use rowSums to get those.
Thanks for pointing that out. I misread the OP's code.
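The distinction discussed here can be checked on a small matrix. The following is only a sketch: the matrix `m` and the result names are illustrative and not from the thread. It shows that multiplying by a vector of ones reproduces rowSums, and that crossprod(m) is t(m) %*% m, i.e. sums of products between pairs of columns. That is a different quantity from the row-wise product across all columns that the original question asks about.

```r
# Small illustrative matrix (hypothetical data, not the thread's `a`)
m <- matrix(as.numeric(1:12), nrow = 4, ncol = 3)

# Row sums via matrix multiplication with a vector of ones;
# drop() turns the 4 x 1 result into a plain vector
rs_mat <- drop(m %*% rep(1, ncol(m)))

# Row sums via the dedicated function
rs_fun <- rowSums(m)
same_row_sums <- isTRUE(all.equal(rs_mat, rs_fun))

# crossprod(m) is the 3 x 3 matrix t(m) %*% m, not a product across columns
cp <- crossprod(m)
same_crossprod <- isTRUE(all.equal(cp, t(m) %*% m))
```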
And is this an expected behavior?
Yes. For loops and *apply strategies are slower than the proper use of vectorized functions.
To expand a bit on David's point, the apply function isn't magic. It essentially loops over the rows, in this case. By multiplying columns together you are performing the looping over the rows in compiled code, which is much, much faster. If you want to do this kind of operation effectively in R for a general matrix (i.e. not knowing in advance that it has exactly 5 columns) you could use Reduce:
a <- matrix(rnorm(5000000),1000000,5)
system.time(pr1 <- a[,1]*a[,2]*a[,3]*a[,4]*a[,5])
   user  system elapsed
   0.15    0.09    0.37
system.time(pr2 <- apply(a, 1, prod))
   user  system elapsed
 22.090   0.140  22.902
all.equal(pr1, pr2)
[1] TRUE
system.time(pr3 <- Reduce(get("*"), as.data.frame(a), rep(1, nrow(a))))
Slightly faster would be:
system.time(pr3 <- Reduce("*", as.data.frame(a)))
And thanks for the nice example. Using a data.frame to feed Reduce
materially enhances its value to me.
   user  system elapsed
  0.410   0.010   0.575
all.equal(pr3, pr2)
[1] TRUE
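As a rough sketch of how the Reduce idea generalises, the snippet below wraps it in a helper for matrices with any number of columns and checks it against apply on a small matrix. The names `row_prod` and `small` are hypothetical and introduced only for illustration; no timings are claimed here.

```r
# Hypothetical helper: row-wise products for a matrix with any number of columns
row_prod <- function(x) {
  # as.data.frame() turns the matrix into a list of column vectors,
  # so Reduce multiplies whole columns together element-wise
  Reduce(`*`, as.data.frame(x))
}

set.seed(1)
small <- matrix(rnorm(20), nrow = 4, ncol = 5)

# Same values as the row-by-row apply() version
agrees <- isTRUE(all.equal(row_prod(small), apply(small, 1, prod)))

# The quantity from the original question: the sum of the row products
total <- sum(row_prod(small))
```

With the thread's matrix, sum(row_prod(a)) would play the role of sum(apply(a, 1, prod)).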
-- David Winsemius, MD West Hartford, CT