Hey, I just read another post about calling R from C. Someone on Stack Overflow (DWin makes me suspect it's David W.?) referenced this: http://www.math.univ-montp2.fr/~pudlo/R_files/call_R.pdf Which made me think: Why is a loop in R bad, but in C not? And where exactly does looping cost the most? I wrote a piece of code for my bachelor's thesis where I loop from 1 to 500 and estimate a boosted model in every iteration. The procedure takes 2-6 minutes. In this example the loop (instead of some kind of apply()) shouldn't cost too much time, right? I suspect it's way worse if someone loops from 1 to 10000 and performs only a small task (a mean(), for example) in each iteration. Can someone confirm this? Regards, Alex
Why is looping in R inefficient, but in C not?
4 messages · Alexander Engelhardt, Jeff Newmiller, Patrick Burns +1 more
Probably the easiest way to think about it is that most of the extra time is the overhead of calling a function. So counting the number of calls to R functions is going to tell you how much overhead there is. (Remember that functions call other functions.)
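A rough way to see the call overhead described here is to time the same loop with and without one extra function call per iteration. This is only a sketch: `add_one` is a hypothetical helper made up for the illustration, and the timings depend on the machine and the R version (recent versions byte-compile loops, which narrows the gap).

```r
# Hypothetical helper, introduced only for this illustration.
add_one <- function(v) v + 1

n <- 1e5
x <- numeric(n)

# Loop body written inline: calls `[`, `+` and `[<-` on each iteration.
y_inline <- numeric(n)
t_inline <- system.time(for (i in seq_len(n)) y_inline[i] <- x[i] + 1)

# Same loop with one extra closure call per iteration.
y_call <- numeric(n)
t_call <- system.time(for (i in seq_len(n)) y_call[i] <- add_one(x[i]))

# Both loops compute the same result; only the number of calls differs.
stopifnot(identical(y_inline, y_call))
print(c(inline = t_inline[["elapsed"]], extra_call = t_call[["elapsed"]]))
```

The two elapsed times can be compared directly, since the arithmetic done is identical in both loops.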
On 26/06/2011 08:21, Jeff Newmiller wrote:
For the same reason the Cray X-MP was fast at numerical computations: a loop written in a low-level language can be optimized to run faster than one written in a higher-level language. The X-MP optimized loops into hardware, but R just optimizes them in C code, exposed to the R programmer as vector operations.
Jeff Newmiller
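The point about vector operations running their loops in C can be sketched with a sum of squares computed two ways. The function names `sum_sq_loop` and `sum_sq_vec` are made up for this example, and the actual speed difference will vary by machine and R version.

```r
n <- 1e5
x <- as.numeric(seq_len(n))

# Explicit R loop: the interpreter evaluates the body n times.
sum_sq_loop <- function(v) {
  total <- 0
  for (i in seq_along(v)) total <- total + v[i] * v[i]
  total
}

# Vectorized form: `*` and sum() each run a single loop in compiled C code.
sum_sq_vec <- function(v) sum(v * v)

t_loop <- system.time(s_loop <- sum_sq_loop(x))
t_vec  <- system.time(s_vec <- sum_sq_vec(x))

# Compare with a tolerance, since the two versions accumulate differently.
stopifnot(isTRUE(all.equal(s_loop, s_vec)))
print(c(loop = t_loop[["elapsed"]], vectorized = t_vec[["elapsed"]]))
```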
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Patrick Burns pburns at pburns.seanet.com twitter: @portfolioprobe http://www.portfolioprobe.com/blog http://www.burns-stat.com (home of 'Some hints for the R beginner' and 'The R Inferno')
On Jun 26, 2011, at 2:56 AM, Alexander Engelhardt wrote:
Hey, I just read another post about calling R from C. Someone on Stack Overflow (DWin makes me suspect it's David W.?) referenced this: http://www.math.univ-montp2.fr/~pudlo/R_files/call_R.pdf Which made me think: Why is a loop in R bad, but in C not?
I do not think the cited authority provides any support to that notion. It rather suggests that things which might benefit from using a compiler can be fairly easily passed to C.
And where exactly does looping cost the most? I wrote a piece of code for my bachelor's thesis where I loop from 1 to 500, and estimate a boosted model in every iteration. The procedure takes 2-6 minutes. In this example the loop (instead of some kind of apply()) shouldn't cost too much time, right?
I suspect it's way worse if someone would loop from 1 to 10000 and perform only a small task (a mean(), for example) in each loop. Can someone confirm this?
_You_ can investigate it. I cannot determine from your statements what
expectations you have for an apply-vs-loop test, so I am not sure if
this is confirming or disproving:
z2 <- z <- vector("numeric", 10000)
x <- matrix(1:100, 10000,20)
aloop1 <- Sys.time(); z <- apply(x, 1, mean); difftime(Sys.time(), aloop1)
aloop2 <- Sys.time(); for (i in 1:10000) { z2[i] <- mean(x[i, ]) }; difftime(Sys.time(), aloop2)
identical(z, z2)
Probably not in line with your current understanding. I wonder whether
the trivial advantage offered by apply (due to the single assignment, I
suspect) is in line with your understanding. Most of the efficiency of
apply operations is at the level of clarity of the code and ease of
use. The maximal efficiency gains come from using the proper vectorized
operations, which can be 50-100 times faster:
> aloop3 <- Sys.time(); z3 <- rowMeans(x); difftime(Sys.time(), aloop3)
Time difference of 0.01409197 secs
> identical(z, z3)
[1] TRUE
Other efficiency strategies are to pre-allocate structures of known
size and to avoid using c, cbind or rbind operations to accumulate
results in a loop.
David Winsemius, MD West Hartford, CT
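The pre-allocation advice in the message above can be sketched by building the same vector two ways. The function names `grow` and `prealloc` are hypothetical, chosen only for this example, and the size of the gap depends on `n` and on the R version.

```r
n <- 2e4

# Growing the result with c() makes a new, longer vector on every iteration.
grow <- function(n) {
  out <- numeric(0)
  for (i in seq_len(n)) out <- c(out, i^2)
  out
}

# Pre-allocating once and filling by index avoids the repeated copies.
prealloc <- function(n) {
  out <- numeric(n)
  for (i in seq_len(n)) out[i] <- i^2
  out
}

t_grow <- system.time(r_grow <- grow(n))
t_pre  <- system.time(r_pre <- prealloc(n))

# Both approaches produce the same vector; only the allocation pattern differs.
stopifnot(identical(r_grow, r_pre))
print(c(grow = t_grow[["elapsed"]], prealloc = t_pre[["elapsed"]]))
```

Raising `n` should make the difference more visible, since the copying done by `grow` increases roughly with the square of `n`.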