
Practical to loop over 2 million rows?

Hi Jay,

A few comments.

1) As you know, vectorize when possible.  Even if you must have a
loop, perhaps you can avoid nested loops or at least speed up each
iteration.
2) Write your loop in a function and then byte compile it using the
cmpfun() function from the compiler package.  This can help
dramatically (though still not to the extent of vectorization).
3) If you really need to speed up some aspect and are stuck with a
loop, check out the R + Rcpp + inline + C++ tool chain, which allows
you to write inline C++ code, compile it fairly easily, and move data
to and from it.
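
To make points 1) and 2) concrete, here is a minimal sketch.  The task
(a sum of squares) and the function names are made up for
illustration; they are not from your data.  It times a plain loop, the
same loop after cmpfun(), and a vectorized version.

```r
library(compiler)  # ships with R; provides cmpfun()

# Hypothetical task: sum of squares of a numeric vector.
loop_sumsq <- function(x) {
  total <- 0
  for (i in seq_along(x)) {
    total <- total + x[i]^2
  }
  total
}

# Byte-compiled copy of the same loop.
loop_sumsq_cmp <- cmpfun(loop_sumsq)

# Vectorized equivalent: no explicit loop at the R level.
vec_sumsq <- function(x) sum(x^2)

set.seed(1)
x <- runif(2e6)  # two million values, as in the question

print(system.time(r_loop   <- loop_sumsq(x)))
print(system.time(cmp_loop <- loop_sumsq_cmp(x)))
print(system.time(vec      <- vec_sumsq(x)))

# All three should agree up to floating point rounding.
print(all.equal(r_loop, vec))
print(all.equal(cmp_loop, vec))
```

On R 3.4.0 and later, functions are byte-compiled automatically by
default, so the explicit cmpfun() call may make little difference
there.  The timings will depend on your machine, so I have not quoted
any.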

Here is an example of a question I answered on SO where the OP had an
algorithm to implement in R.  I ran through the R implementation, the
byte-compiled R implementation, and one using Rcpp, and compared
timings.  It should give you at least a bit of a sense of what you are
dealing with.
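
For a rough idea of what the Rcpp route looks like, here is a small
sketch.  It is not the code from that SO answer; the function
sumsq_cpp is a hypothetical example, and it uses Rcpp's own
cppFunction() rather than the inline package.  It assumes Rcpp and a
working C++ compiler are installed, and it is skipped otherwise.

```r
# Only runs if Rcpp is installed (a C++ compiler is also required).
if (requireNamespace("Rcpp", quietly = TRUE)) {
  # Compile a small C++ function and expose it to R as sumsq_cpp().
  Rcpp::cppFunction('
    double sumsq_cpp(NumericVector x) {
      double total = 0.0;
      for (int i = 0; i < x.size(); ++i) {
        total += x[i] * x[i];
      }
      return total;
    }
  ')

  set.seed(1)
  x <- runif(2e6)

  # Time the C++ loop against the vectorized R version.
  print(system.time(cpp_res <- sumsq_cpp(x)))
  print(system.time(vec_res <- sum(x^2)))

  # The two results should agree up to floating point rounding.
  print(all.equal(cpp_res, vec_res))
}
```

For something this simple, the vectorized R call is already fast; the
C++ route tends to pay off when each iteration depends on the previous
one and the loop cannot be vectorized.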

You are correct that some things can help speed in R loops, such as
preallocation, and, depending on what you are doing, some classes are
faster than others.  If you are working with a vector of integers,
don't store them as doubles in a data frame (that is a silly extreme,
but hopefully you get the point).
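
Both points can be sketched quickly.  The function names and sizes
below are made up for illustration.  The first part compares growing a
result inside the loop with preallocating it; the second compares the
memory used by the same values stored as integers and as doubles.

```r
n <- 5e4

# Growing the result: each c() call copies everything built so far.
grow <- function(n) {
  out <- c()
  for (i in seq_len(n)) out <- c(out, i * 2L)
  out
}

# Preallocating: the result vector is created once, then filled in.
prealloc <- function(n) {
  out <- integer(n)
  for (i in seq_len(n)) out[i] <- i * 2L
  out
}

print(system.time(g <- grow(n)))
print(system.time(p <- prealloc(n)))
print(identical(g, p))  # same result either way

# Same values, two storage types.
ints <- sample.int(100L, 2e6, replace = TRUE)  # integer vector
dbls <- as.numeric(ints)                       # same values as doubles

print(object.size(ints))
print(object.size(dbls))  # doubles use 8 bytes per value, integers 4
```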

Good luck,

Josh
On Wed, Oct 10, 2012 at 1:31 PM, Jay Rice <jsrice18 at gmail.com> wrote: