SLOW split() function

As another followup, given that you are doing numerous regression
models and (I presume) working with finance/stock data that is
strictly numeric (no need for special contrast coding, etc.), you can
substantially reduce the time spent estimating the coefficients.  A
simple way is to use lm.fit directly instead of lm.  For lm.fit, you
pass the y and x (design) matrices directly.  This skips a good deal
of overhead.  Here is one naive way; I imagine more speedups could be
gained by incorporating the intercept (a column of 1s) into d instead of
cbind()ing it.  The catch is that lm.fit requires matrices, not data
tables, so what you gain may be lost in having to do an extra
conversion.  In any case, here are the times on my system for the two
options (note I used N = 1000 * 100 because I am presently on a
glorified netbook).
+ x, data=d[.indx,])) })))
   user  system elapsed
  69.00    0.00   69.56
   user  system elapsed
  37.83    0.03   38.36
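
For readers who want to try the comparison on a small scale, here is a minimal sketch of the two calls on a single window.  The data frame d, the column names y and x, and the row index idx are hypothetical stand-ins, not the objects from the original thread.

```r
set.seed(1)

# Hypothetical stand-in data; the thread's own d is not reproduced here
n   <- 1000
d   <- data.frame(y = rnorm(n), x = rnorm(n))
idx <- 1:100                      # rows of one (assumed) window

# Formula interface: builds a model frame and design matrix internally
fit_lm <- lm(y ~ 1 + x, data = d[idx, ])

# Matrix interface: bind the intercept column by hand and pass X, y directly
X        <- cbind(1, d$x[idx])
fit_fast <- lm.fit(X, d$y[idx])

# The names differ (lm.fit labels unnamed columns x1, x2), so drop them
coefs_lm   <- unname(coef(fit_lm))
coefs_fast <- unname(fit_fast$coefficients)
print(all.equal(coefs_lm, coefs_fast))
```

Wrapping each call in system.time() over many windows is one way to measure the difference on your own machine.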

The column names for the coefficients will not be the same as from lm,
but the estimates should be identical.  While this is not recommended
in typical usage, in an application like regressions on rolling time
windows, etc., where you know the structure of the data is not changing,
I think it makes sense to bypass the clever machinery that inspects your
data and determines the best methods to use, and go straight to passing
the design matrix.  Since you do
not need residuals, variances, etc. it may be possible to speed this
up even more, perhaps bypassing dqrls altogether.
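
One possible way to skip the QR routine, sketched here as an assumption rather than something tested in this thread, is to solve the normal equations directly with crossprod() and solve().  This returns only the coefficients and is less numerically stable than QR when the predictors are close to collinear, so treat it as an illustration.  The simulated x and y below are hypothetical.

```r
set.seed(2)

# Hypothetical simulated data with a known intercept and slope
n <- 200
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
X <- cbind(intercept = 1, x = x)

# Normal equations: solve (X'X) b = X'y, with no QR decomposition
b_ne <- drop(solve(crossprod(X), crossprod(X, y)))

# Reference coefficients from the QR-based lm.fit
b_qr <- lm.fit(X, y)$coefficients

print(all.equal(unname(b_ne), unname(b_qr)))
```

Whether this is actually faster than lm.fit for small windows would need to be timed; the saving comes from not computing residuals, fitted values, and the other pieces lm.fit returns.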

Cheers,

Josh
On Mon, Oct 10, 2011 at 9:56 PM, ivo welch <ivo.welch at gmail.com> wrote: