
finding a faster way to run lm on rows of predictor matrix

Hi,

If you are doing repeated fits in this specialized case, you can
benefit by skipping some of the fancy overhead and going straight to
lm.fit.

## what you had
data.residuals <- t(apply(predictors, 1, function(x)
  lm(regress.y ~ -1 + as.vector(x))$residuals))

## passing the matrices directly to lm.fit (no need to augment the X
## matrix because you do not want the intercept anyway)
## seq_len(nrow(predictors)) is 1:6000 for your 6000-row matrix
data.residuals2 <- t(sapply(seq_len(nrow(predictors)), function(i)
  lm.fit(x = t(predictors[i, , drop = FALSE]), y = regress.y)$residuals))

On my little laptop:

## lm inside apply
   user  system elapsed
  42.84    0.11   45.01

## lm.fit
   user  system elapsed
   3.76    0.00    3.82
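
If you want to check the comparison on your own machine first, something
like the following should do. It is only a sketch: the data are simulated
stand-ins (200 rows by 50 observations is my assumption, not your real
dimensions), and the names res.lm, res.fit, t.lm and t.fit are made up
for the example.

```r
## simulated stand-ins for the real data (dimensions are assumptions)
set.seed(1)
n.obs  <- 50
n.rows <- 200
predictors <- matrix(rnorm(n.rows * n.obs), nrow = n.rows)
regress.y  <- rnorm(n.obs)

## original approach: full lm() call for every row
t.lm <- system.time(
  res.lm <- t(apply(predictors, 1, function(x) {
    lm(regress.y ~ -1 + as.vector(x))$residuals
  }))
)

## lm.fit approach: skip the formula and model-frame machinery
t.fit <- system.time(
  res.fit <- t(sapply(seq_len(n.rows), function(i) {
    lm.fit(x = t(predictors[i, , drop = FALSE]), y = regress.y)$residuals
  }))
)

## the two results carry different dimnames, so compare values only
same <- isTRUE(all.equal(unname(res.lm), unname(res.fit)))
print(same)
cat("lm elapsed:    ", t.lm[["elapsed"]], "\n")
cat("lm.fit elapsed:", t.fit[["elapsed"]], "\n")
```

The all.equal check is there to confirm that the shortcut gives the same
residuals before you trust the faster timing.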

If those timing ratios hold on your system, you should be looking at
around 1.2 seconds per run, which puts 5,000 runs at roughly 100
minutes, or a bit under 2 hours.  Note that this is the sort of task
that can be readily parallelized, so if you are on a multicore machine,
you could take advantage of that without too much trouble.
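
One way the parallel version might look, as a rough sketch rather than
anything tuned: it assumes the parallel package (bundled with current
versions of R) and uses mclapply, which forks and so only runs on more
than one core on Unix-alikes.  The simulated data, the helper fit.row,
and the choice of 2 cores are all assumptions for the example.

```r
library(parallel)

## simulated stand-ins for the real data (dimensions are assumptions)
set.seed(1)
predictors <- matrix(rnorm(200 * 50), nrow = 200)
regress.y  <- rnorm(50)

## hypothetical helper: residuals from a no-intercept fit on row i
fit.row <- function(i) {
  lm.fit(x = t(predictors[i, , drop = FALSE]), y = regress.y)$residuals
}

## forking is unavailable on Windows, so fall back to a single core there
n.cores <- if (.Platform$OS.type == "windows") 1L else 2L

## one list element per row, then stack into a rows-by-observations matrix
res.list <- mclapply(seq_len(nrow(predictors)), fit.row, mc.cores = n.cores)
data.residuals.par <- do.call(rbind, res.list)

## sequential version for comparison
data.residuals.seq <- t(sapply(seq_len(nrow(predictors)), fit.row))
print(isTRUE(all.equal(data.residuals.par, data.residuals.seq)))
```

With 5,000 independent runs it may pay off more to hand out whole runs
to the workers instead of rows, since each worker then does more work
per call and the forking overhead matters less.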

Cheers,

Josh
On Fri, Jul 29, 2011 at 8:30 AM, <cypark at princeton.edu> wrote: