
very fast OLS regression?

12 messages · Dimitris Rizopoulos, Ravi Varadhan, Bert Gunter +5 more

#
Dear R experts:

I just ran a simple test which told me that hand-computing the OLS
coefficients is about 3-10 times as fast as using the built-in lm()
function.   (code included below.)  Most of the time, I do not care,
because I like the convenience, and I presume some of the time goes
into saving a lot of stuff that I may or may not need.  But when I do
want to learn the properties of an estimator whose input contains a
regression, I do care about speed.

What is the recommended fastest way to get regression coefficients in
R?  (Is Gentleman's weighted-least-squares algorithm implemented in a
low-level C form somewhere?  that one was always lightning fast for
me.)

regards,

/ivo



bybuiltin = function( y, x )   coef(lm( y ~ x -1 ));

byhand = function( y, x ) {
  xy<-t(x)%*%y;
  xxi<- solve(t(x)%*%x)
  b<-as.vector(xxi%*%xy)
  ## I will need these later, too:
  ## res<-y-as.vector(x%*%b)
  ## soa[i]<-b[2]
  ## sigmas[i]<-sd(res)
  b;
}


MC=500;
N=10000;


set.seed(0);
x= matrix( rnorm(N*MC), nrow=N, ncol=MC );
y= matrix( rnorm(N*MC), nrow=N, ncol=MC );

ptm = proc.time()
for (mc in 1:MC) byhand(y[,mc],x[,mc]);
cat("By hand took ", proc.time()-ptm, "\n");

ptm = proc.time()
for (mc in 1:MC) bybuiltin(y[,mc],x[,mc]);
cat("By built-in took ", proc.time()-ptm, "\n");
#
check the following options:

ols1 <- function (y, x) {
     coef(lm(y ~ x - 1))
}

ols2 <- function (y, x) {
     xy <- t(x)%*%y
     xxi <- solve(t(x)%*%x)
     b <- as.vector(xxi%*%xy)
     b
}

ols3 <- function (y, x) {
     XtX <- crossprod(x)
     Xty <- crossprod(x, y)
     solve(XtX, Xty)
}

ols4 <- function (y, x) {
     lm.fit(x, y)$coefficients
}

# check timings
MC <- 500
N <- 10000

set.seed(0)
x <- matrix(rnorm(N*MC), N, MC)
y <- matrix(rnorm(N*MC), N, MC)

invisible({gc(); gc(); gc()})
system.time(for (mc in 1:MC) ols1(y[, mc], x[, mc]))

invisible({gc(); gc(); gc()})
system.time(for (mc in 1:MC) ols2(y[, mc], x[, mc]))

invisible({gc(); gc(); gc()})
system.time(for (mc in 1:MC) ols3(y[, mc], x[, mc]))

invisible({gc(); gc(); gc()})
system.time(for (mc in 1:MC) ols4(y[, mc], x[, mc, drop = FALSE]))


I hope it helps.

Best,
Dimitris
#
thanks, dimitris.  I also added Bill Dunlap's "solve(qr(x),y)"
function as ols5.   here is what I get in terms of speed on a Mac Pro:

      user.self  sys.self  elapsed  user.child  sys.child
ols1      6.779     3.591   10.370           0          0
ols2      0.515     0.210    0.725           0          0
ols3      0.576     0.403    0.971           0          0
ols4      1.143     1.251    2.395           0          0
ols5      0.683     0.565    1.248           0          0

so the naive matrix operations are fastest.  I would have thought that
alternatives to the naive stuff I learned in my linear algebra course
would be quicker.   still, ols3 and ols5 are competitive.  the
built-in lm() is really problematic.  is ols3 (or perhaps even ols5)
preferable in terms of accuracy?  I think I can deal with 20% speed
slow-down (but not with a factor 10 speed slow-down).

regards,

/iaw


On Wed, Mar 25, 2009 at 5:11 PM, Dimitris Rizopoulos
<d.rizopoulos at erasmusmc.nl> wrote:
#
Ivo,

I would be careful about dealing with situations where x may be numerically ill-conditioned.  Your code will do quite badly in such cases.  Here is an extreme illustration:

set.seed(123)
x <- matrix(rnorm(20), 10, 2)
x <- cbind(x, x[,1])   # x is clearly rank-deficient
y <- c(x %*% c(1,2,3))
byhand(y, x)
Error in solve.default(t(x) %*% x) : 
  Lapack routine dgesv: system is exactly singular
A better approach would be to explicitly use a QR decomposition in your byhand() function:

byhand.qr <- function(y, x, tol = 1e-07) {
  qr.x <- qr(x, tol = tol, LAPACK = TRUE)
  b <- qr.coef(qr.x, y)
  ## I will need these later, too:
  ## res <- qr.resid(qr.x, y)
  ## soa[i] <- b[2]
  ## sigmas[i] <- sd(res)
  b
}
ans <- byhand.qr(y, x)
ans
[1] 1.795725 2.000000 2.204275
You can verify that this is a "valid" solution:

c(y - x %*% ans)
[1]  9.436896e-16  2.498002e-16 -8.881784e-16  0.000000e+00 -4.440892e-16  1.776357e-15  4.440892e-16  0.000000e+00  4.440892e-16 -4.440892e-16
However, this is not the "unique" solution, since there are infinitely many solutions.  We can get "the" solution that has minimum length by using the Moore-Penrose inverse (e.g., MASS::ginv):

ans.min <- c(ginv(x) %*% y)
ans.min
[1] 2 2 2
You can verify that ans.min has a smaller L2-norm than ans, and it is unique.
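A self-contained sketch of that verification (the particular solution is taken from the output printed above, since the thread does not show the comparison code):

```r
## Verify Ravi's claim: the pseudoinverse solution fits y equally well
## but has a smaller L2-norm than the QR solution, and equals (2, 2, 2).
library(MASS)  # for ginv()

set.seed(123)
x <- matrix(rnorm(20), 10, 2)
x <- cbind(x, x[, 1])            # rank-deficient: column 3 == column 1
y <- c(x %*% c(1, 2, 3))

ans     <- c(1.795725, 2, 2.204275)   # the QR solution printed above
ans.min <- c(ginv(x) %*% y)           # minimum-norm solution

max(abs(x %*% ans.min - y))      # essentially zero: a valid solution
sum(ans.min^2) < sum(ans^2)      # TRUE: strictly smaller L2-norm
```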

Hope this helps,
Ravi.

____________________________________________________________________

Ravi Varadhan, Ph.D.
Assistant Professor,
Division of Geriatric Medicine and Gerontology
School of Medicine
Johns Hopkins University

Ph. (410) 502-2619
email: rvaradhan at jhmi.edu


----- Original Message -----
From: ivo welch <ivowel at gmail.com>
Date: Wednesday, March 25, 2009 4:31 pm
Subject: [R] very fast OLS regression?
To: r-help <r-help at stat.math.ethz.ch>
#
lm is slow because it has to set up the design matrix (X) each time. See
?model.matrix and ?model.matrix.lm  for how to do this once separately from
lm (and then use lm.fit).
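A minimal sketch of that route (variable names here are illustrative, not from the thread): build the design matrix once with model.matrix(), then hand it straight to lm.fit(), which does only the numerics:

```r
## Formula -> design-matrix translation happens once, outside any loop;
## lm.fit() then skips all of lm()'s formula and model-frame overhead.
set.seed(0)
d <- data.frame(x = rnorm(100))
X <- model.matrix(~ x - 1, data = d)   # design matrix, built once
y <- rnorm(100)
b <- lm.fit(X, y)$coefficients

## agrees with the full lm() call:
all.equal(unname(b), unname(coef(lm(y ~ x - 1, data = d))))
```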

I am far from an expert on numerical algebra, but I'm pretty sure your
comments below are incorrect in the sense that "naive" methods are **not**
universally better than efficient numerical algebra methods using, say, QR
decompositions. It will depend very strongly on the size and specific nature
of the problem. Big Oh complexity statements (O(n) or O(n^2), etc.) are,
after all, asymptotic. There is also typically a tradeoff between
computational accuracy (especially for ill-conditioned matrices) and speed
which your remarks seem to neglect.

-- Bert


Bert Gunter
Genentech Nonclinical Biostatistics
650-467-7374

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
#
Yes, Bert.  Any least-squares solution that forms X'X and then inverts it is not to be recommended.  If X is nearly rank-deficient, then X'X will be more strongly so.  The QR decomposition approach in my byhand.qr() function is reliable and fast.
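A quick numerical check of this point (not from the thread; the collinearity level is made up): the 2-norm condition number of X'X is the square of that of X, which is why forming the crossproduct doubles the digit loss.

```r
## kappa(z, exact = TRUE) computes the exact 2-norm condition number
## via the SVD; compare kappa(X) with kappa(X'X).
set.seed(4)
x <- matrix(rnorm(200), 100, 2)
x[, 2] <- x[, 1] + 1e-3 * rnorm(100)      # nearly collinear columns

k  <- kappa(x, exact = TRUE)              # condition number of X
k2 <- kappa(crossprod(x), exact = TRUE)   # condition number of X'X

abs(k2 - k^2) / k^2                       # tiny: kappa(X'X) == kappa(X)^2
```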

Ravi.
____________________________________________________________________

Ravi Varadhan, Ph.D.
Assistant Professor,
Division of Geriatric Medicine and Gerontology
School of Medicine
Johns Hopkins University

Ph. (410) 502-2619
email: rvaradhan at jhmi.edu


----- Original Message -----
From: Bert Gunter <gunter.berton at gene.com>
Date: Wednesday, March 25, 2009 6:03 pm
Subject: Re: [R] very fast OLS regression?
To: 'ivo welch' <ivowel at gmail.com>, 'Dimitris Rizopoulos' <d.rizopoulos at erasmusmc.nl>
Cc: 'r-help' <r-help at stat.math.ethz.ch>
#
On Wed, 2009-03-25 at 16:28 -0400, ivo welch wrote:
No one has yet mentioned Doug Bates' article in R News on this topic,
which compares timings of various methods for least squares
computations.

Douglas Bates. Least squares calculations in R. R News, 4(1):17-20, June
2004.

A Cholesky decomposition solution proved fastest in base R code, with an
even faster version developed using sparse matrices and the Matrix
package.

You can find Doug's article here:

http://cran.r-project.org/doc/Rnews/Rnews_2004-1.pdf

HTH

G
#
On Wed, 25 Mar 2009, ivo welch wrote:

            
ols5 is more accurate in terms of round-off error, which is why the internal computations in lm and lsfit are done that way, but it is relatively unusual for this to matter given double precision.

If you want to repeat the regression with the same x and many y you can do much better by taking the QR decomposition just once and applying to a matrix of all the ys.
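A sketch of that one-factorization approach (the sizes here are arbitrary): qr.coef() accepts a matrix right-hand side, so one qr() call serves all replications.

```r
## Factor x once; solve for every column of Y in a single call.
set.seed(1)
N <- 1000; MC <- 50
x <- matrix(rnorm(N * 3), N, 3)     # fixed design, 3 regressors
Y <- matrix(rnorm(N * MC), N, MC)   # MC simulated response vectors

qx <- qr(x)                          # the expensive step, done once
B  <- qr.coef(qx, Y)                 # 3 x MC matrix of coefficients

## column mc matches the one-at-a-time result:
all.equal(as.vector(B[, 1]), as.vector(qr.coef(qx, Y[, 1])))
```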

It's also possible that using chol() and backsolve() rather than solve() would speed up ols2 and ols3, at least for large enough matrices.
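A sketch of the chol()/backsolve() variant (again with made-up sizes): factor X'X = R'R with R upper-triangular, then replace the general solve() with two cheap triangular solves.

```r
## Solve X'X b = X'y via the Cholesky factor: first R'z = X'y
## (forwardsolve on the lower triangle), then R b = z (backsolve).
set.seed(2)
x <- matrix(rnorm(1000 * 3), 1000, 3)
y <- rnorm(1000)

R <- chol(crossprod(x))                 # upper-triangular, X'X = R'R
b <- backsolve(R, forwardsolve(t(R), crossprod(x, y)))

## matches lm() on a well-conditioned design:
all.equal(c(b), unname(coef(lm(y ~ x - 1))))
```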

      -thomas
Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle
#
On Wed, 25 Mar 2009, Ravi Varadhan wrote:

            
Forming the matrix of crossproducts and using a Cholesky decomposition is faster, so it does depend on the intended use.

In a simulation, the OP's situation, you may well know that X is not nearly rank deficient, in which case the speed advantage may be worthwhile.  After all, even if the condition number of X is 10^5 you will still have five or six accurate digits in the result.

If you are writing code that will be used in ways you have no control over, then of course it makes sense to use the more stable QR decomposition.

      -thomas

Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle
#
On Wed, 2009-03-25 at 22:11 +0100, Dimitris Rizopoulos wrote:
Hi Dimitris and Ivo

I think this is not a fair comparison; look at this:

x[8,100]<-NA

system.time(for (mc in 1:MC) ols1(y[, mc], x[, mc]))
   user  system elapsed 
  8.765   0.000   8.762 
 

 system.time(for (mc in 1:MC) ols2(y[, mc], x[, mc]))
Error in solve.default(t(x) %*% x) : 
  system is computationally singular: reciprocal condition number = 0
Timing stopped at: 0 0 0.002 
 

 system.time(for (mc in 1:MC) ols3(y[, mc], x[, mc]))
Error in solve.default(XtX, Xty) : 
  system is computationally singular: reciprocal condition number = 0
Timing stopped at: 0 0 0.001 
 

 system.time(for (mc in 1:MC) ols4(y[, mc], x[, mc, drop = FALSE]))
Error in lm.fit(x, y) : NA/NaN/Inf in foreign function call (arg 1)
Timing stopped at: 0 0 0.001 

So routines ols2, ols3, and ols4 only work on a complete matrix: with
even a single NA, these functions fail, while ols1 still runs.
2 days later
#
On Wed, Mar 25, 2009 at 5:15 PM, Gavin Simpson <gavin.simpson at ucl.ac.uk> wrote:
An updated version is available as the "Comparisons" vignette in the
Matrix package.