very fast OLS regression?
On Wed, 25 Mar 2009, ivo welch wrote:
thanks, dimitris. I also added Bill Dunlap's "solve(qr(x),y)" function as ols5. here is what I get in terms of speed on a Mac Pro: ols1 6.779 3.591 10.37 0 0 ols2 0.515 0.21 0.725 0 0 ols3 0.576 0.403 0.971 0 0 ols4 1.143 1.251 2.395 0 0 ols5 0.683 0.565 1.248 0 0 so the naive matrix operations are fastest. I would have thought that alternatives to the naive stuff I learned in my linear algebra course would be quicker. still, ols3 and ols5 are competitive. the built-in lm() is really problematic. is ols3 (or perhaps even ols5) preferable in terms of accuracy? I think I can deal with 20% speed slow-down (but not with a factor 10 speed slow-down).
ols5 is more accurate in terms of round-off error, and so it is how the internal computations are done in lm and lsfit, but it is relatively unusual for this to matter given double precision.
If you want to repeat the regression with the same x and many y you can do much better by taking the QR decomposition just once and applying to a matrix of all the ys.
It's also possible that using chol() and backsolve() rather than solve() would speed up ols2 and ols3, at least for large enough matrices.
-thomas
regards, /iaw On Wed, Mar 25, 2009 at 5:11 PM, Dimitris Rizopoulos <d.rizopoulos at erasmusmc.nl> wrote:
check the following options:
ols1 <- function (y, x) {
? ?coef(lm(y ~ x - 1))
}
ols2 <- function (y, x) {
? ?xy <- t(x)%*%y
? ?xxi <- solve(t(x)%*%x)
? ?b <- as.vector(xxi%*%xy)
? ?b
}
ols3 <- function (y, x) {
? ?XtX <- crossprod(x)
? ?Xty <- crossprod(x, y)
? ?solve(XtX, Xty)
}
ols4 <- function (y, x) {
? ?lm.fit(x, y)$coefficients
}
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Thomas Lumley Assoc. Professor, Biostatistics tlumley at u.washington.edu University of Washington, Seattle