Python and R
Different methods of performing least squares calculations in R are discussed in
@Article{Rnews:Bates:2004,
author = {Douglas Bates},
title = {Least Squares Calculations in {R}},
journal = {R News},
year = 2004,
volume = 4,
number = 1,
pages = {17--20},
month = {June},
url = http,
pdf = Rnews2004-1
}
Some of the functions mentioned in that article have been modified. A
more up-to-date version of the comparisons in that article is
available as the Comparisons vignette in the Matrix package.
On Fri, Feb 20, 2009 at 6:06 AM, Gabor Grothendieck
<ggrothendieck at gmail.com> wrote:
Note that using solve can be numerically unstable for certain problems. On Fri, Feb 20, 2009 at 6:50 AM, Kenn Konstabel <lebatsnok at gmail.com> wrote:
Decyphering formulas seems to be the most time consuming part of lm:
mylm1 <- function(formula, data) {
# not perfect but works
F <- model.frame(formula,data)
y <- model.response(F)
mt <- attr(F, "terms")
x <- model.matrix(mt,F)
coefs <- solve(crossprod(x), crossprod(x,y))
coefs
}
mylm2 <- function(x, y, intercept=TRUE) {
if(!is.matrix(x)) x <- as.matrix(x)
if(intercept) x <- cbind(1,x)
if(!is.matrix(y)) y <- as.matrix(y)
solve(crossprod(x), crossprod(x,y))
}
system.time(for(i in 1:1000) mylm2(EuStockMarkets[,-1], EuStockMarkets[,"DAX"]))
user system elapsed 6.43 0.00 6.53
system.time(for(i in 1:1000) mylm1(DAX~., EuStockMarkets))
user system elapsed 16.19 0.00 16.23
system.time(for(i in 1:1000) lm(DAX~., EuStockMarkets))
user system elapsed 21.43 0.00 21.44 So if you need to save time, I'd suggest something close to mylm2 rather than mylm1. Kenn On Thu, Feb 19, 2009 at 8:04 PM, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
On Thu, Feb 19, 2009 at 8:30 AM, Esmail Bonakdarian <esmail.js at gmail.com> wrote:
Thanks for the suggestions, I'll have to see if I can figure out how to convert the relatively simple call to lm with an equation and the data file to the functions you mention (or if that's even feasible).
X <- model.matrix(formula, data) will calculate the X matrix for you.
Not an expert in statistics myself, I am mostly concentrating on the programming aspects of R. Problem is that I suspect my colleagues who are providing some guidance with the stats end are not quite experts themselves, and certainly new to R. Cheers, Esmail Kenn Konstabel wrote:
lm does lots of computations, some of which you may never need. If speed really matters, you might want to compute only those things you will really use. If you only need coefficients, then using %*%, solve and crossprod will be remarkably faster than lm # repeating someone else's example # lm(DAX~., EuStockMarkets) y <- EuStockMarkets[,"DAX"] x <- EuStockMarkets x[,1]<-1 colnames(x)[1] <- "Intercept" lm(y ~ x-1) solve(crossprod(x), t(x))%*%y # probably this can be done more efficiently # and a naive timing
> system.time( for(i in 1:1000) lm(y ~ x-1))
user system elapsed 14.64 0.33 32.69
> system.time(for(i in 1:1000) solve(crossprod(x), crossprod(x,y)) )
user system elapsed 0.36 0.00 0.36 Also lsfit() is a bit quicker than lm or lm.fit. Regards, Kenn
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.