stats lm() function
5 messages · Paul Hermes, Dimitris Rizopoulos, David Winsemius +1 more
Yes, indeed, you can certainly speed things up by just changing the design matrix X and feeding it back to lm.fit(). In addition, if you just need the least squares estimates, then you gain a bit more by using constructs of the form:

    XtX <- crossprod(X)
    Xty <- crossprod(X, y)
    betas <- solve(XtX, Xty)

I hope it helps.

Best,
Dimitris
Paul Hermes wrote:
Hi,

I'm using the lm() function where the formula is quite big (300 arguments) and the data is a frame of 3000 values.

This is running in a loop where in each step the formula is reduced by one argument, and the lm command is called again (to check which arguments are useful).

This takes 1-2 minutes. Is there a way to speed this up? I checked the code of the lm function and it seems that it prepares the data and then calls lm.fit(). I thought about doing this preparation first and then only calling lm.fit() 300 times.
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Dimitris Rizopoulos Assistant Professor Department of Biostatistics Erasmus University Medical Center Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands Tel: +31/(0)10/7043478 Fax: +31/(0)10/7043014
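Dimitris's suggestion above can be sketched roughly as follows. This is a minimal illustration, not the poster's actual code: the simulated data frame `dat`, its size, and the column names `x1` ... `x10` are assumptions made up for the example. The idea is to build the design matrix once with model.matrix() and then hand it to lm.fit() directly, or to solve the normal equations when only the estimates are needed.

```r
# Simulated stand-in for the real data (assumed; the thread has 3000 rows
# and 300 predictors, this is kept small so it runs instantly).
set.seed(1)
n <- 300
p <- 10
dat <- as.data.frame(matrix(rnorm(n * p), n, p))
names(dat) <- paste0("x", seq_len(p))
dat$y <- 1 + 2 * dat$x1 - dat$x2 + rnorm(n)

# Build the design matrix and the response once, outside any loop.
X <- model.matrix(y ~ ., data = dat)   # includes the "(Intercept)" column
y <- dat$y

# lm.fit() skips the formula and model-frame handling that lm() repeats
# on every call.
fit <- lm.fit(X, y)

# Least squares estimates only, via the normal equations.
XtX <- crossprod(X)
Xty <- crossprod(X, y)
betas <- solve(XtX, Xty)

# Removing a predictor is just removing a column of X.
fit_reduced <- lm.fit(X[, colnames(X) != "x3", drop = FALSE], y)
```

Note that solving the normal equations is less numerically stable than the QR decomposition used by lm.fit(), so it is worth comparing the two on the real data before relying on the faster route.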
I think you will find that many readers of this list would rather try to dissuade you from this misguided strategy. You are unlikely to get to a sensible solution in using step-down procedures with this sort of situation (large number of predictors with modest size of data).
David Winsemius, MD
Heritage Laboratories
West Hartford, CT

On Mar 12, 2009, at 1:59 PM, Paul Hermes wrote:

> Hi,
>
> I'm using the lm() function where the formula is quite big (300
> arguments) and the data is a frame of 3000 values.
>
> This is running in a loop where in each step the formula is reduced
> by one argument, and the lm command is called again (to check which
> arguments are useful).
>
> This takes 1-2 minutes.
> Is there a way to speed this up?
> I checked the code of the lm function and it seems that it
> prepares the data and then calls lm.fit(). I thought about
> doing this preparation first and then only calling lm.fit() 300 times.
OK, I think I have to be more precise about what we are doing.

First thing: this code is not mine, and I'm new to R (and have never touched anything like this). I'm just the lucky guy who has to maintain this crap :)

This call to the lm function is part of some code which is used to predict the market values of a bunch of our products. As 'target' function it gets the past market values we have in our database (this is what goes into the 'data' parameter of the lm function). Then we have a lot of other prices and environmental data (like similar products, stock sizes, seasonal information, ...). With this, the big formula is created (y ~ x1 + x2 + x3 + x4 + x5 ... + x300). All this goes into the lm call.

Then the result is somehow analysed to figure out which input data set had the least influence on (or similarity to) the past market values. This one gets eliminated and lm is called again without this data set. This is done until we have just a small number of data sets left.

It could be that everything I'm writing here is total bullshit (because I'm not sure if I got everything right), but this thing is working and creates very nice predictions ;)

I just figured out that the lm calls in this loop take the most time, and I want to reduce this. Any ideas?

----- Original Message -----
From: "David Winsemius" <dwinsemius at comcast.net>
To: "Paul Hermes" <paul.hermes at analytic-company.com>
Cc: <r-help at r-project.org>
Sent: Thursday, March 12, 2009 3:42 PM
Subject: Re: [R] stats lm() function

> I think you will find that many readers of this list would rather try
> to dissuade you from this misguided strategy. You are unlikely to get
> to a sensible solution in using step-down procedures with this sort of
> situation (large number of predictors with modest size of data).
> --
> David Winsemius
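The elimination loop Paul describes might look something like the sketch below when written against lm.fit() instead of lm(). Several things here are assumptions, since the thread does not say how "least influence" is measured: the sketch drops the predictor with the smallest absolute t-statistic, the function name `backward_eliminate` and its `keep` argument are hypothetical, the data are simulated, and the design matrix is assumed to have full column rank.

```r
# Simulated design matrix with an intercept column (assumed data).
set.seed(42)
n <- 200
p <- 8
X <- cbind("(Intercept)" = 1,
           matrix(rnorm(n * p), n, p,
                  dimnames = list(NULL, paste0("x", seq_len(p)))))
y <- 1 + 3 * X[, "x1"] - 2 * X[, "x2"] + rnorm(n)

# Hypothetical helper: repeatedly refit with lm.fit() and drop the
# predictor with the smallest |t| until `keep` predictors remain.
backward_eliminate <- function(X, y, keep = 3) {
  dropped <- character(0)
  while (ncol(X) - 1 > keep) {                 # "- 1" excludes the intercept
    fit    <- lm.fit(X, y)
    sigma2 <- sum(fit$residuals^2) / fit$df.residual
    # Standard errors from the unscaled covariance (X'X)^{-1};
    # this assumes X has full column rank.
    se     <- sqrt(sigma2 * diag(solve(crossprod(X))))
    tval   <- abs(fit$coefficients / se)
    tval["(Intercept)"] <- Inf                 # never drop the intercept
    worst  <- names(which.min(tval))
    dropped <- c(dropped, worst)
    X <- X[, colnames(X) != worst, drop = FALSE]
  }
  list(kept = setdiff(colnames(X), "(Intercept)"), dropped = dropped)
}

res <- backward_eliminate(X, y, keep = 2)
res$kept
res$dropped
```

This only addresses speed. It does not answer David's objection that stepwise elimination with many predictors and a modest amount of data tends to give unreliable models.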
Although it depends on the criterion by which you keep your variables, maybe you should take a look at ?step.

Bart
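For reference, a minimal sketch of what Bart's pointer to step() looks like in use. The data frame `dat` and its columns are simulated for illustration only. By default step() compares models by AIC, which may or may not match the criterion used in Paul's existing code.

```r
# Simulated data (assumed): only x1 and x2 actually affect y.
set.seed(7)
n <- 200
dat <- data.frame(matrix(rnorm(n * 6), n, 6))
names(dat) <- paste0("x", 1:6)
dat$y <- 2 * dat$x1 - 1.5 * dat$x2 + rnorm(n)

# Fit the full model once, then let step() remove terms one at a time.
full <- lm(y ~ ., data = dat)
reduced <- step(full, direction = "backward", trace = 0)

formula(reduced)   # the formula of the model step() settled on
```

Setting `trace = 1` prints each elimination step, which can help when comparing the result against a hand-written loop.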
View this message in context: http://www.nabble.com/stats-lm%28%29-function-tp22483608p22492199.html Sent from the R help mailing list archive at Nabble.com.