stats lm() function

Yes, indeed, you can certainly speed things up by changing just the 
design matrix X and feeding it back to lm.fit().

In addition, if you just need the least squares estimates, then you gain 
a bit more by using constructs of the form:

XtX <- crossprod(X)
Xty <- crossprod(X, y)
betas <- solve(XtX, Xty)
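
As a rough sketch of both suggestions on simulated data (the sizes, the
column names and the choice of "x2" as the dropped predictor are only
illustrative assumptions, not taken from the original problem):

```r
# Simulated data standing in for the real problem (sizes are illustrative).
set.seed(1)
n <- 200; p <- 5
X <- cbind(1, matrix(rnorm(n * p), n, p))   # design matrix with intercept column
colnames(X) <- c("(Intercept)", paste0("x", 1:p))
y <- drop(X %*% c(2, 1, 0, -1, 0.5, 0) + rnorm(n))

# Suggestion 1: call lm.fit() directly, skipping lm()'s formula and
# model-frame processing.
fit_full <- lm.fit(X, y)

# Dropping a predictor is then just dropping a column of X.
fit_reduced <- lm.fit(X[, colnames(X) != "x2", drop = FALSE], y)

# Suggestion 2: solve the normal equations for the coefficients only.
XtX <- crossprod(X)
Xty <- crossprod(X, y)
betas <- solve(XtX, Xty)

# Both routes should agree up to numerical error.
all.equal(unname(fit_full$coefficients), unname(drop(betas)))
```

Note that the normal equations are numerically less stable than the QR
decomposition used by lm.fit(), so with nearly collinear predictors the
lm.fit() route is the safer of the two.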

I hope it helps.

Best,
Dimitris
Hi,

I'm using the lm() function where the formula is quite big (300 arguments) and the data is a data frame of 3000 values.

This is running in a loop where in each step the formula is reduced by one argument, and the lm command is called again (to check which arguments are useful).

This takes 1-2 minutes.
Is there a way to speed this up?
I checked the code of the lm() function and it seems that it prepares the data and then calls lm.fit(). I thought about doing this preparation just once and then only calling lm.fit() 300 times.

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
I think you will find that many readers of this list would rather try  
to dissuade you from this misguided strategy. You are unlikely to get  
to a sensible solution in using step-down procedures with this sort of  
situation (large number of predictors with modest size of data).
David Winsemius

On Mar 12, 2009, at 1:59 PM, Paul Hermes wrote:

> Hi,
>
> I'm using the lm() function where the formula is quite big (300
> arguments) and the data is a data frame of 3000 values.
>
> This is running in a loop where in each step the formula is reduced
> by one argument, and the lm command is called again (to check which
> arguments are useful).
>
> This takes 1-2 minutes.
> Is there a way to speed this up?
> I checked the code of the lm() function and it seems that it
> prepares the data and then calls lm.fit(). I thought about doing this
> preparation just once and then only calling lm.fit() 300 times.

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
OK,
I think I have to be more precise about what we are doing.
First thing: this code is not mine, and I'm new to R (and have never touched 
anything like this).
I'm just the lucky guy who has to maintain this crap :)
This call to the lm() function is part of some code which is used to predict the 
market values of a bunch of our products.
As the 'target' it gets the past market values we have in our 
database (this is what goes into the 'data' parameter of the lm() function).

Then we have a lot of other prices and environmental data (like similar 
products, stock sizes, seasonal information, ...).
From this, the big formula is created (y ~ x1 + x2 + x3 + x4 + x5 + ... + 
x300).

All this goes into the lm() call. Then the result is somehow analysed to 
figure out which input data set had the least influence on (or similarity to) 
the past market values. This one gets eliminated and lm() is called again 
without this data set.
This is done until we have just a small number of data sets left.
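
Such an elimination loop might look roughly like the sketch below, built
on lm.fit() so that the formula processing is not repeated. The helper
name backward_eliminate, its keep argument, and above all the drop rule
(remove the predictor with the smallest absolute t-statistic) are
assumptions made for illustration; how the real code decides which input
has "the least influence" is not known.

```r
# Hypothetical helper: backward elimination on a design matrix via lm.fit().
# ASSUMPTION: the predictor with the smallest |t| is dropped at each step;
# the rule used by the original code is unknown.
backward_eliminate <- function(X, y, keep = 10) {
  stopifnot(is.matrix(X), !is.null(colnames(X)))
  while (ncol(X) - 1 > keep) {              # "- 1" excludes the intercept column
    fit <- lm.fit(X, y)
    p <- fit$rank
    stopifnot(p == ncol(X))                 # sketch assumes full column rank
    sigma2 <- sum(fit$residuals^2) / fit$df.residual
    # (X'X)^{-1} from the R factor of the QR decomposition, as summary.lm() does
    XtX_inv <- chol2inv(fit$qr$qr[seq_len(p), seq_len(p), drop = FALSE])
    tval <- fit$coefficients / sqrt(sigma2 * diag(XtX_inv))
    tval[1] <- Inf                          # never drop the intercept (column 1)
    worst <- which.min(abs(tval))
    X <- X[, -worst, drop = FALSE]
  }
  colnames(X)[-1]                           # names of the surviving predictors
}

# Small simulated example in which only x1 and x3 truly matter.
set.seed(42)
n <- 500; p <- 8
X <- cbind(1, matrix(rnorm(n * p), n, p))
colnames(X) <- c("(Intercept)", paste0("x", 1:p))
y <- drop(3 * X[, "x1"] - 2 * X[, "x3"] + rnorm(n))
kept <- backward_eliminate(X, y, keep = 2)
kept
```

The design matrix is built once, outside the loop, and each step only
removes a column, which is where most of the saving over repeated lm()
calls would come from.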

It could be that everything I'm writing here is total bullshit (because I'm not 
sure if I got everything right),
but this thing is working and creates very nice predictions ;)

I just figured out that the lm() calls in this loop take the most time and I 
want to reduce this.
Any ideas?

Although it depends on the criterion by which you keep your variables, maybe you should
take a look at ?step.
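
For instance, a minimal sketch of backward selection with step() on
simulated data (the data frame dat and its columns are made up for
illustration; step() selects by AIC by default, which may or may not
match the criterion your code uses):

```r
# Simulated data: only x1 and x4 have a real effect on y.
set.seed(7)
n <- 300
dat <- data.frame(matrix(rnorm(n * 6), n, 6))
names(dat) <- paste0("x", 1:6)
dat$y <- 2 * dat$x1 - dat$x4 + rnorm(n)

# Start from the full model and let step() remove terms one at a time.
full <- lm(y ~ ., data = dat)

# Backward selection by AIC (the default penalty k = 2);
# trace = 0 suppresses the step-by-step log.
reduced <- step(full, direction = "backward", trace = 0)
formula(reduced)
```

Passing k = log(n) instead gives a BIC-type penalty, which tends to keep
fewer variables.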

Bart
