data level for stepwise

2 messages · Constantine Frangakis, Thomas Lumley

#
This may be of interest to R users.

The command step() for stepwise regression, which takes
an object like lm(formula, data=mydata), apparently looks for
"mydata" in the global environment, not in the environment from which
step() is called. That is, when step() is called
from inside another function that has also updated the data used by the
model, step() does not use the most recently
updated data, but instead looks outside the function. (This problem does
not happen with the lm function.) Although the problem can be solved by
using the assign function, to avoid potential bugs it would be useful to
know which other functions behave like step() in this respect.

C Frangakis
#
On Tue, 3 Dec 2002, Constantine Frangakis wrote:

Not quite.  It's looking for the data in the environment associated with
the model formula (which will typically be the environment where the model
was created, and often the global environment).
There is some discussion of this under "Nonstandard evaluation rules" on
http://developer.r-project.org, but it doesn't cover step(), which I'll
need to add.
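
As a rough illustration of what "the environment associated with the model
formula" means, here is a minimal sketch. The names top_formula,
make_formula, made and fit are hypothetical and not from the original
post; only environment(), formula() and lm() are standard R.

```r
# A formula remembers the environment in which it was created.
top_formula <- Volume ~ Height + Girth
top_env <- environment(top_formula)   # wherever this code is run

# A formula created inside a function is tied to that call's frame.
make_formula <- function() {
  list(formula = Volume ~ Height, frame = environment())
}
made <- make_formula()
identical(environment(made$formula), made$frame)   # TRUE
identical(environment(made$formula), top_env)      # FALSE

# A fitted lm keeps the environment of the formula it was given, so
# later lookups of the data go there, not to the caller of step().
fit <- lm(top_formula, data = trees)
identical(environment(formula(fit)), top_env)      # TRUE
```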

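You can work around this by calling update() first, e.g. with
 data(trees)
 model <- lm(Volume ~ Height + Girth, data = trees)
 f <- function (i)
 {
    trees <- trees[-i, ]     # local copy, ignored by step()
    step(model)
 }
 g <- function (i)
 {
    trees <- trees[-i, ]     # local copy with row(s) i removed
    model <- update(model)   # refit here, so the formula's environment is local
    step(model)
 }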

the argument to f() makes no difference, as the original `trees' data
frame is used, but the argument to g() is effective, as the local data
frame is used.
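
One way to check this is to compare the number of observations behind
each result. The sketch below repeats the two functions so that it runs
on its own; trace = 0 (to silence step()'s printing) and the nobs() calls
are additions for illustration and were not part of the original example.
It assumes a version of R that provides nobs().

```r
model <- lm(Volume ~ Height + Girth, data = trees)   # trees has 31 rows

f <- function(i) {
  trees <- trees[-i, ]          # local copy, never seen by step()
  step(model, trace = 0)
}

g <- function(i) {
  trees <- trees[-i, ]          # local copy with rows i removed
  model <- update(model)        # refit against the local copy
  step(model, trace = 0)
}

nobs(f(1:5))   # 31: the original data frame was used
nobs(g(1:5))   # 26: the five rows were really dropped
```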


	-thomas