This may be of interest to R users. The command step() for stepwise regression, which takes an object like lm(formula, data=mydata), apparently looks for "mydata" in the global environment, not the environment in which step() is called. That is, when step() is called from inside another function in which the data used by the model has also been updated, step() does not use the most recently updated data, but instead looks outside the function. (This problem does not happen for the lm function.) Although the problem can be solved by using the assign function, to avoid potential bugs it would be useful to know which functions behave like step() in this respect.

C Frangakis
data level for stepwise
2 messages · Constantine Frangakis, Thomas Lumley
On Tue, 3 Dec 2002, Constantine Frangakis wrote:
> This may be of interest to R users. The command step() for stepwise regression, which asks for an object like lm(formula, data=mydata), apparently is looking for "mydata" in the global environment, not the environment at which step() is called.
Not quite. It's looking for the data in the environment associated with the model formula (which will typically be the environment where the model was created, and often the global environment).
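A minimal sketch of what "the environment associated with the model formula" means, assuming only base R and the stats package. The names make_fit() and local_data are hypothetical and used purely for illustration; the point is that a formula created inside a function carries that function's frame with it.

```r
# Sketch: a formula remembers the environment in which it was created.
# make_fit() and local_data are hypothetical names, not part of the thread.
make_fit <- function() {
  local_data <- data.frame(
    x = 1:10,
    y = c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9)
  )
  frame <- environment()               # the frame of this call
  fit <- lm(y ~ x, data = local_data)  # the formula y ~ x is created here
  list(fit = fit, frame = frame)
}

res <- make_fit()
formula_env <- environment(formula(res$fit))

# Is the formula's environment the frame where the model was created?
same_frame <- identical(formula_env, res$frame)
# Is the data frame visible from that environment?
has_data <- exists("local_data", envir = formula_env, inherits = FALSE)

print(same_frame)
print(has_data)
```

A model fitted at the top level would, by the same reasoning, carry the global environment with its formula.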
> That is, when step is called from inside another function in which the data that step() calls has also been updated inside that function, step() does not use the most recently updated data, but instead looks outside the function. (This problem does not happen for the lm function). Although the problem can be solved by using the assign function, to avoid potential bugs it would be useful to know which functions like step() do this.
There is some discussion of this under "Nonstandard evaluation rules" on http://developer.r-project.org, but it doesn't cover step(), which I'll need to add.

You can work around this by using update() first, e.g. with

    data(trees)
    model <- lm(Volume ~ Height + Girth, data = trees)

    f <- function(i) {
      trees <- trees[-i, ]
      step(model)
    }

    g <- function(i) {
      trees <- trees[-i, ]
      model <- update(model)
      step(model)
    }

the argument to f() makes no difference, as the original 'trees' data frame is used, but the argument to g() is effective, as the local data frame is used.

    -thomas
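One way to check that the update() workaround picks up the local data is sketched below. It assumes the built-in trees data set (31 rows) from the datasets package; the helper refit_locally() is a hypothetical name, and the example only inspects the refitted model rather than asserting anything about what step() selects.

```r
# Sketch: update() re-evaluates the model call in the calling frame, so the
# refit sees the local copy of 'trees'. refit_locally() is a hypothetical
# helper written for this illustration.
model <- lm(Volume ~ Height + Girth, data = trees)

refit_locally <- function(i) {
  trees <- trees[-i, ]     # local, reduced copy of the data
  frame <- environment()   # the frame of this call
  refit <- update(model)   # refit using the local 'trees'
  list(
    n         = nobs(refit),
    local_env = identical(environment(formula(refit)), frame),
    selected  = step(refit, trace = 0)  # trace = 0 suppresses the printout
  )
}

out <- refit_locally(1:5)
print(nobs(model))    # observations used by the original fit
print(out$n)          # observations used after dropping rows 1 to 5
print(out$local_env)  # does the refitted formula point at the local frame?
```

Comparing nobs() before and after the refit is a cheap sanity check that the intended data frame was actually used.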