model.frame: how does one use it?
On 6/15/07, Dirk Eddelbuettel <edd at debian.org> wrote:
Philipp Benner filed a Debian bug report against r-cran-rpart, aka rpart.
In short, the issue has to do with how rpart evaluates a formula and
supporting arguments, in particular 'weights'.
A simple contrived example is
-----------------------------------------------------------------------------
library(rpart)
## using data from help(rpart), set up simple example
myformula <- formula(Kyphosis ~ Age + Number + Start)
mydata <- kyphosis
myweight <- abs(rnorm(nrow(mydata)))
goodFunction <- function(mydata, myformula, myweight) {
    hyp <- rpart(myformula, data=mydata, weights=myweight, method="class")
    prev <- hyp
}
goodFunction(mydata, myformula, myweight)
cat("Ok\n")
## now remove myweight and try to compute it inside a function
rm(myweight)
badFunction <- function(mydata, myformula) {
    myweight <- abs(rnorm(nrow(mydata)))
    mf <- model.frame(myformula, mydata, myweight)
    print(head(mf))
    hyp <- rpart(myformula,
                 data=mf,
                 weights=myweight,
                 method="class")
    prev <- hyp
}
badFunction(mydata, myformula)
cat("Done\n")
-----------------------------------------------------------------------------
Here goodFunction works, but only because myweight (with useless random
weights, but that is not the point here) is found from the calling
environment.
badFunction fails after we remove myweight from there:
:~> cat /tmp/philipp.R | R --slave
Ok
Error in eval(expr, envir, enclos) : object "myweight" not found
Execution halted
:~>
As I was able to replicate it, I reported this to the package maintainer. It
turns out that seemingly all is well as this is supposed to work this way,
and I got a friendly pointer to study model.frame and its help page.
Now I am stuck as I can't make sense of model.frame -- see badFunction
above. I would greatly appreciate any help in making rpart work with a local
argument weights so that I can tell Philipp that there is no bug. :)
I don't know if ?model.frame is the best page to look at. There's a more
detailed description at

http://developer.r-project.org/nonstandard-eval.pdf

but here are the non-standard evaluation rules as I understand them: given a
name in either (1) the formula or (2) "special" arguments such as 'weights' in
this case, or 'subset', try to find the name

1. in 'data'
2. failing that, in environment(formula)
3. failing that, in the enclosing environment, and so on.

By 'name', I mean a symbol, such as 'Age' or 'myweight'.

So basically, everything is as you would expect if the name is visible in
data, but if not, the search starts in the environment of the formula, not the
environment where the function call is being made (which is the standard
evaluation behaviour). This is a feature, not a bug (things would be a lot
more confusing if it were the other way round).

With this in mind, either of the following might do what you want:

badFunction <- function(mydata, myformula) {
    mydata$myweight <- abs(rnorm(nrow(mydata)))
    hyp <- rpart(myformula, data=mydata, weights=myweight, method="class")
    prev <- hyp
}

badFunction <- function(mydata, myformula) {
    myweight <- abs(rnorm(nrow(mydata)))
    environment(myformula) <- environment()
    hyp <- rpart(myformula, data=mydata, weights=myweight, method="class")
    prev <- hyp
}

-Deepayan
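The lookup order described in the reply can be checked with a small sketch. It uses lm() from the stats package and the built-in mtcars data rather than rpart, on the assumption that both build their model frame the same way. The function names (fit_in_data, fit_in_env, fit_local_only) and the weight variable case_w are made up for illustration and appear nowhere in the thread.

```r
## Formula created outside the fitting functions, so environment(f) is
## NOT the frame of any function defined below.
f <- mpg ~ wt

## Rule 1: the weights are a column of 'data', so they are found there.
fit_in_data <- function(d, f) {
    d$case_w <- seq_len(nrow(d))
    lm(f, data = d, weights = case_w)
}

## Rule 2: the weights are a local variable, and the formula's environment
## is reset to the current frame, so the lookup reaches them.
fit_in_env <- function(d, f) {
    case_w <- seq_len(nrow(d))
    environment(f) <- environment()
    lm(f, data = d, weights = case_w)
}

## Neither rule applies: case_w lives only in this function's frame, which
## is not on the search path that starts at environment(f).
fit_local_only <- function(d, f) {
    case_w <- seq_len(nrow(d))
    lm(f, data = d, weights = case_w)
}

fit_a <- fit_in_data(mtcars, f)
fit_b <- fit_in_env(mtcars, f)

## Capture the error message instead of halting the script.
res_fail <- tryCatch(fit_local_only(mtcars, f),
                     error = function(e) conditionMessage(e))

print(coef(fit_a))
print(coef(fit_b))
print(res_fail)
```

The first two fits use identical weights and should agree, while the third is expected to stop with an "object not found" error naming case_w, provided no object of that name exists in the formula's environment or its enclosures.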