R version: 1.7.1
OS: Red Hat Linux 7.2
Hi all,
The formula object in model.frame() is not retrieved properly when
model.frame() is called from within a function and the "subset" argument
is supplied.
foo <- function(formula,data,subset=NULL)
{
  cat("\n*****Does formula[-3] == ~y ?**** TRUE *****\n")
  print(formula[-3] == ~y)
  cat("\n*****Result of model.frame() using formula[-3]**** FAIL *****\n")
  print(try(model.frame(formula[-3],data=data,subset=subset)))
  cat("\n*****Result of model.frame() using ~y**** WORKS *****\n")
  print(try(model.frame(~y,data=data,subset=subset)))
}
dat <- data.frame(y=c(5,25))
foo(y~1,dat)
Curiously, if the "subset" argument is removed from the call to
model.frame(), then the execution is successful in both cases.
In ?model.frame, one can read:
Variables in the formula, `subset' and in `...' are looked for
first in `data' and then in the environment of `formula': see the
help for `formula()' for further details.
However, replacing the line
subset <- eval(substitute(subset), data, env)
by
subset <- eval(substitute(subset), data, environment())
in model.frame.default() fixes this problem. I don't know if this
correction would create more problems in other cases. Perhaps there is a
better fix.
Sincerely,
Jerome Asselin
There is really nothing to fix, at least if you go by the rule that it
is only a bug if it behaves contrary to documentation:
There is no "subset" in the environment of "formula", nor in the
"data". If you put one there, the error goes away:
subset<-NULL
foo(y~1,dat,subset=1)
*****Does formula[-3] == ~y ?**** TRUE *****
[1] TRUE
*****Result of model.frame() using formula[-3]**** FAIL *****
   y
1  5
2 25
*****Result of model.frame() using ~y**** WORKS *****
  y
1 5
However, notice that it is not the same subset.
There's a whole area of similar nastiness grouped under the heading of
"nonstandard evaluation rules". The basic issue is that you will often
assume that the variables used for subsetting come from the same
place as those in the model, e.g. in lm(fat~age,subset=sex=="male").
The problem is that it gets really awkward when a function wants to
compute the subset variable and combine it with a formula passed as an
argument. And it only gets worse when arguments can be both scalar and
vector, e.g.
plot(fat~age, col=as.numeric(sex))
function(mycolor="green") plot(fat~age, col=mycolor)
We have discussed changing this on several occasions, e.g. by
requiring that arguments that need to be evaluated in the formula
environment or the data frame should be either model formulas
themselves or quoted expressions. However, that would break S-PLUS
compatibility and also a large body of existing analysis code.
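
To make the basic case concrete, here is a minimal sketch of the top-level usage, where the subset expression is evaluated in the data frame just like the model variables. The data frame d and its values are made up for illustration.

```r
# Made-up data for illustration only.
d <- data.frame(
  fat = c(20, 31, 25, 28),
  age = c(30, 45, 52, 38),
  sex = c("male", "female", "male", "female")
)

# `sex == "male"` is not evaluated by the caller; model.frame() evaluates
# it with the columns of `d` in scope, so `sex` is found in the data.
mf <- model.frame(fat ~ age, data = d, subset = sex == "male")
print(mf)
```

The awkward cases start when the subset is computed inside a wrapper function, because the symbol is then looked up from the formula's environment rather than from the wrapper.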
[[ I did discover yesterday (or maybe I was just reminded...) that we
even have nonstandard nonstandard evaluation rules in some places
(nls() seems to evaluate its model formula in the global environment
even if it is given explicitly within a function:
f <- function() {
  g <- function(a,x) exp(-a*x)
  nls(y~g(a,x),start=list(a=.1))
}
x <- 1:10
y <- exp(-.12*x)+rnorm(10,sd=.001)
f()
Error in eval(expr, envir, enclos) : couldn't find function "g"
Argh...]]
O__ ---- Peter Dalgaard Blegdamsvej 3
c/ /'_ --- Dept. of Biostatistics 2200 Cph. N
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk) FAX: (+45) 35327907
On Thursday, Aug 7, 2003, at 04:13 US/Eastern, Peter Dalgaard BSA wrote:
[[ I did discover yesterday (or maybe I was just reminded...) that we
even have nonstandard nonstandard evaluation rules in some places
(nls() seems to evaluate its model formula in the global environment
even if it is given explicitly within a function:
f <- function() {
g <- function(a,x) exp(-a*x)
nls(y~g(a,x),start=list(a=.1))
}
x <- 1:10
y <- exp(-.12*x)+rnorm(10,sd=.001)
f()
Error in eval(expr, envir, enclos) : couldn't find function "g"
Argh...]]
Given that I am, for better or worse, responsible for a large portion
of the code in nls, I should make it clear that I did not understand
the nonstandard evaluation rules in those days and so any nonstandard
nonstandard rule used there is a bug. Now that I understand these
things a little, I can see that nls does a few things wrong. I think
the following patch mostly fixes them.
-------------- next part --------------
This is the same phenomenon that is documented for lattice graphics and
for lme in my notes on nonstandard evaluation rules. I think it *is* a
bug.
-thomas
On Thursday, Aug 7, 2003, at 10:14 US/Eastern, Thomas Lumley wrote:
This is the same phenomenon that is documented for lattice graphics and
for lme in my notes on nonstandard evaluation rules. I think it *is* a
bug.
I think this was fixed in lattice a few months ago - from the
ChangeLog, on March 3rd.
Thanks for your reply and discussion on the issue. See below for another
suggestion of a fix.
I have spent some time trying to find a fix which would still work as
documented:
Variables in the formula, `subset' and in `...' are looked for
first in `data' and then in the environment of `formula': see the
help for `formula()' for further details.
The problem is that the expression environment(formula) in
model.frame.default() gives the value:
(1) <environment: R_GlobalEnv> for the call
model.frame(formula[-3],data=data,subset=subset) ;
(2) <environment: 0x883d288> (or something similar) for the call
model.frame(~y,data=data,subset=subset) .
In case (1), eval(subset, data, env) in model.frame.default() gives the
subset() function, which leads to an error.
In case (2), it gives the correct value for subset (i.e., NULL in the
example of my original message).
I wonder why the environment is not the same for both cases. Don't you?
Perhaps this is where the real problem is, but my current understanding of
environment() is too limited to make such a claim.
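
One likely explanation is that a formula records the environment in which it was created, and that subsetting a formula with [ keeps that environment. The sketch below checks this. The names demo and inner are hypothetical, and demo() merely stands in for the top level where y~1 was typed.

```r
# Hypothetical helper names; demo() plays the role of the top level.
demo <- function() {
  f <- y ~ 1                       # formula created in demo()'s frame
  outer_env <- environment()
  inner <- function(formula) {
    list(
      subsetted = environment(formula[-3]),  # inherited from `formula`
      literal   = environment(~y),           # created inside inner()
      inner_env = environment()
    )
  }
  res <- inner(f)
  res$outer_env <- outer_env
  res
}

res <- demo()
print(identical(res$subsetted, res$outer_env))  # formula[-3] keeps the creator's environment
print(identical(res$literal, res$inner_env))    # ~y captures the function's own frame
```

If that is right, then in case (1) the symbol subset is looked up from the global environment, where it finds the subset() function, while in case (2) it is looked up from the function's frame, where it finds the argument.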
I suggest here another fix which I hope respects the documentation. In
model.frame.default(), add the line
formula <- formula(deparse(formula))
just before the line
env <- environment(formula)
This change will affect the value of environment(formula).
If you make the correction and run the code below, then it should work
successfully. The question is whether this change still respects the
documentation. Personally, I think this is safe, because the expression
eval(subset, data, env) is still evaluated in the environment of
`formula', despite the fact that this environment has changed.
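
For comparison, a similar effect can be sketched at the wrapper level without touching model.frame.default(), by rebinding the environment of the formula inside the calling function. This is a different technique from the deparse() change above. The name foo_env is hypothetical, and the sketch assumes every variable in the formula lives in data: anything else would now be looked up from the wrapper rather than from where the user wrote the formula.

```r
# Hypothetical wrapper: rebind the formula's environment to this function's
# frame so that the symbol `subset` resolves to the argument below.
foo_env <- function(formula, data, subset = NULL) {
  f <- formula[-3]                 # drop the third element, as in foo()
  environment(f) <- environment()  # lookups now start in foo_env()'s frame
  model.frame(f, data = data, subset = subset)
}

dat <- data.frame(y = c(5, 25))
all_rows  <- foo_env(y ~ 1, dat)              # subset = NULL keeps every row
first_row <- foo_env(y ~ 1, dat, subset = 1)  # keeps only the first row
print(all_rows)
print(first_row)
```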
Sincerely,
Jerome Asselin
foo <- function(formula,data,subset=NULL)
{
  cat("\n*****Does formula[-3] == ~y ?**** TRUE *****\n")
  print(formula[-3] == ~y)
  cat("\n*****Result of model.frame() using formula[-3]**** FAIL *****\n")
  print(try(model.frame(formula[-3],data=data,subset=subset)))
  cat("\n*****Result of model.frame() using ~y**** WORKS *****\n")
  print(try(model.frame(~y,data=data,subset=subset)))
}
dat <- data.frame(y=c(5,25))
foo(y~1,dat)
foo(y~1,dat,subset=1)
####Results after making the correction###
foo(y~1,dat)
*****Does formula[-3] == ~y ?**** TRUE *****
[1] TRUE
*****Result of model.frame() using formula[-3]**** FAIL *****
   y
1  5
2 25
*****Result of model.frame() using ~y**** WORKS *****
   y
1  5
2 25
foo(y~1,dat,subset=1)
*****Does formula[-3] == ~y ?**** TRUE *****
[1] TRUE
*****Result of model.frame() using formula[-3]**** FAIL *****
  y
1 5
*****Result of model.frame() using ~y**** WORKS *****
  y
1 5