I've narrowed my scope problems with predict.coxph further.
Here is a condensed example:
fcall3 <- as.formula("time ~ age")
dfun3 <- function(dcall) {
    fit <- lm(dcall, data=lung, model=FALSE)
    model.frame(fit)
}
dfun3(fcall3)
The final call fails: it can't find 'dcall'.
The relevant code in model.frame.lm is:
    env <- environment(formula$terms)
    if (is.null(env))
        env <- parent.frame()
    eval(fcall, env, parent.frame())
If the environment of the formula is .GlobalEnv, as it is here, the
contents of parent.frame() are ignored. Adding a
print(ls(parent.frame()))
statement just above the final call shows that it isn't a scope issue:
the variables we want are there.
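The behaviour described above can be reproduced without lm at all. The following is a minimal sketch, not the code in model.frame.lm: the helper lookup() and the environment formula_env are hypothetical names introduced for illustration. It relies on the documented rule that the enclos argument of eval() is only consulted when envir is a list, pairlist, or data frame, so a third argument of parent.frame() has no effect when envir is already an environment.

```r
# Hypothetical helper: look up `dcall` the way eval(expr, envir, enclos) would.
lookup <- function(dcall, envir) {
  here <- environment()   # this function's frame, where `dcall` lives
  tryCatch(
    eval(quote(dcall), envir, enclos = here),
    error = function(e) conditionMessage(e)
  )
}

# Stand-in for a formula environment that cannot see `dcall`.
formula_env <- new.env(parent = baseenv())

# envir is an environment: enclos is ignored, so the lookup fails.
res_env <- lookup("found", formula_env)

# envir is a list: enclos is searched, so the lookup succeeds.
res_list <- lookup("found", list())
```

Here res_env holds the "object 'dcall' not found" message and res_list holds "found", even though the same frame was offered as enclos in both calls.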
I don't understand the logic behind looking for variables in the place
the formula was first typed (this is not a complaint). The inability to
look elsewhere, however, has stymied my efforts to fix the scoping problem
in predict.coxph, unless I drop the env(formula) argument altogether.
But I assume there must be good reasons for its inclusion and am
reluctant to do so.
Terry Therneau
sessionInfo()
R version 2.13.0 RC (2011-04-12 r55424)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
[5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
PS. This also fails
dfun3 <- function(dcall) {
    fit <- lm(dcall, data=lung)
    model.frame(fit, subset=1:10)
}
You just need to force model.frame.lm to recreate data.
The reason is that when a formula is created, the variables in it are
assumed to have meaning in that context. Where you work with the
formula after that should not be relevant: that's why formulas carry
environments with them. When you create the formula before the
variables, things go wrong.
There's probably a way to associate the lung dataframe with the formula,
or create the formula in such a way that things work, but I can't spot it.
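As an illustration of formulas carrying their environments, here is a small sketch. The function make_formula() and the variables x and y are hypothetical and not part of the original example. It only shows that model.frame() with no data argument looks in the formula's environment, and that as.formula() accepts an env argument to choose that environment explicitly. Whether this is a suitable fix for predict.coxph is not something the sketch settles.

```r
# Hypothetical example: x and y exist only inside make_formula().
make_formula <- function() {
  x <- c(1, 2, 3)
  y <- c(2.1, 3.9, 6.2)
  y ~ x                      # the formula captures this function's frame
}

f_inside <- make_formula()
env_inside <- environment(f_inside)
ls(env_inside)               # "x" "y"

# With no `data` argument, model.frame() looks in the formula's environment,
# so the variables are found even though they are not visible here.
mf <- model.frame(f_inside)

# A formula built from a string can be pointed at the same environment.
f_elsewhere <- as.formula("y ~ x", env = env_inside)
mf2 <- model.frame(f_elsewhere)
```

Both mf and mf2 contain the three rows defined inside make_formula(), which is the sense in which the place a formula is used afterwards should not matter.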
Duncan Murdoch
On Mon, Apr 18, 2011 at 5:51 PM, Terry Therneau <therneau at mayo.edu> wrote:
Try using do.call. Using the built-in BOD data set to illustrate, we first
run the posted code to see the error:
fcall3 <- as.formula("demand ~ Time")
dfun3 <- function(dcall) {
    fit <- lm(dcall, data=BOD, model=FALSE)
    model.frame(fit)
}
dfun3(fcall3)
Error in model.frame(formula = dcall, data = BOD, drop.unused.levels = TRUE) :
  object 'dcall' not found
# now replace the lm call with a do.call("lm" ...)
# so that dcall gets substituted before the call to lm:
fcall3 <- as.formula("demand ~ Time")
dfun3 <- function(dcall) {
    fit <- do.call("lm", list(dcall, data = BOD, model = FALSE))
    model.frame(fit)
}
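To make the difference concrete, the sketch below compares what each version records in the fit's $call component. The wrapper names plain() and subst() are hypothetical. The point is only that a direct lm() call stores the symbol dcall, which must be looked up again later, whereas do.call() stores the argument values themselves.

```r
fcall3 <- as.formula("demand ~ Time")

# Direct call: the recorded call refers to the local name `dcall`.
plain <- function(dcall) lm(dcall, data = BOD, model = FALSE)

# do.call: the argument values are placed into the call before lm runs.
subst <- function(dcall) do.call("lm", list(dcall, data = BOD, model = FALSE))

fit_plain <- plain(fcall3)
fit_subst <- subst(fcall3)

is.name(fit_plain$call$formula)     # TRUE: the symbol `dcall`
is.name(fit_subst$call$formula)     # FALSE: the formula itself
is.data.frame(fit_subst$call$data)  # TRUE: the whole data frame is stored
```

Any later re-evaluation of the second call no longer depends on finding dcall, at the price of carrying the full data frame inside the call.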
This is why model=FALSE is not the default. It avoids trying to find
the data at a later date (and even if you can solve the scoping
issues, the data may have been changed).
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
On Apr 19, 2011, at 07:16 , Prof Brian Ripley wrote:
On Mon, 18 Apr 2011, Duncan Murdoch wrote:
On 11-04-18 5:51 PM, Terry Therneau wrote:
Yes, but there are other cases where a reevaluation is triggered. The example I found earlier involved doing model.frame on a subset, in which case the length(nargs) clause in model.frame.lm gets chosen.
So something is not right: either we should arrange that reevaluations are never necessary, or there should be a mechanism to get them reevaluated in the same scope as the original call.
An obvious way would be to add the evaluation environment as an attribute to the $call component, but what would the memory management and serialization consequences be?
One workaround is, as Gabor points out, effectively to substitute the value of the arguments to lm() at the point of the call, using do.call(lm, list(.....)) or some eval(substitute(.....)) construct to the same effect. However, the result of do.call() will look awkward in the cases where the $call gets deparsed. E.g., in Gabor's example, if we modify it to show the actual fit, we get the result below (I'm sure you can imagine what would happen if a data frame with more than 7 rows got used!). On the other hand, NOT substituting such arguments leaves the scoping issues.
Another possible workaround is to make sure that functions that call modelling code internally will do the evaluation in the frame of the caller (like the call to model.frame inside lm does). However, that seems to defeat the purpose of adding environments to formulas in the first place.
-pd
dfun3 <- function(dcall) {
    fit <- do.call("lm", list(dcall, data = BOD, model = FALSE))
    print(model.frame(fit))
    fit
}
dfun3(fcall3)
demand Time
1 8.3 1
2 10.3 2
3 19.0 3
4 16.0 4
5 15.6 5
6 19.8 7
Call:
lm(formula = demand ~ Time, data = structure(list(Time = c(1,
2, 3, 4, 5, 7), demand = c(8.3, 10.3, 19, 16, 15.6, 19.8)), .Names = c("Time",
"demand"), row.names = c(NA, -6L), class = "data.frame", reference = "A1.4, p. 270"),
model = FALSE)
Coefficients:
(Intercept) Time
8.521 1.721
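One way to limit the awkward deparsed call shown above is to substitute only the formula and leave data as a symbol. This is a sketch of the eval(substitute(.....)) idea using bquote(); the function name dfun4 is hypothetical, and it assumes the data set is reachable by name from wherever the call is later re-evaluated, which is true for BOD but not for data local to the function.

```r
fcall3 <- as.formula("demand ~ Time")

dfun4 <- function(dcall) {
  # .(dcall) is replaced by the formula's value; BOD is left as a symbol,
  # so the recorded call does not embed the whole data frame.
  eval(bquote(lm(.(dcall), data = BOD, model = FALSE)))
}

fit <- dfun4(fcall3)

is.name(fit$call$formula)   # FALSE: the formula was substituted by value
is.name(fit$call$data)      # TRUE: data is still the symbol BOD
```

The trade-off is the one described in the message: the formula no longer depends on the local name dcall, but the data argument is still subject to the usual scoping rules.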
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com