Brian,
So let me ask an R-core opinion. Should I change the default to model=TRUE? Survival
is heavily used and there is something to be said for consistency within the central
packages. Sometimes old habits die hard, and there is a "save memory" part of me that
hates to save a large object that likely won't be used. Not nearly as relevant today as
when I started my career.
I agree that the biggest issue with model=FALSE is when someone asks for predictions
from a saved fit, perhaps saved weeks ago, and the data has changed under our feet. I
have a check in predict.coxph that the number of rows in the data hasn't changed, but
there really is no defense.
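To make the "data has changed under our feet" hazard concrete, here is a minimal sketch using lm rather than coxph, so that it runs in base R alone. The data frame d and the names fit_keep and fit_drop are invented for illustration. The only point is that a fit stored with model=FALSE has to re-evaluate its call against whatever d holds at that moment.

```r
# Illustrative data; 'd', 'fit_keep' and 'fit_drop' are made-up names.
d <- data.frame(x = 1:5, y = c(2.1, 3.9, 6.2, 7.8, 10.1))

fit_keep <- lm(y ~ x, data = d, model = TRUE)   # model frame stored in the fit
fit_drop <- lm(y ~ x, data = d, model = FALSE)  # model frame not stored

# The data changes after the fits were made; the number of rows does not.
d$x <- d$x * 10

mf_keep <- model.frame(fit_keep)  # returns the stored frame: original x
mf_drop <- model.frame(fit_drop)  # re-evaluates the call: sees the modified x

mf_keep$x
mf_drop$x
```

Note that the row count is the same before and after the change, so a check on the number of rows alone would not catch this case.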
Aside: This would mean in theory that I could also change the default to y=FALSE. I
discovered a few years ago that that won't fly, when I set the default for y to "not
model" (why keep redundant copies?). Several other packages that depend on survival failed.
They assume fit$y is there, without checking. The iron chains of backwards compatibility...
1. The key line, in both model.frame.coxph and model.frame.lm is
eval(fcall, env, parent.frame())
and it appears (at least to me) that the parent.frame() part of this is
effectively ignored when fcall is itself a reference to model.frame.
I'd like to understand this better.
Way back (ca R 1.2.0) an advocate of lexical scoping changed model.frame.lm to refer to an environment not a data frame for 'env'. That pretty fundamental change means that your sort of example is not a recommended way to do this: you are mixing scoping models.
This hasn't left me any wiser. Can you expand? As stated in another note the real issue was
fit <- coxph(formula, data=nd)
predict(fit, type="expected")
within a user's function. They, not unreasonably, expected it to work without further
trickery. It fails because the model.frame call within predict.coxph cannot find "nd".
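One possible reading of the "mixing scoping models" remark, offered as a sketch rather than a definitive explanation: the help page for eval describes the enclos argument as relevant only when envir is a list, pairlist or data frame. If env is an environment (in model.frame.lm it is typically the formula's environment), the parent.frame() argument plays no part in the lookup. The functions lookup and user_fun below are hypothetical stand-ins, not code from survival or stats.

```r
# Hypothetical stand-in for the key line eval(fcall, env, parent.frame()).
lookup <- function(env) {
  fcall  <- quote(nrow(nd))
  caller <- parent.frame()          # frame of the function that called lookup()
  tryCatch(eval(fcall, env, caller),
           error = function(e) "nd not found")
}

user_fun <- function() {
  nd <- data.frame(x = 1:3)         # 'nd' exists only inside user_fun
  list(
    # env is a list: 'caller' is used as the enclosure, so nd is found
    as_list = lookup(list()),
    # env is an environment: 'caller' is ignored, lookup follows env's parents
    as_env  = lookup(new.env(parent = baseenv()))
  )
}

res <- user_fun()
res$as_list
res$as_env
```

Under that reading, a data set that lives only in the user's function is visible to the re-evaluated call only if the environment passed as env leads back to that function's frame.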
2. The modeling functions coxph and survreg in the survival package default to model=FALSE, originally in mimicry of lm and glm; I don't know when R
> changed the default to model=TRUE for lm and glm. One possible response
I am not sure R ever did: model = TRUE was the default 16 years ago at the beginning of the CVS/SVN archive.
-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595