standard format for newdata objects - R-devel

Wed, Apr 27, 2011 6:27 AM

On Wed, 2011-04-27 at 12:00 +0200, Peter Dalgaard wrote:

I agree with Peter.  There are two tasks in newdata: deciding what the
default reference levels should be, and building the data frame with
those levels.  It's the first part that is hard. For survival curves
from a Cox model, the historical default has been to use the mean of each
covariate, which can be awful (sex coded as 0/1 leads to a prediction for
a hermaphrodite?).  Nevertheless, I've not been able to think of a
strategy that would give sensible answers for most of the data I use, so
coxph retains the flawed default for lack of a better idea.  When
teaching a class on this, I tell listeners to "bite the bullet" and build
the newdata that makes clinical sense, because package defaults are
always unwise for some of the variables.  How can a package possibly
know that it should use bilirubin = 1.0 (upper limit of normal) and
AST = 45 when the data set is one of my liver transplant studies?
   Frank Harrell would argue, though, that his "sometimes misguided" default
in cph is better than the "almost always wrong" one in coxph, and there is
certainly some strength in that position.
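To make the contrast concrete, here is a small language-neutral sketch, written in Python for brevity. The covariate values are made up, and none of this is coxph or cph code. It computes reference values in three ways. The first uses the mean of every covariate, which is the coxph default described here. The second uses the median for continuous covariates and the most frequent value for categorical ones. That is my assumption about roughly what a cph/datadist-style default does. The third is a newdata built by hand from the bilirubin and AST values mentioned earlier. The helper names (mean_reference, median_mode_reference, unobserved_levels) are hypothetical.

```python
from statistics import mean, median, mode

# Hypothetical toy data (nine patients), not from any real study:
# sex coded 0/1, bilirubin in mg/dl, AST in U/l.
data = {
    "sex": [0, 1, 1, 0, 1, 0, 1, 1, 0],
    "bili": [0.8, 1.2, 3.5, 0.9, 7.1, 1.0, 2.2, 0.7, 1.4],
    "ast": [40, 52, 110, 38, 160, 45, 75, 33, 61],
}
CATEGORICAL = {"sex"}


def mean_reference(cols):
    """Mean of every covariate (the historical coxph-style default)."""
    return {name: mean(vals) for name, vals in cols.items()}


def median_mode_reference(cols, categorical):
    """Median for continuous covariates, most frequent value for categorical ones.

    Assumed to be roughly what a cph/datadist-style default does; not rms code.
    """
    return {
        name: mode(vals) if name in categorical else median(vals)
        for name, vals in cols.items()
    }


def unobserved_levels(ref, cols, categorical):
    """Categorical covariates whose reference value never occurs in the data."""
    return [name for name in sorted(categorical) if ref[name] not in cols[name]]


# Hand-built newdata: one row per sex at bilirubin = 1.0 and AST = 45.
clinical_newdata = [{"sex": s, "bili": 1.0, "ast": 45} for s in (0, 1)]

if __name__ == "__main__":
    for label, ref in [
        ("mean", mean_reference(data)),
        ("median/mode", median_mode_reference(data, CATEGORICAL)),
    ]:
        print(label, ref, "unobserved:", unobserved_levels(ref, data, CATEGORICAL))
    for row in clinical_newdata:
        print("clinical", row)
```

With this toy data, the mean puts sex at 5/9, a value no patient has. It also pulls bilirubin and AST upward because of a couple of extreme values. The median/mode rule at least lands on observed values. Only the hand-built rows, however, encode what a clinician would actually want to see.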

Terry Therneau