Survival analysis and predict time-to-death

Tue, Aug 18, 2015 6:19 AM

I read this list a day late as a digest so my answers are rarely the first. (Which is
nice as David W answers most of the survival questions for me!)

What you are asking is reasonable, and in fact is common practice in the realm of
industrial reliability, e.g., Meeker and Escobar, Statistical Methods for Reliability
Data. Extrapolation of the survival curve to obtain the mean and percentiles of the
lifetime distribution for some device (e.g. a washing machine) is their bread and butter,
used for instance to determine the right size for an inventory of spare parts. For most
of us on this list who do medical statistics and live in the Kaplan-Meier/Cox model world
the ideas are uncommon. I was lucky enough to sit through one of Bill Meeker's short
courses and retain some (minimal) memory of it.

1. You are correct that parametric models are essential. If the extrapolation is
substantial (30% or more censored, say), then the choice of distribution can be critical.
If failure is due to repeated insult, e.g., the multi-hit model, then Weibull tends to
be preferred; if it is from degradation, e.g., flexing of a diaphragm, then the log-normal.
Beyond this you need more guidance than mine.

2. The survreg routine assumes that log(y) ~ covariates + error. For a log-normal
distribution the error is Gaussian and thus predict(fit, type='response') will be
exp(predicted mean of log time), which is not the predicted mean time. For Weibull the
error distribution is asymmetric so things are murkier. Each is the MLE prediction for the
subject, just not interpretable as a mean. To get the actual mean you need to look up the
formulas for Weibull and/or lognormal in a textbook, and map from the survreg
parameterization to whatever one the textbook uses. The two parameterizations are never
the same.
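
As a rough illustration of that mapping, here is the arithmetic written out in Python
rather than R (a sketch, not code from the survival package). It assumes the survreg
convention log(T) = lp + scale * W, where lp is the linear predictor and scale is the
fitted scale; W is standard normal for the log-normal and standard minimum extreme value
for the Weibull. The function names and the numbers are made up for illustration.

```python
import math

def lognormal_mean(lp, scale):
    """Mean of T when log(T) = lp + scale * Z with Z ~ N(0, 1)."""
    return math.exp(lp + 0.5 * scale ** 2)

def weibull_mean(lp, scale):
    """Mean of T when log(T) = lp + scale * W, W standard minimum extreme value.

    In the usual textbook form this is a Weibull with shape 1/scale and
    scale exp(lp), whose mean is exp(lp) * Gamma(1 + scale).
    """
    return math.exp(lp) * math.gamma(1.0 + scale)

# Hypothetical fitted values, for illustration only.
lp, scale = 2.0, 0.5

print("exp(lp), the type='response' style prediction:", math.exp(lp))
print("log-normal mean:", lognormal_mean(lp, scale))
print("Weibull mean:   ", weibull_mean(lp, scale))
```

With these made-up values the log-normal mean is larger than exp(lp) and the Weibull mean
is smaller, which is the point: exp(lp) is not the mean under either distribution.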

3. Another option is predicted quantiles. ?predict.survreg shows how to get the entire
survival curve. The mean can be obtained as the area under the survival curve. Relevant
to your question, the expected time remaining for a subject still alive at time =10, say,
is integral(S(t), from 10 to infinity) / S(10), where S is the survival curve. You can also
read off quantiles of the expected remaining life.
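
The remaining-life calculation can be sketched numerically as follows, again in Python
and only as an illustration. The survival curve here is a Weibull written in the
log(T) = lp + scale * W form, with made-up parameter values; the helper names, the
trapezoid rule, the bisection search and the finite upper limit standing in for infinity
are all my assumptions, not anything the survival package provides.

```python
import math

def weibull_surv(t, lp, scale):
    """S(t) for a Weibull with log(T) = lp + scale * W (W minimum extreme value)."""
    return math.exp(-((t / math.exp(lp)) ** (1.0 / scale)))

def mean_residual_life(surv, t0, upper, n=100000):
    """Trapezoid approximation to integral(S(t), from t0 to upper) / S(t0).

    'upper' must be large enough that S(upper) is negligible.
    """
    h = (upper - t0) / n
    total = 0.5 * (surv(t0) + surv(upper))
    for i in range(1, n):
        total += surv(t0 + i * h)
    return total * h / surv(t0)

def residual_quantile(surv, t0, p, upper, iters=200):
    """Remaining time r such that P(T <= t0 + r | T > t0) = p, by bisection."""
    target = (1.0 - p) * surv(t0)
    lo, hi = t0, upper
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if surv(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi) - t0

# Hypothetical parameters: scale = 1 makes this an exponential with mean 5,
# so both answers can be checked by hand (5 and 5 * log(2)).
lp, scale = math.log(5.0), 1.0
S = lambda t: weibull_surv(t, lp, scale)

mrl = mean_residual_life(S, t0=10.0, upper=210.0)
med = residual_quantile(S, t0=10.0, p=0.5, upper=210.0)
print("mean remaining life at t=10:  ", mrl)
print("median remaining life at t=10:", med)
```

For a fitted model, S would instead come from the predicted survival curve for the
subject, obtained as ?predict.survreg describes.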

Terry Therneau
(author of the survival package)
