
Survival analysis and predict time-to-death

I read this list a day late as a digest so my answers are rarely the first.  (Which is 
nice as David W answers most of the survival questions for me!)

What you are asking is reasonable, and in fact is common practice in the realm of 
industrial reliability, e.g., Meeker and Escobar, Statistical Methods for Reliability 
Data.  Extrapolation of the survival curve to obtain the mean and percentiles of the 
lifetime distribution for some device (e.g. a washing machine) is their bread and butter, 
used for instance to determine the right size for an inventory of spare parts.  For most 
of us on this list who do medical statistics and live in the Kaplan-Meier/Cox model world 
the ideas are uncommon.  I was lucky enough to sit through one of Bill Meeker's short 
courses and retain some (minimal) memory of it.

   1. You are correct that parametric models are essential.  If the extrapolation is 
substantial (30% or more censored, say), then the choice of distribution can be critical. 
  If failure is due to repeated insult, e.g., the multi-hit model, then Weibull tends to 
be preferred; if it is from degradation, e.g., flexing of a diaphragm, then the log-normal. 
  Beyond this you need more guidance than mine.

   2. The survreg routine assumes that log(y) ~ covariates + error.  For a log-normal 
distribution the error is Gaussian and thus the predict(fit, type='response') will be 
exp(predicted mean of log time), which is the predicted median time, not the predicted 
mean time.  For Weibull the error distribution is asymmetric so things are muddier: 
exp(linear predictor) is neither the mean nor the median.  Each is the MLE prediction for the 
subject, just not interpretable as a mean.  To get the actual mean you need to look up the 
formulas for Weibull and/or lognormal in a textbook, and map from the survreg 
parameterization to whatever one the textbook uses.  The two parameterizations are never 
the same.
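
A minimal sketch of that mapping in R, offered as an illustration rather than as part of the original reply. It assumes the lung data set shipped with the survival package and an arbitrary covariate (age); both are stand-ins for your own data. It relies on the relationship noted in the ?survreg examples: survreg's scale is 1/(Weibull shape) and exp(linear predictor) is the Weibull scale. The mean formulas are the standard textbook ones.

```r
library(survival)

# Illustrative fits only: lung and the age covariate are assumptions,
# not anything from the original question.
wfit <- survreg(Surv(time, status) ~ age, data = lung, dist = "weibull")
lfit <- survreg(Surv(time, status) ~ age, data = lung, dist = "lognormal")

newd <- data.frame(age = 60)  # hypothetical subject

# Linear predictor on the log-time scale
lp_w <- as.numeric(predict(wfit, newdata = newd, type = "lp"))
lp_l <- as.numeric(predict(lfit, newdata = newd, type = "lp"))

# type = "response" is exp(linear predictor), not a mean
resp_w <- as.numeric(predict(wfit, newdata = newd, type = "response"))
resp_l <- as.numeric(predict(lfit, newdata = newd, type = "response"))

# Weibull: shape = 1 / survreg scale, scale = exp(lp),
# so mean = scale * gamma(1 + 1/shape) = exp(lp) * gamma(1 + survreg scale)
mean_w <- exp(lp_w) * gamma(1 + wfit$scale)

# Log-normal: log(T) ~ N(lp, scale^2), so mean = exp(lp + scale^2 / 2)
mean_l <- exp(lp_l + lfit$scale^2 / 2)

print(c(response_weibull = resp_w, mean_weibull = mean_w,
        response_lognormal = resp_l, mean_lognormal = mean_l))
```

For the log-normal the mean always exceeds the type='response' value, since exp(scale^2 / 2) > 1.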

   3. Another option is predicted quantiles.  ?predict.survreg shows how to get the entire 
survival curve.  The mean can be obtained as the area under the survival curve.  Relevant 
to your question, the expected time remaining for a subject still alive at time = 10, say, 
is  integral(S(t), from 10 to infinity) / S(10), where S is the survival curve.  You can also 
read off quantiles of the expected remaining life.
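
The calculation in point 3 can be sketched in R as follows. This is a hedged illustration under assumptions, not the original author's code: the lung data, the age covariate, the Weibull choice, and t0 = 10 are all placeholders. The quantile call follows the pattern in ?predict.survreg; the closed-form S(t) uses base R's pweibull with shape = 1/(survreg scale) and scale = exp(linear predictor).

```r
library(survival)

# Illustrative fit; data set and covariate are assumptions
wfit <- survreg(Surv(time, status) ~ age, data = lung, dist = "weibull")
newd <- data.frame(age = 60)  # hypothetical subject

# Survival curve from predicted quantiles: the p-th quantile of the
# lifetime is the time at which S(t) = 1 - p
pct <- 1:98 / 100
qtime <- as.numeric(predict(wfit, newdata = newd, type = "quantile", p = pct))
# plot(qtime, 1 - pct, type = "l", xlab = "Time", ylab = "Survival")

# The same curve in closed form, convenient for integration
lp <- as.numeric(predict(wfit, newdata = newd, type = "lp"))
S <- function(t) {
  pweibull(t, shape = 1 / wfit$scale, scale = exp(lp), lower.tail = FALSE)
}

# Mean lifetime = area under the survival curve
mean_life <- integrate(S, 0, Inf)$value

# Expected remaining time for a subject still alive at t0
t0 <- 10
remaining <- integrate(S, t0, Inf)$value / S(t0)

# Median remaining life: solve S(t0 + m) = 0.5 * S(t0) for m
med_remaining <- uniroot(function(m) S(t0 + m) - 0.5 * S(t0),
                         interval = c(0, 1e5))$root

print(c(mean = mean_life, remaining = remaining, median_remaining = med_remaining))
```

Other quantiles of the remaining life follow by replacing 0.5 with the desired conditional survival probability.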

Terry Therneau
(author of the survival package)
On 08/18/2015 05:00 AM, r-help-request at r-project.org wrote: