
predict.coxph and predict.survreg

8 messages · David Winsemius, Mattia Prosperi, James C. Whanger +1 more

#
On Nov 11, 2010, at 3:44 AM, Michael Haenlein wrote:

            
The first step would be to create a Surv object and then run a
regression on dataset1 that returns a coxph object. So
you should be looking at:

?Surv
?coxph

There are worked examples in the help pages. You would then run
predict() on the coxph fit with "dataset2" as the newdata argument.
The default output is the linear predictor, i.e. the log-hazard relative
to a reference set of covariate values (by default based on the sample
means), but other sorts of estimates are possible.
The survfit function provides survival curves suitable for plotting.
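
A minimal sketch of those steps, assuming the built-in lung data from the survival package stand in for both datasets. The split into dataset1 and dataset2 and the choice of age and sex as covariates are purely illustrative, not taken from the original question.

```r
library(survival)

# Assumption: lung is split arbitrarily into a fitting set and a prediction set.
dataset1 <- lung[1:150, ]
dataset2 <- lung[151:228, ]

# Surv() pairs the follow-up time with the event indicator (status == 2 is death).
fit <- coxph(Surv(time, status == 2) ~ age + sex, data = dataset1)

# Default type = "lp": one linear predictor (log relative hazard) per new row.
lp <- predict(fit, newdata = dataset2)

# type = "risk" returns exp(lp), the relative hazard.
risk <- predict(fit, newdata = dataset2, type = "risk")

# survfit() returns one predicted survival curve per row of newdata.
curves <- survfit(fit, newdata = dataset2[1:3, ])
plot(curves, lty = 1:3, xlab = "Days", ylab = "Survival probability")
```

Both lp and risk are relative quantities; only the survfit() curves are on the probability-over-time scale.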

(You may want to inquire at a local medical school to find  
statisticians who have experience with this approach. This is ordinary  
biostatistics these days.)
#
Indeed, the predict() function for coxph objects does not directly give
"time" predictions, only linear-predictor and exponentiated risk
scores. This is because getting a time requires the baseline hazard,
which is not straightforward to compute since it is left implicit
in the Cox model.
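
One possible route to a time-scale quantity, sketched here with the lung data: let the survival package estimate the baseline hazard and read a median off the predicted curves. The covariates and the two patients in newpat are hypothetical, chosen only for illustration.

```r
library(survival)

fit <- coxph(Surv(time, status == 2) ~ age + sex, data = lung)

# Estimated cumulative baseline hazard from the fitted model.
H0 <- basehaz(fit)
head(H0)

# Predicted survival curves for two hypothetical patients.
newpat <- data.frame(age = c(50, 70), sex = c(2, 1))
sf <- survfit(fit, newdata = newpat)

# Median survival time per patient, read off each predicted curve.
# An NA here would mean that curve never drops to 0.5.
med <- quantile(sf, probs = 0.5, conf.int = FALSE)
med
```

The median is only one summary of the curve; a mean survival time would require the curve to reach zero or some restriction on the time horizon.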

2010/11/11 David Winsemius <dwinsemius at comcast.net>:
#
On Nov 11, 2010, at 12:14 PM, Michael Haenlein wrote:

            
It is a prediction under specific assumptions underpinning a  
parametric estimate.
You ought to confirm that it "makes sense" by comparing to your data:
require(Hmisc); require(survival)
<your code>

 > describe(lung[lung$status==1&lung$ph.ecog==2,"time"])
lung[lung$status == 1 & lung$ph.ecog == 2, "time"]
       n missing  unique    Mean
       6       0       6   293.7

           92 105 211 292 511 551
Frequency  1   1   1   1   1   1
%         17  17  17  17  17  17

 > ?lung

So status==1 is a censored case and status==2 marks the observed event times (deaths):
 > describe(lung[lung$status==2&lung$ph.ecog==2,"time"])
lung[lung$status == 2 & lung$ph.ecog == 2, "time"]
       n missing  unique    Mean     .05     .10     .25     .50     .75     .90     .95
      44       1      44   226.0   14.95   36.90   94.50  178.50  295.75  500.00  635.85

lowest :  11  12  13  26  30, highest: 524 533 654 707 814

So the mean time to death (in a group that had only 6 censored
individuals, at times from 92 to 551) was 226, and the median time to death
among the 44 individuals was 178.5, with a right-skewed distribution. You need
to decide whether you want to make that particular prediction when you
know that you forced a specific distributional form on the regression
machinery by accepting the default.
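
A sketch of that comparison, assuming a survreg model along the lines of the ?predict.survreg example. The formula is an assumption, since the poster's own code is not shown here; dist is left at its default, "weibull".

```r
library(survival)

# Assumption: a model like the ?predict.survreg example; dist defaults to "weibull".
wfit <- survreg(Surv(time, status) ~ ph.ecog, data = lung)

new <- data.frame(ph.ecog = 2)

# type = "response": prediction on the original time scale under the fitted model.
resp <- predict(wfit, newdata = new, type = "response")

# Predicted median under the same Weibull assumption.
p50 <- predict(wfit, newdata = new, type = "quantile", p = 0.5)

# Observed death times in that group, for comparison (%in% drops the NA ph.ecog).
obs <- with(lung, time[status == 2 & ph.ecog %in% 2])

c(response = unname(resp), predicted_median = unname(p50),
  observed_mean = mean(obs), observed_median = median(obs))
```

Refitting with a different dist (for example "lognormal") and repeating the comparison gives a sense of how much the prediction depends on the assumed form.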
David Winsemius, MD
West Hartford, CT