
Predict in glmnet for Cox family

3 messages · Terry Therneau, jitvis

#
On 04/21/2015 05:00 AM, r-help-request at r-project.org wrote:
The answer is that you cannot predict survival time, in general.  The reason is that most 
studies do not follow the subjects for a sufficiently long time.  For instance, say that 
the data set comes from a study that enrolled subjects and then followed them for up to 5 
years, at which time 35% had experienced mortality (using the usual Kaplan-Meier).  Fit a 
model to the data and ask "what is the predicted survival time for a low risk subject". 
The answer will at best be "greater than 5 years".   The program cannot say if it is 6 or 
10 or even 1000.  A bigger data set does not help.
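The following is a minimal Kaplan-Meier sketch in Python. It uses a hypothetical data set shaped like the example above: 20 subjects followed for up to 5 years, 7 deaths (35%), and everyone else administratively censored at 5 years. The estimated curve never falls below 0.65, so the median (the first time S(t) reaches 0.5) cannot be estimated. For the same reason, no "survival time" can be given for a low-risk subject. The function names are illustrative and do not come from any library.

```python
from typing import List, Optional, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate: (time, S(t)) at each distinct event time.

    events[i] is 1 for an observed death, 0 for a censored follow-up time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Pool every subject whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed  # deaths and censorings both leave the risk set
    return curve


def median_survival(curve: List[Tuple[float, float]]) -> Optional[float]:
    """First time with S(t) <= 0.5, or None if the curve never reaches 0.5."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Hypothetical study: 20 subjects, 7 deaths within 5 years, 13 censored at 5 years.
times = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 4.0] + [5.0] * 13
events = [1] * 7 + [0] * 13

curve = kaplan_meier(times, events)
print(round(curve[-1][1], 2))  # 0.65: lowest estimated survival
print(median_survival(curve))  # None: the median lies beyond follow-up
```

A larger data set with the same follow-up would only make the 0.65 plateau more precise. It would still say nothing about what happens after year 5.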

Terry Therneau
#
Dear Terry,

Thank you for your reply. I understand that it is difficult to predict survival
time in general.

I have tried another approach and would like to know whether it is
correct.

I clustered my dataset based on a similarity measure and reduced the number
of variables using the LASSO together with expert opinion. I then fitted a
Weibull accelerated failure time (AFT) model with survreg() from the survival
package and used it to predict survival time.

The accuracy is somewhat low because of the uncertainty and complexity of
individual survival times. I also computed the 5% and 95% quantiles, and
almost 95% of the observations fall within that interval, even though the
interval is rather wide.

     Actual Predicted     Lower     Upper
1      91  83.01901 10.497993 178.65750
2      90  62.66257  7.923863 134.85030
3     115  57.59236  7.282720 123.93918
4      20  50.72860  6.414777 109.16830
5      81  83.42176 10.548922 179.52423
6     113  57.10106  7.220593 122.88188
7       8  58.29399  7.371442 125.44907
8      88  53.19866  6.727124 114.48390
9      17  34.80713  4.401461  74.90518
10      5  45.90169  5.804401  98.78076
11     20  58.99832  7.460507 126.96480
12     34  64.05572  8.100031 137.84837
13     27  39.25003  4.963279  84.46635
14     56  41.03611  5.189134  88.31000
15     60  69.70944  8.814959 150.01520
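The coverage check described above can be reproduced from the posted table. The minimal Python sketch below counts how many Actual values lie within [Lower, Upper]. It assumes that none of the Actual times are censored, because a censored time cannot simply be compared against the interval. Two further points: an interval running from the 5% to the 95% quantile has a nominal coverage of 90%, not 95%, and with only 15 rows the observed proportion is a very noisy estimate.

```python
# (Actual, Predicted, Lower, Upper) for the 15 rows in the table above.
ROWS = [
    (91, 83.01901, 10.497993, 178.65750),
    (90, 62.66257, 7.923863, 134.85030),
    (115, 57.59236, 7.282720, 123.93918),
    (20, 50.72860, 6.414777, 109.16830),
    (81, 83.42176, 10.548922, 179.52423),
    (113, 57.10106, 7.220593, 122.88188),
    (8, 58.29399, 7.371442, 125.44907),
    (88, 53.19866, 6.727124, 114.48390),
    (17, 34.80713, 4.401461, 74.90518),
    (5, 45.90169, 5.804401, 98.78076),
    (20, 58.99832, 7.460507, 126.96480),
    (34, 64.05572, 8.100031, 137.84837),
    (27, 39.25003, 4.963279, 84.46635),
    (56, 41.03611, 5.189134, 88.31000),
    (60, 69.70944, 8.814959, 150.01520),
]


def empirical_coverage(rows):
    """Count rows whose actual time lies inside [lower, upper]."""
    inside = sum(1 for actual, _pred, lower, upper in rows if lower <= actual <= upper)
    return inside, len(rows)


inside, n = empirical_coverage(ROWS)
print(f"{inside}/{n} = {inside / n:.3f}")  # 14/15 = 0.933 (row 10 is below its Lower bound)
```

Coverage close to the nominal level shows that the interval widths are roughly calibrated. It does not show that the point predictions are accurate. Here the intervals span more than an order of magnitude.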

Is my approach correct? Can I say that this model is good?

Is there further testing I can do, so that I can also get a survival
probability curve?
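On the last question, a parametric AFT fit already implies a full survival curve for each subject, not just a point prediction. The minimal Python sketch below assumes survreg's Weibull parameterization, log T = lp + sigma * W. Here W is a standard minimum extreme-value variable, lp is the subject's linear predictor, and sigma is the fitted scale. The values of lp and sigma are hypothetical and not fitted to the posted data. S(t) gives the curve, and the quantile function gives times such as the 5% and 95% bounds. One consequence of this parameterization is that exp(lp) is the time at which S(t) = exp(-1), about 0.37, not the median. In R, the same curve can be traced with predict(fit, newdata, type = "quantile", p = ...) over a fine grid of p, plotting the returned times against 1 - p.

```python
import math


def weibull_aft_survival(t: float, lp: float, sigma: float) -> float:
    """S(t | x) = exp(-(t / exp(lp)) ** (1 / sigma)) for a Weibull AFT model
    written as log T = lp + sigma * W, W ~ standard minimum extreme value."""
    return math.exp(-((t / math.exp(lp)) ** (1.0 / sigma)))


def weibull_aft_quantile(p: float, lp: float, sigma: float) -> float:
    """Time t_p with P(T <= t_p) = p, i.e. the inverse of 1 - S(t)."""
    return math.exp(lp) * (-math.log(1.0 - p)) ** sigma


# Hypothetical values for one subject (not estimated from the posted data).
lp, sigma = math.log(80.0), 0.7

# A few points on this subject's survival curve.
for t in (10, 25, 50, 100, 150):
    print(t, round(weibull_aft_survival(t, lp, sigma), 3))

# 5% quantile, median, and 95% quantile of the predicted survival time.
lower, median, upper = (weibull_aft_quantile(p, lp, sigma) for p in (0.05, 0.5, 0.95))
print(round(lower, 1), round(median, 1), round(upper, 1))
```

Whether this curve is trustworthy beyond the longest follow-up in the data is exactly Terry's point. The Weibull shape is then an extrapolation, not something the data can confirm.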

Sincerely,




--
View this message in context: http://r.789695.n4.nabble.com/Predict-in-glmnet-for-cox-family-tp4706070p4706248.html
Sent from the R help mailing list archive at Nabble.com.