predict.coxph and predict.survreg
8 messages · David Winsemius, Mattia Prosperi, James C. Whanger +1 more
On Nov 11, 2010, at 3:44 AM, Michael Haenlein wrote:
Dear all, I'm struggling with predicting "expected time until death" for a coxph and survreg model. I have two datasets. Dataset 1 includes a certain number of people for whom I know a vector of covariates (age, gender, etc.) and their event times (i.e., I know whether they have died and, if death occurred prior to the end of the observation period, when). Dataset 2 includes another set of people for whom I only have the covariate vector. I would like to use Dataset 1 to calibrate either a coxph or survreg model and then use this model to determine an "expected time until death" for the individuals in Dataset 2. For example, I would like to know when a person in Dataset 2 will die, given his/her age and gender. I checked predict.coxph and predict.survreg as well as the document "A Package for Survival Analysis in S" written by Terry M. Therneau, but I have to admit that I'm a bit lost here.
The first step would be creating a Surv-object, followed by running a regression that creates a coxph-object, using dataset1 as input. So you should be looking at: ?Surv ?coxph There are worked examples in the help pages. You would then run predict() on the coxph fit with "dataset2" as the newdata argument. The default output is the linear predictor for the log-hazard relative to a mean survival estimate, but other sorts of estimates are possible. The survfit function provides survival curves suitable for plotting. (You may want to inquire at a local medical school to find statisticians who have experience with this approach. This is ordinary biostatistics these days.)
David.

> Could anyone give me some advice on how this could be done?
>
> Thanks very much in advance,
>
> Michael
>
> Michael Haenlein
> Professor of Marketing
> ESCP Europe
> Paris, France

David Winsemius, MD
West Hartford, CT
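The steps described in the reply above can be sketched roughly as follows. This is only an illustration, not code from the thread: the `lung` data shipped with the survival package stands in for "Dataset 1", and `dataset2` with its covariate values is a made-up stand-in for "Dataset 2".

```r
library(survival)

# Stand-in for "Dataset 1": the lung data from the survival package
# (status: 1 = censored, 2 = dead). Rows with missing ph.ecog are dropped.
dataset1 <- lung[!is.na(lung$ph.ecog), ]

# Hypothetical "Dataset 2": covariates only, no event times.
dataset2 <- data.frame(age = c(50, 65, 80),
                       sex = c(1, 2, 1),
                       ph.ecog = c(0, 1, 2))

# Surv() builds the response; coxph() fits the model on Dataset 1.
cfit <- coxph(Surv(time, status) ~ age + sex + ph.ecog, data = dataset1)

# Linear predictor (log relative hazard) and exponentiated risk score
# for the new individuals.
lp   <- predict(cfit, newdata = dataset2, type = "lp")
risk <- predict(cfit, newdata = dataset2, type = "risk")

# One predicted survival curve per row of dataset2; plot(sf) would draw them.
sf <- survfit(cfit, newdata = dataset2)
```

Neither `lp` nor `risk` is a time; both only rank the new individuals by relative hazard.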
Indeed, predict() on a coxph fit does not directly give "time" predictions, only linear predictors and exponentiated risk scores. To get a time, a baseline hazard has to be estimated, which is not straightforward because the Cox model leaves it implicit.
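One way to get something on the time scale out of a Cox fit is to let survfit() estimate the baseline and read a median off each predicted curve. The sketch below assumes the `lung` data as a stand-in for the poster's data and a made-up `newd` data frame; it illustrates the idea and is not a recommendation from the thread.

```r
library(survival)

# lung stands in for the fitting data (status: 1 = censored, 2 = dead).
dataset1 <- lung[!is.na(lung$ph.ecog), ]
cfit <- coxph(Surv(time, status) ~ ph.ecog, data = dataset1)

# Hypothetical new individuals, one per ECOG score.
newd <- data.frame(ph.ecog = 0:2)

# survfit() combines the estimated baseline hazard with each covariate row.
sf <- survfit(cfit, newdata = newd)

# Median predicted survival time per curve, i.e. the time at which the
# predicted survival probability first drops to 0.5 (NA if it never does).
med <- summary(sf)$table[, "median"]
med
```

A median is only defined when the predicted curve reaches 0.5 within the observed follow-up, which is one reason a single "expected time" is harder to obtain from a Cox model than from a parametric one.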
On Nov 11, 2010, at 12:14 PM, Michael Haenlein wrote:
Thanks for the comment, James! The problem is that my initial sample (Dataset 1) is truncated. That means I only observe "time to death" for those individuals who actually died before the end of my observation period. It is my understanding that this type of truncation creates a bias when I use a "normal" regression analysis. Hence my idea to use some form of survival model. I had another look at predict.survreg and I think the option "response" could work for me. When I run the following code I get ptime = 290.3648. I assume this means that an individual with ph.ecog=2 can be expected to live another 290.3648 days before death occurs (days is the time scale of the time variable).
It is a prediction under specific assumptions underpinning a parametric estimate.
Could someone confirm whether this makes sense?
You ought to confirm that it "makes sense" by comparing to your data:
require(Hmisc); require(survival)
<your code>
> describe(lung[lung$status==1&lung$ph.ecog==2,"time"])
lung[lung$status == 1 & lung$ph.ecog == 2, "time"]
      n missing  unique    Mean
      6       0       6   293.7

           92 105 211 292 511 551
Frequency   1   1   1   1   1   1
%          17  17  17  17  17  17
> ?lung
So status==1 marks a censored case, and the observed death times are the ones with status==2.
> describe(lung[lung$status==2&lung$ph.ecog==2,"time"])
lung[lung$status == 2 & lung$ph.ecog == 2, "time"]
      n missing  unique    Mean     .05     .10     .25     .50     .75     .90     .95
     44       1      44   226.0   14.95   36.90   94.50  178.50  295.75  500.00  635.85
lowest : 11 12 13 26 30, highest: 524 533 654 707 814
So the mean time to death (in a group that had only 6 censored
individuals, at times from 92 to 551) was 226, and the median time to
death among the 44 individuals was 178.5, with a right-skewed
distribution. You need to decide whether you want to make that
particular prediction when you know that you forced a specific
distributional form on the regression machinery by accepting the
default.
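To see how much the prediction depends on the distributional form, one could refit the same model under several of survreg's built-in distributions and compare the predictions with a nonparametric Kaplan-Meier median. The sketch below is an illustration under that assumption; the choice of distributions and the names `dists`, `pred` and `km_median` are not from the thread.

```r
library(survival)

newd  <- data.frame(ph.ecog = 2)
dists <- c("weibull", "lognormal", "loglogistic", "exponential")

# For each distribution: the type = "response" prediction used in the
# thread, and the predicted median (the 0.5 quantile).
pred <- sapply(dists, function(d) {
  fit <- survreg(Surv(time, status) ~ ph.ecog, data = lung, dist = d)
  c(response = unname(predict(fit, newdata = newd, type = "response")),
    median   = unname(predict(fit, newdata = newd,
                              type = "quantile", p = 0.5)))
})
round(pred, 1)

# Kaplan-Meier median for the ph.ecog == 2 subgroup, which accounts for
# censoring without assuming a distribution.
km <- survfit(Surv(time, status) ~ 1, data = subset(lung, ph.ecog == 2))
km_median <- unname(summary(km)$table["median"])
km_median
```

The "weibull" column corresponds to the default fit. For that fit the `response` value and the predicted median differ, so `response` should not be read as a median survival time without checking what it represents for the chosen distribution.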
lfit <- survreg(Surv(time, status) ~ ph.ecog, data=lung)
ptime <- predict(lfit, newdata=data.frame(ph.ecog=2), type='response')

On Thu, Nov 11, 2010 at 5:26 PM, James C. Whanger <james.whanger at gmail.com> wrote:
Michael,

You are looking to compute an estimated time to death rather than the odds of death conditional upon time. Thus, you will want to use "time to death" as your dependent variable rather than a dichotomous outcome (0 = alive, 1 = death). You can accomplish this with a straightforward regression analysis.

Best,
Jim
David Winsemius, MD West Hartford, CT