An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20101027/134f57b3/attachment.pl>
coxph linear.predictors
7 messages · Bond, Stephen, David Winsemius
On Oct 27, 2010, at 12:12 PM, Bond, Stephen wrote:
I would like to be able to construct hazard rates (or unconditional death prob)
Hazards are not probabilities (since probabilities are constrained to the range [0,1] and hazards are unbounded upward.)
for many subjects from a given survfit. This will involve adjusting the ( n.event/n.risk) with (coxph object )$linear.predictors I must be having another silly day as I cannot reproduce the linear predictor: fit <- coxph(Surv(futime, fustat) ~ age, data = ovarian) fit$linear.predictors[1] [1] 2.612756
That's the linear predictor (the beta*X) and that particular number only applies to the first case.
coef(fit)*model.matrix(fit)[1,1]
age
11.69021
I don't know what that might be and you are not telling us what you think it is.
The above is based on the help listing for coxph.object
coefficients: the coefficients of the linear predictor, which multiply
the columns of the model matrix. If the model is
over-determined there will be missing values in the vector
corresponding to the redundant columns in the model matrix.
Also, please comment whether n.event/n.risk
The Nelson-Aalen estimator of the cumulative hazard as a function of intervals prior to t is sum( n-event(t)/ n.risk(t))
gives the baseline hazard exp(alpha) ?
No. The "baseline hazard", as you are calling this, would be an estimate for persons with all covariates = 0, so in this case is for women of age=0. (Not a particularly interpretable result in many situations. The baseline hazard following treated ovarian cancer for neonates is not medically sensible.) What is the purpose of this request? Is someone telling you you need to provide estimates for instantaneous hazards?
David Winsemius, MD West Hartford, CT
coef(fit)*model.matrix(fit)[1,1]
age
11.69021
I don't know what that might be and you are not telling us what you think it is.
I think this the calculation of the linear predictor, multiplying
(the beta*X)
I expected that coef(fit)*model.matrix(fit)[1,1]= fit$linear.predictors[1] And since it does not, I am lost. Ultimately I need unconditional death probability from day 1 to day t in the end. Survfit takes too much time as it does a lot of things I do not need. I hope I should be able to get the death prob for every subject from a single call to survfit and then rescale with exp(linear.predictor) for each subject. Just a comment: as there is not deltat argument, I think that coxph assumes deltat=1 so the hazard is the conditional probability over the next time interval ( t(i-1), t(i) ). With dense data sampled daily this is a day interval. Your comments are appreciated. Stephen Bond -----Original Message----- From: David Winsemius [mailto:dwinsemius at comcast.net] Sent: Wednesday, October 27, 2010 1:15 PM To: Bond, Stephen Cc: r-help at r-project.org Subject: Re: [R] coxph linear.predictors
On Oct 27, 2010, at 12:12 PM, Bond, Stephen wrote:
I would like to be able to construct hazard rates (or unconditional death prob)
Hazards are not probabilities (since probabilities are constrained to the range [0,1] and hazards are unbounded upward.)
for many subjects from a given survfit. This will involve adjusting the ( n.event/n.risk) with (coxph object )$linear.predictors I must be having another silly day as I cannot reproduce the linear predictor: fit <- coxph(Surv(futime, fustat) ~ age, data = ovarian) fit$linear.predictors[1] [1] 2.612756
That's the linear predictor (the beta*X) and that particular number only applies to the first case.
coef(fit)*model.matrix(fit)[1,1]
age
11.69021
I don't know what that might be and you are not telling us what you think it is.
The above is based on the help listing for coxph.object
coefficients: the coefficients of the linear predictor, which multiply
the columns of the model matrix. If the model is
over-determined there will be missing values in the vector
corresponding to the redundant columns in the model matrix.
Also, please comment whether n.event/n.risk
The Nelson-Aalen estimator of the cumulative hazard as a function of intervals prior to t is sum( n-event(t)/ n.risk(t))
gives the baseline hazard exp(alpha) ?
No. The "baseline hazard", as you are calling this, would be an estimate for persons with all covariates = 0, so in this case is for women of age=0. (Not a particularly interpretable result in many situations. The baseline hazard following treated ovarian cancer for neonates is not medically sensible.) What is the purpose of this request? Is someone telling you you need to provide estimates for instantaneous hazards?
David Winsemius, MD West Hartford, CT
On Oct 27, 2010, at 1:45 PM, Bond, Stephen wrote:
coef(fit)*model.matrix(fit)[1,1] age 11.69021
I don't know what that might be and you are not telling us what you think it is.
I think this the calculation of the linear predictor, multiplying
(the beta*X)
I expected that coef(fit)*model.matrix(fit)[1,1]= fit $linear.predictors[1] And since it does not, I am lost. Ultimately I need unconditional death probability from day 1 to day t in the end.
Now it sounds like you do want cap-lambda, the cumulative hazard. Let's call it Lambda Since S(t) = exp(-Lambda(t)), you can invert to get Lambda(t) = - log(S(t)) ?survfit Possibly: plot(-log( survfit(fit)$surv ))
Survfit takes too much time as it does a lot of things I do not need. I hope I should be able to get the death prob for every subject from a single call to survfit and then rescale with exp(linear.predictor) for each subject.
I have run into slowness with that approach myself. The rms package provides survest and Predict methods, but if you wanted to stay in survival, then why not create a survfit()$surv object based on a vector of means and then calculate an exp(newdata) by which you would multiply the S(mean|t)? There is also a predict.coxph function with its type="risk" parameter that you might want to check.
Just a comment: as there is not deltat argument, I think that coxph assumes deltat=1 so the hazard is the conditional probability over the next time interval ( t(i-1), t(i) ). With dense data sampled daily this is a day interval.
That's not the way I understand it. The Lambda() and S() functions only change at the particular event times on a continuous time so they are not assumed to have integer values. Pretty sure that Therneau avoids presenting extraction methods for instantaneous hazards from Cox models. In his book "Modeling Survival Data" he mostly sticks to discussing cumulative hazard functions rather than estimating an interval or instantaneous hazard. And fortunately he visits rhelp so it's quite possible that my errors in this area will get expert review and correction.
Your comments are appreciated. Stephen Bond -----Original Message----- From: David Winsemius [mailto:dwinsemius at comcast.net] Sent: Wednesday, October 27, 2010 1:15 PM To: Bond, Stephen Cc: r-help at r-project.org Subject: Re: [R] coxph linear.predictors On Oct 27, 2010, at 12:12 PM, Bond, Stephen wrote:
I would like to be able to construct hazard rates (or unconditional death prob)
Hazards are not probabilities (since probabilities are constrained to the range [0,1] and hazards are unbounded upward.)
for many subjects from a given survfit. This will involve adjusting the ( n.event/n.risk) with (coxph object )$linear.predictors I must be having another silly day as I cannot reproduce the linear predictor: fit <- coxph(Surv(futime, fustat) ~ age, data = ovarian) fit$linear.predictors[1] [1] 2.612756
That's the linear predictor (the beta*X) and that particular number only applies to the first case.
coef(fit)*model.matrix(fit)[1,1] age 11.69021
I don't know what that might be and you are not telling us what you think it is.
The above is based on the help listing for coxph.object
coefficients: the coefficients of the linear predictor, which
multiply
the columns of the model matrix. If the model is
over-determined there will be missing values in the vector
corresponding to the redundant columns in the model matrix.
Also, please comment whether n.event/n.risk
The Nelson-Aalen estimator of the cumulative hazard as a function of intervals prior to t is sum( n-event(t)/ n.risk(t))
gives the baseline hazard exp(alpha) ?
No. The "baseline hazard", as you are calling this, would be an estimate for persons with all covariates = 0, so in this case is for women of age=0. (Not a particularly interpretable result in many situations. The baseline hazard following treated ovarian cancer for neonates is not medically sensible.) What is the purpose of this request? Is someone telling you you need to provide estimates for instantaneous hazards? -- David Winsemius, MD West Hartford, CT
David Winsemius, MD West Hartford, CT
What you suggest is exactly what I intended. The big problem is that X*beta != linear.predictor # it works for glm That's a big loose end. The survival documentation on p41 implies that the above should be an equality, but it is not. And I could not produce the surv vector using a variety of formulas, so I guess this is the spirit of SAS disguising itself as an R package. Stephen Bond -----Original Message----- From: David Winsemius [mailto:dwinsemius at comcast.net] Sent: Wednesday, October 27, 2010 2:17 PM To: Bond, Stephen Cc: r-help at r-project.org Subject: Re: [R] coxph linear.predictors
On Oct 27, 2010, at 1:45 PM, Bond, Stephen wrote:
coef(fit)*model.matrix(fit)[1,1] age 11.69021
I don't know what that might be and you are not telling us what you think it is.
I think this the calculation of the linear predictor, multiplying
(the beta*X)
I expected that coef(fit)*model.matrix(fit)[1,1]= fit $linear.predictors[1] And since it does not, I am lost. Ultimately I need unconditional death probability from day 1 to day t in the end.
Now it sounds like you do want cap-lambda, the cumulative hazard. Let's call it Lambda Since S(t) = exp(-Lambda(t)), you can invert to get Lambda(t) = - log(S(t)) ?survfit Possibly: plot(-log( survfit(fit)$surv ))
Survfit takes too much time as it does a lot of things I do not need. I hope I should be able to get the death prob for every subject from a single call to survfit and then rescale with exp(linear.predictor) for each subject.
I have run into slowness with that approach myself. The rms package provides survest and Predict methods, but if you wanted to stay in survival, then why not create a survfit()$surv object based on a vector of means and then calculate an exp(newdata) by which you would multiply the S(mean|t)? There is also a predict.coxph function with its type="risk" parameter that you might want to check.
Just a comment: as there is not deltat argument, I think that coxph assumes deltat=1 so the hazard is the conditional probability over the next time interval ( t(i-1), t(i) ). With dense data sampled daily this is a day interval.
That's not the way I understand it. The Lambda() and S() functions only change at the particular event times on a continuous time so they are not assumed to have integer values. Pretty sure that Therneau avoids presenting extraction methods for instantaneous hazards from Cox models. In his book "Modeling Survival Data" he mostly sticks to discussing cumulative hazard functions rather than estimating an interval or instantaneous hazard. And fortunately he visits rhelp so it's quite possible that my errors in this area will get expert review and correction.
Your comments are appreciated. Stephen Bond -----Original Message----- From: David Winsemius [mailto:dwinsemius at comcast.net] Sent: Wednesday, October 27, 2010 1:15 PM To: Bond, Stephen Cc: r-help at r-project.org Subject: Re: [R] coxph linear.predictors On Oct 27, 2010, at 12:12 PM, Bond, Stephen wrote:
I would like to be able to construct hazard rates (or unconditional death prob)
Hazards are not probabilities (since probabilities are constrained to the range [0,1] and hazards are unbounded upward.)
for many subjects from a given survfit. This will involve adjusting the ( n.event/n.risk) with (coxph object )$linear.predictors I must be having another silly day as I cannot reproduce the linear predictor: fit <- coxph(Surv(futime, fustat) ~ age, data = ovarian) fit$linear.predictors[1] [1] 2.612756
That's the linear predictor (the beta*X) and that particular number only applies to the first case.
coef(fit)*model.matrix(fit)[1,1] age 11.69021
I don't know what that might be and you are not telling us what you think it is.
The above is based on the help listing for coxph.object
coefficients: the coefficients of the linear predictor, which
multiply
the columns of the model matrix. If the model is
over-determined there will be missing values in the vector
corresponding to the redundant columns in the model matrix.
Also, please comment whether n.event/n.risk
The Nelson-Aalen estimator of the cumulative hazard as a function of intervals prior to t is sum( n-event(t)/ n.risk(t))
gives the baseline hazard exp(alpha) ?
No. The "baseline hazard", as you are calling this, would be an estimate for persons with all covariates = 0, so in this case is for women of age=0. (Not a particularly interpretable result in many situations. The baseline hazard following treated ovarian cancer for neonates is not medically sensible.) What is the purpose of this request? Is someone telling you you need to provide estimates for instantaneous hazards? -- David Winsemius, MD West Hartford, CT
David Winsemius, MD West Hartford, CT
I think I found the first piece of it. Typical SAS:
fit$linear.predictors[1]-coef(fit)*(ovarian$age[1]-mean(ovarian$age))
age -4.4408920985e-16 Well misrepresented in several places... Stephen Bond -----Original Message----- From: David Winsemius [mailto:dwinsemius at comcast.net] Sent: Wednesday, October 27, 2010 1:15 PM To: Bond, Stephen Cc: r-help at r-project.org Subject: Re: [R] coxph linear.predictors
On Oct 27, 2010, at 12:12 PM, Bond, Stephen wrote:
I would like to be able to construct hazard rates (or unconditional death prob)
Hazards are not probabilities (since probabilities are constrained to the range [0,1] and hazards are unbounded upward.)
for many subjects from a given survfit. This will involve adjusting the ( n.event/n.risk) with (coxph object )$linear.predictors I must be having another silly day as I cannot reproduce the linear predictor: fit <- coxph(Surv(futime, fustat) ~ age, data = ovarian) fit$linear.predictors[1] [1] 2.612756
That's the linear predictor (the beta*X) and that particular number only applies to the first case.
coef(fit)*model.matrix(fit)[1,1]
age
11.69021
I don't know what that might be and you are not telling us what you think it is.
The above is based on the help listing for coxph.object
coefficients: the coefficients of the linear predictor, which multiply
the columns of the model matrix. If the model is
over-determined there will be missing values in the vector
corresponding to the redundant columns in the model matrix.
Also, please comment whether n.event/n.risk
The Nelson-Aalen estimator of the cumulative hazard as a function of intervals prior to t is sum( n-event(t)/ n.risk(t))
gives the baseline hazard exp(alpha) ?
No. The "baseline hazard", as you are calling this, would be an estimate for persons with all covariates = 0, so in this case is for women of age=0. (Not a particularly interpretable result in many situations. The baseline hazard following treated ovarian cancer for neonates is not medically sensible.) What is the purpose of this request? Is someone telling you you need to provide estimates for instantaneous hazards?
David Winsemius, MD West Hartford, CT
To close the issue:
S.ave <- survfit(fit) # this is average survival S.1 <- (S.ave$surv)^exp(fit$linear.predictors[1]) # get subject 1 S.1
[1] 0.848450223861993 0.696973203043377 0.530604282995790 0.391136296063732 0.228370373409774 0.132991279576665 0.071742127376216 [8] 0.030988671457923 0.012340713882587 0.004553665545169 0.000865566322530 0.000135746409544
survfit(fit,new=ovarian[1,])$surv # subject 1 from survfit
[1] 0.848450223861993 0.696973203043377 0.530604282995791 0.391136296063732 0.228370373409774 0.132991279576665 0.071742127376216 [8] 0.030988671457923 0.012340713882587 0.004553665545169 0.000865566322530 0.000135746409544 A single call to survfit will suffice for thousands of subjects, avoiding the iterative loop calculations inside survfit. Stephen Bond -----Original Message----- From: David Winsemius [mailto:dwinsemius at comcast.net] Sent: Wednesday, October 27, 2010 1:15 PM To: Bond, Stephen Cc: r-help at r-project.org Subject: Re: [R] coxph linear.predictors
On Oct 27, 2010, at 12:12 PM, Bond, Stephen wrote:
I would like to be able to construct hazard rates (or unconditional death prob)
Hazards are not probabilities (since probabilities are constrained to the range [0,1] and hazards are unbounded upward.)
for many subjects from a given survfit. This will involve adjusting the ( n.event/n.risk) with (coxph object )$linear.predictors I must be having another silly day as I cannot reproduce the linear predictor: fit <- coxph(Surv(futime, fustat) ~ age, data = ovarian) fit$linear.predictors[1] [1] 2.612756
That's the linear predictor (the beta*X) and that particular number only applies to the first case.
coef(fit)*model.matrix(fit)[1,1]
age
11.69021
I don't know what that might be and you are not telling us what you think it is.
The above is based on the help listing for coxph.object
coefficients: the coefficients of the linear predictor, which multiply
the columns of the model matrix. If the model is
over-determined there will be missing values in the vector
corresponding to the redundant columns in the model matrix.
Also, please comment whether n.event/n.risk
The Nelson-Aalen estimator of the cumulative hazard as a function of intervals prior to t is sum( n-event(t)/ n.risk(t))
gives the baseline hazard exp(alpha) ?
No. The "baseline hazard", as you are calling this, would be an estimate for persons with all covariates = 0, so in this case is for women of age=0. (Not a particularly interpretable result in many situations. The baseline hazard following treated ovarian cancer for neonates is not medically sensible.) What is the purpose of this request? Is someone telling you you need to provide estimates for instantaneous hazards?
David Winsemius, MD West Hartford, CT