Prediction in Cox Proportional-Hazard Regression

5 messages · Giuseppe.Palermo@bo.infn.it, Brian Ripley, Thomas Lumley

#
Hi,
I used the "coxph" function, with four covariates.

Let's say something like that
So I obtain the 4 coefficients B1,B2,B3,B4 such that

h(t) = h0(t) exp(B1*X1+ B2*X2 + B3*X3 + B4*X4).

When I use the function on the same data,
how does it make the prediction?
I mean, given the data point P1=[X1(1),X2(1),X3(1),X4(1)], what is the formula
that the function "predict.coxph" uses to make the prediction for P1?

I really hope that someone will reply to my question.

Best regards to all
Giuseppe
#
On Thu, 9 Jun 2005 Giuseppe.Palermo at bo.infn.it wrote:

            
How does that work?  predict.coxph is not an exported function!

    if (type == "lp" || type == "risk") {
        if (missing(newdata)) {
            pred <- object$linear.predictors
            names(pred) <- names(object$residuals)
        }
        else pred <- x %*% coef + offset
    ...

so that is the formula it uses.  As you did not supply 'newdata', it
returns the 'linear.predictors' component of the fit: see ?coxph.object.

Effectively it centred the explanatory variables on their means and then 
applied the linear regression formula to give the linear predictor. It is 
the centring that may be non-obvious: effectively h_0(t), the baseline 
hazard, is taken at the average of the subjects.
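
The centring described above can be checked numerically. What follows is only a minimal sketch, assuming the survival package is installed. The lung data set and the two continuous covariates are illustrative choices, not taken from the original question. It also assumes that the fit's 'means' component holds the values used for centring.

```r
library(survival)

# Illustrative fit: two continuous covariates from the bundled lung data
fit <- coxph(Surv(time, status) ~ age + ph.karno, data = lung)

# Covariate matrix actually used in the fit (rows with missing values dropped)
X <- model.matrix(fit)

# Centre each column on the reference values stored in the fit,
# then apply the linear formula sum_j B_j * (X_j - mean_j)
lp_manual <- drop(sweep(X, 2, fit$means) %*% coef(fit))

# Linear predictor as returned when no 'newdata' is supplied
lp_pred <- predict(fit, type = "lp")

# TRUE if the centred formula reproduces the stored linear predictor
print(isTRUE(all.equal(unname(lp_manual), unname(lp_pred))))
```

If the comparison holds, the values from predict() are B1*(X1-mean(X1)) + B2*(X2-mean(X2)) for each subject, not the uncentred sum.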
#
Quoting Prof Brian Ripley <ripley at stats.ox.ac.uk>:
Dear Prof. Ripley
Thanks for replying to my email.
I have just one other question:

since h(t) = h0(t) exp(B1*X1 + B2*X2 + B3*X3 + B4*X4)
represents the hazard at time t,

what does the value B1*(X1-mean(X1)) + B2*(X2-mean(X2)) + ....
from a linear prediction represent?
#
On Thu, 9 Jun 2005 Giuseppe.Palermo at bo.infn.it wrote:

            
The linear predictor, as you asked for.
#
On Thu, 9 Jun 2005 Giuseppe.Palermo at bo.infn.it wrote:

            
coxph() parametrizes the model so that

     h(t) = h_0(t) exp(B1(X1-mean(X1)) + B2(X2-mean(X2)) + ...)

as Brian pointed out.  This doesn't affect the coefficients B1, B2,..., it 
just redefines h_0 to be the hazard at mean covariates rather than at zero 
covariates.
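
To spell out why the coefficients are unchanged, the constant factor involving the covariate means can be moved into the baseline hazard. This is a sketch in the notation of the thread for the four-covariate case. The symbols \bar X_j (for mean(Xj)) and \tilde h_0 (for the redefined baseline) are introduced here only for illustration.

```latex
\begin{align*}
h(t) &= h_0(t)\,\exp\Big(\sum_{j=1}^{4} B_j X_j\Big) \\
     &= \underbrace{h_0(t)\,\exp\Big(\sum_{j=1}^{4} B_j \bar X_j\Big)}_{\tilde h_0(t)}
        \;\exp\Big(\sum_{j=1}^{4} B_j \,(X_j - \bar X_j)\Big)
\end{align*}
```

The factor absorbed into \tilde h_0(t) does not depend on the individual subject, so the hazard ratio between any two subjects, and hence each B_j, is the same in both forms.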

The reason is that this makes h_0(t) more likely to be a useful thing to 
estimate. For example, if one covariate is age then extrapolating the 
baseline hazard to age zero is numerically unreliable and not very 
interesting.

 	-thomas