I am doing some Cox PH model fitting and would like to have some idea
about how good the fits are. Someone suggested using Frank Harrell's
C-index.
As I understand it, a C-index > 0.5 indicates a useful model. I am
probably making an error here, because I am getting values less than 0.5
on real datasets. Can someone tell me where I am going wrong, please?
Here is an example using the German Breast Cancer Study Group data available in
the mfp package. The predictors in the model were selected by stepAIC().
# (Design has since been superseded by the rms package)
library(Design); library(Hmisc); library(mfp); data(GBSG)
fit <- cph( Surv( rfst, cens ) ~ htreat + tumsize + tumgrad + posnodal + prm,
            data=GBSG, x=TRUE, y=TRUE )        # keep design matrix and response for validation
val <- validate.cph( fit, dxy=TRUE, B=200 )   # 200 bootstrap repetitions, include Somers' Dxy
round(val, 3)
      index.orig training   test optimism index.corrected   n
Dxy       -0.377   -0.383 -0.370   -0.013          -0.364 200
R2         0.140    0.148  0.132    0.016           0.124 200
Slope      1.000    1.000  0.925    0.075           0.925 200
D          0.028    0.030  0.027    0.004           0.025 200
U         -0.001   -0.001  0.002   -0.002           0.002 200
Q          0.029    0.031  0.025    0.006           0.023 200
1) Am I correct in assuming C-index = 0.5 * ( Dxy + 1 )?
2) If so, I am getting 0.5*(-0.3634+1) = 0.318 for the C-index. Does
this make sense?
3) Should I be using some other measure instead of the C-index?
Thank you very much in advance.
Regards, Adai
C-index : typical values
4 messages · Adaikalavan Ramasamy, Frank E Harrell Jr
Adaikalavan Ramasamy wrote:
I am doing some coxPH model fitting and would like to have some idea about how good the fits are. Someone suggested to use Frank Harrell's C-index measure. As I understand it, a C-index > 0.5 indicates a useful model. I am
No, that just means predictions are better than random.
probably making an error here because I am getting values less than 0.5
on real datasets. Can someone tell me where I am going wrong please ?
1) Am I correct in assuming C-index = 0.5 * ( Dxy + 1 ) ?
Yes
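Assuming the usual pair-counting definitions, where $N_c$, $N_d$ and $N_t$ are the numbers of usable pairs that are concordant, discordant, and tied on the predictor, the identity confirmed above follows directly (tied pairs count one half toward C):

$$
C = \frac{N_c + \tfrac{1}{2} N_t}{N}, \qquad D_{xy} = \frac{N_c - N_d}{N}, \qquad N = N_c + N_d + N_t
$$

$$
\frac{D_{xy} + 1}{2} = \frac{N_c - N_d + N_c + N_d + N_t}{2N} = \frac{2N_c + N_t}{2N} = C
$$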
2) If so, I am getting 0.5*(-0.3634+1) = 0.318 for the C-index. Does this make sense ?
For the Cox model, the default calculation correlates the linear predictor with survival time. A large linear predictor (large log hazard) means shorter survival time. To phrase it in the more usual way, negate Dxy before computing C.
Frank
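To see why the raw linear predictor gives a C below 0.5, here is a minimal base-R sketch that counts usable pairs on simulated data. The helper `harrell_c` is hypothetical (it is not the Hmisc `rcorr.cens` implementation): it reads a higher score as predicting longer survival, and as a simplification it skips pairs with tied observed times. The data are simulated from an exponential model whose log hazard is `lp`.

```r
# Hypothetical helper: a simplified Harrell's C for right-censored data.
# Assumption: higher `score` = longer predicted survival.
# Simplification: pairs with tied observed times are skipped.
harrell_c <- function(score, time, status) {
  concordant <- 0
  usable <- 0
  n <- length(time)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      # Usable pair: subject i has an observed event strictly before
      # subject j's observed (event or censoring) time.
      if (status[i] == 1 && time[i] < time[j]) {
        usable <- usable + 1
        if (score[i] < score[j]) {
          concordant <- concordant + 1    # shorter survivor has the lower score
        } else if (score[i] == score[j]) {
          concordant <- concordant + 0.5  # tied scores count one half
        }
      }
    }
  }
  concordant / usable
}

# Simulated data: larger lp -> higher hazard -> shorter survival
set.seed(1)
n <- 200
x <- rnorm(n)
lp <- 0.8 * x                            # linear predictor (log relative hazard)
event_time <- rexp(n, rate = exp(lp))
cens_time  <- rexp(n, rate = 0.3)        # independent censoring
obs_time <- pmin(event_time, cens_time)
status   <- as.integer(event_time <= cens_time)

c_lp  <- harrell_c(lp, obs_time, status)   # raw linear predictor: expected below 0.5
c_neg <- harrell_c(-lp, obs_time, status)  # negated: the usual reading, above 0.5
dxy_lp  <- 2 * c_lp - 1                    # negative, like Dxy in the validate.cph output
dxy_neg <- 2 * c_neg - 1                   # same magnitude, opposite sign
```

Because each usable pair is concordant under exactly one of `lp` and `-lp` (predictor ties count one half either way), the two C values add to 1, which is the sign flip described above.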
3) Should I be using some other measurement instead of C-index. Thank you very much in advance. Regards, Adai
______________________________________________ R-help at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Frank E Harrell Jr Professor and Chair School of Medicine
Department of Biostatistics Vanderbilt University
Thank you! So, to be absolutely sure, the C-index in my case is
0.5 * ( 0.3634 + 1 ) = 0.6817, right?
If the above calculation is correct, then why do I get the following:
rcorr.cens( predict(fit), Surv( GBSG$rfst, GBSG$cens ) )[ "C Index" ]
C Index
0.3115156
(I am aware that this is a resubstitution estimate and therefore optimistic, but
this is what led me to believe that my C-index was < 0.5.)
May I suggest adding a sentence about the relationship between the
C-index and Dxy to the validate.cph documentation, or elsewhere, if
this is not widely known?
Thank you again.
Regards, Adai
On Fri, 2005-09-02 at 19:55 -0400, Frank E Harrell Jr wrote:
Adaikalavan Ramasamy wrote:
Thank you ! So to be absolutely sure, the C-index in my case is 0.5 * ( 0.3634 + 1 ) = 0.6817 right ?
correct
If the above calculation is correct then why do I get the following :
rcorr.cens( predict(fit), Surv( GBSG$rfst, GBSG$cens ) )[ "C Index" ]
C Index
0.3115156
( I am aware that is a re-substitution error rate and optimistic, but
this is what led me to believe that my C-index was < 0.5 ).
You're right about the optimism but that's not the cause in this case.
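The numbers in this exchange can be reconciled with a little arithmetic. The `rcorr.cens` value is the apparent (uncorrected) C for the linear predictor, so it maps onto the `index.orig` Dxy rather than the corrected one, and flipping its direction gives the usual C. The check below uses only values quoted in the thread. Assuming Hmisc's `rcorr.cens`, passing `-predict(fit)` instead of `predict(fit)` should report the flipped value directly.

```r
# Values quoted earlier in the thread
c_rcorr  <- 0.3115156  # rcorr.cens(predict(fit), ...)["C Index"]
dxy_orig <- -0.377     # validate.cph: Dxy, index.orig
dxy_corr <- -0.3634    # validate.cph: Dxy, index.corrected (more digits, as quoted)

2 * c_rcorr - 1        # about -0.377, matching Dxy index.orig
1 - c_rcorr            # about 0.688: apparent C with the direction flipped
0.5 * (-dxy_corr + 1)  # 0.6817: optimism-corrected C
```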
Can I suggest that it is probably worth adding a sentence about the relationship between C-index and Dxy in validate.cph or elsewhere if this is not a widely known issue.
Will do -Frank
Thank you again.
Regards, Adai