Dear Dr Harrell,
Thank you very much for your answer. Actually, I also tried to find the C
index by hand on these data using the mean probabilities, and I found
0.968, as you just showed.
I now understand why I saw a slight difference from the output of lrm. I
am thus convinced that this result is correct.
I read in the SAS help that PROC LOGISTIC also does some binning
(BINWIDTH option):
http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_logistic_sect010.htm
But I cannot explain why the difference between the two programs is so
large, especially since the class probabilities are the same.
Do you think it could be because the mean probabilities are computed
differently?
Thanks for your help and best regards,
OC
Date: Thu, 24 Jan 2013 05:28:13 -0800
From:
Subject: Re: [R] Difference between R and SAS in Concordance index in
ordinal logistic regression
lrm does some binning to make the calculations faster. The exact
calculation is obtained by running

    f <- lrm(...)
    rcorr.cens(predict(f), DA)

which results in:

       C Index            Dxy           S.D.              n        missing
    0.96814404     0.93628809     0.03808336    32.00000000     0.00000000
    uncensored Relevant Pairs     Concordant      Uncertain
   32.00000000   722.00000000   699.00000000     0.00000000
I.e., C=.968 instead of .963. But this is even farther away than the
value from SAS you reported.
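As a cross-check, the figures in the rcorr.cens output above are consistent with one another and with the response frequencies quoted in the original post (6, 13, 9, 4). The base R sketch below assumes that "Relevant Pairs" counts ordered pairs of observations with different responses, and that C is Concordant divided by Relevant Pairs. Both assumptions are inferred from the numbers themselves, not taken from the rcorr.cens documentation.

```r
# Response frequencies for levels 1..4, as printed by lrm in the original post
freq <- c(6, 13, 9, 4)
n <- sum(freq)

# Ordered pairs (i, j) whose responses differ: all n^2 ordered pairs
# minus those falling within the same response level
relevant <- n^2 - sum(freq^2)   # 722

# "Concordant" as reported by rcorr.cens in the output above
concordant <- 699

C   <- concordant / relevant    # about 0.968
Dxy <- 2 * C - 1                # about 0.936

print(c(relevant = relevant, C = C, Dxy = Dxy))
```

Under these assumptions the three reported quantities (722 relevant pairs, C, Dxy) all follow from the frequencies and the concordant count.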
If you don't believe the rcorr.cens result, create a tiny example and do
the calculations by hand.
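A tiny example of that kind could look like the following base R sketch. The function name c_index and the five data points are hypothetical, made up only for illustration. It uses the usual definition of the C index: among pairs with different responses, the proportion in which the higher response also has the higher prediction, with tied predictions counted as one half. Whether this matches rcorr.cens in every detail is an assumption that can be tested by running rcorr.cens(pred, y) on the same vectors.

```r
# Brute-force C index over all unordered pairs (hypothetical helper)
c_index <- function(pred, y) {
  n <- length(y)
  concordant <- 0
  tied <- 0
  relevant <- 0
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      if (y[i] == y[j]) next            # same response: pair is not usable
      relevant <- relevant + 1
      d <- (pred[i] - pred[j]) * (y[i] - y[j])
      if (d > 0) {
        concordant <- concordant + 1    # prediction and response agree in order
      } else if (d == 0) {
        tied <- tied + 1                # tied predictions count one half
      }
    }
  }
  (concordant + 0.5 * tied) / relevant
}

# Made-up data: 5 observations, 3 ordered response levels
y    <- c(1, 1, 2, 2, 3)
pred <- c(0.1, 0.4, 0.3, 0.4, 0.9)

# 8 usable pairs: 6 concordant, 1 tied, 1 discordant
print(c_index(pred, y))   # 0.8125
```

With so few observations every pair can be listed on paper, so the value is easy to confirm by hand before comparing it with software output.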
Frank
blackscorpio81 wrote
Dear R users,
Please allow me to ask for your help.
I am currently using Frank Harrell Jr's package "rms" to fit an ordinal
logistic regression model with proportional odds. In order to assess the
model's predictive ability, the C concordance index is displayed and
equals 0.963:
require(rms)
a<-read.csv2("/data.csv",row.names =,na.strings = c(""," "),dec=".")
lrm(DA~SJ+TJ,data=a)
Logistic Regression Model
lrm(formula = DA ~ SJ + TJ, data = a)
Frequencies of Responses
 1  2  3  4
 6 13  9  4

                      Model Likelihood     Discrimination    Rank Discrim.
                         Ratio Test            Indexes          Indexes
Obs            32    LR chi2      53.14    R2       0.875    C       0.963
max |deriv| 6e-06    d.f.             2    g        8.690    Dxy     0.925
                     Pr(> chi2) <0.0001    gr    5942.469    gamma   0.960
                                           gp       0.486    tau-a   0.673
                                           Brier    0.022

     Coef     S.E.   Wald Z Pr(>|Z|)
y>=2  -0.6161 0.6715 -0.92  0.3589
y>=3  -6.5949 2.3750 -2.78  0.0055
y>=4 -16.2358 5.3737 -3.02  0.0025
SJ     1.4341 0.5180  2.77  0.0056
TJ     0.5312 0.2483  2.14  0.0324
I wanted to compare the results with SAS. I found the same slopes and
intercepts with opposite signs, which is expected since R models the
probabilities P(Y>=k|X) whereas SAS models the probabilities P(Y<=k|X)
(see the attached PDF, page 2, table "Association des probabilités
et des réponses observées").
SAS_Report_-_Logistic_Regression.pdf
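The effect of the sign flip can be illustrated with the coefficients printed above. This is a minimal base R sketch, not output from either program. The three intercepts are taken to be those for y>=2, y>=3 and y>=4, in that order; the covariate values in x are hypothetical; and the P(Y<=k|X) fit is assumed to be the exact negation of the lrm coefficients, as described in the preceding paragraph.

```r
# lrm estimates from the output above
alpha <- c(-0.6161, -6.5949, -16.2358)   # intercepts for y>=2, y>=3, y>=4 (assumed order)
beta  <- c(SJ = 1.4341, TJ = 0.5312)

# Hypothetical covariate values for one subject
x  <- c(SJ = 3, TJ = 10)
lp <- sum(beta * x)

# Parameterisation 1: P(Y >= k | X) for k = 2, 3, 4
p_ge   <- plogis(alpha + lp)
prob_1 <- -diff(c(1, p_ge, 0))   # P(Y = 1), ..., P(Y = 4)

# Parameterisation 2: all signs flipped, giving P(Y <= k | X) for k = 1, 2, 3
p_le   <- plogis(-alpha - lp)
prob_2 <- diff(c(0, p_le, 1))    # P(Y = 1), ..., P(Y = 4)

print(rbind(prob_1, prob_2))
print(all.equal(prob_1, prob_2))
```

Because plogis(-z) = 1 - plogis(z), the two sets of cumulative probabilities are complements of each other and the class probabilities P(Y=k|X) coincide, so the sign difference by itself does not change the fitted class probabilities.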
I chose the order for levels.
I checked that the corresponding probabilities P(Y=k|X) are the same
with both programs. But I can't understand why in SAS the C index drops
from 0.963 down to 0.332.
I have read a lot about this and it seems to me that the two programs
use slightly different techniques to compute the C index; it is
nevertheless surprising to me to observe such a shift in the results.
Does anyone have a clue about this?
Thank you very much for your help
Blackscorpio