Computing Confidence Intervals for AUC in ROCR Package

7 messages · David Winsemius, Frank E Harrell Jr, Na'im R. Tyson

#
Dear R-philes,

I am plotting ROC curves for several cross-validation runs of a  
classifier (using the function below).  In addition to the average  
AUC, I am interested in obtaining a confidence interval for the  
average AUC.  Is there a straightforward way to do this via the ROCR  
package?

library(ROCR)  # provides prediction() and performance()

# roc.dat: list with one element of $predictions and $labels per
# cross-validation run
plot_roc_curve <- function(roc.dat, plt.title) {
	#print(str(vowel.ROC))
	pred <- prediction(roc.dat$predictions, roc.dat$labels)
	perf <- performance(pred, "tpr", "fpr")
	# one AUC per run, averaged for the annotation below
	perf.auc <- performance(pred, "auc")
	perf.auc.areas <- slot(perf.auc, "y.values")
	curve.area <- mean(unlist(perf.auc.areas))
	#quartz(width=4, height=6)
	# individual runs in grey, then the averaged curve on top
	plot(perf, col="grey82", lty=3)
	plot(perf, lwd=3, avg="horizontal", spread.estimate="boxplot",
		add=TRUE)
	title(main=plt.title)
	mtext(sprintf("%s%1.4f", "Area under Curve = ", curve.area),
		side=3, line=0, cex=0.8)
}

P.S. After years of studying statistical analysis as a student, I  
still consider myself a novice.
#
On Jan 22, 2010, at 3:53 AM, Na'im R. Tyson wrote:

            
You should probably contact the authors. When I tried using that
package a few weeks ago, several of the annotation features were
broken. I contacted the author, who said there had been problems after
converting to S4 methods. He also said there would be a fix, but not
immediately. There has been a release since that time and I tried it,
but it did not appear to fix the problems I encountered. All I was
able to get were very simple ROC curves without any confidence
intervals or marking of levels. I ended up turning to the Epi package
for what I needed (but I did not need confidence intervals, so I cannot
comment on that aspect).
#
Even though ROC curves don't shed much light on the problem, the area
under the ROC is useful because it is the Wilcoxon-type concordance
probability.  Denoting it by C, 2*(C-.5) is Somers' Dxy rank correlation
between predictions and binary Y.  You can get the standard error of Dxy
from the rcorr.cens function in the Hmisc package, back-solve for the
s.e. of C, and hence get a confidence interval for C.  This uses
U-statistics and is fairly assumption-free.
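
A minimal sketch of that back-solving step, for a single set of
predictions and 0/1 labels.  The helper name auc_ci, the normal
(Wald-type) interval and the simulated data are assumptions made for
illustration; only rcorr.cens and its "C Index" and "S.D." elements
come from Hmisc.  Since Dxy = 2*(C-.5), the s.e. of C is half the
s.e. of Dxy.

```r
library(Hmisc)  # provides rcorr.cens()

# Hypothetical helper: Wald-type confidence interval for C (the AUC),
# derived from the standard error of Dxy reported by rcorr.cens().
auc_ci <- function(predictions, labels, level = 0.95) {
  stats <- rcorr.cens(predictions, labels)
  c_index <- unname(stats["C Index"])
  se_c <- unname(stats["S.D."]) / 2  # "S.D." is the s.e. of Dxy
  z <- qnorm(1 - (1 - level) / 2)
  c(auc   = c_index,
    lower = max(0, c_index - z * se_c),
    upper = min(1, c_index + z * se_c))
}

# Simulated example data (not from the original post)
set.seed(1)
labels <- rbinom(200, 1, 0.5)
predictions <- labels + rnorm(200)
auc_ci(predictions, labels)
```

This gives an interval for one run only; how to combine several
cross-validation runs into an interval for the average AUC is a
separate question that this sketch does not address.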

Frank
#
David Winsemius wrote:
I'm wondering what was broken with the S3 implementation that made them 
change to S4.

Frank
#
On Jan 22, 2010, at 8:31 AM, Frank E Harrell Jr wrote:

            
I was typing from memory and may not have conveyed accurately what was
in the message. He mentioned changing versions, but my attribution of
that problem to switching from S3 to S4 methods seems to have been a
manufactured memory. Furthermore, on loading the package in its
current form, I am no longer having the problems I earlier experienced.

So now my question to Tyson would be: what were you hoping to see with
your request for confidence intervals? The "spread estimate" feature
seems to have been fixed in version 1.0-4.
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
#
Thank you for your very prompt response.  The authors of the ROCR  
package informed me the package works as stated in the documentation  
as long as you use R version 2.9.0--and indeed, it does!  I do not  
mind using a slightly older version of R to get the results I need.

It is useful to have the 'spread.estimate' feature for plotting, but I  
wanted a numerical confidence interval.  In short, I am comparing a  
few binary classifiers, and I want to show that the confidence  
intervals for the average AUC overlap.  You can see this graphically  
with the 'spread.estimate' option, but my dissertation committee  
prefers numbers.

Again, thank you for all of your help.  It has led me in the right  
direction.

Regards,

Na'im
On Jan 22, 2010, at 8:51 AM, David Winsemius wrote:

            
#
On Jan 22, 2010, at 12:01 PM, Na'im R. Tyson wrote:

            
Ouch. I'm pretty sure you can find discussion on R-help offered by Dr  
Harrell regarding why overlap of CI's for ROC curves would have low  
power to detect important differences. And there is also the fact that  
overlap of 95% CI's is not an appropriate test even in the more  
straightforward situation of comparing group means.
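
A small worked example of the group-means point, using made-up
summary statistics (the means and standard errors below are
assumptions for illustration, not data from this thread).  The two
95% intervals overlap, yet a direct large-sample test of the
difference between independent groups gives a p-value below 0.05.

```r
# Made-up summary statistics for two independent groups
mean_a <- 10.0; se_a <- 1.0
mean_b <- 13.3; se_b <- 1.0
z <- qnorm(0.975)

# Separate 95% confidence intervals for each mean
ci_a <- mean_a + c(-1, 1) * z * se_a
ci_b <- mean_b + c(-1, 1) * z * se_b
intervals_overlap <- ci_a[2] >= ci_b[1] && ci_b[2] >= ci_a[1]

# Direct z test of the difference: the s.e. of a difference of
# independent estimates is sqrt(se_a^2 + se_b^2), which is smaller
# than se_a + se_b, the quantity the overlap check implicitly uses
se_diff <- sqrt(se_a^2 + se_b^2)
p_value <- 2 * pnorm(-abs(mean_b - mean_a) / se_diff)

intervals_overlap  # TRUE
p_value < 0.05     # TRUE
```

For classifiers evaluated on the same cases the AUC estimates are
correlated, so the independence assumed here would not hold; the
sketch only illustrates why interval overlap is a poor substitute for
testing the difference itself.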