bias in AUCRF?
2 messages · Jack Luo, David Winsemius
On Nov 20, 2013, at 12:44 PM, Jack Luo wrote:
Hi,
I am using the AUCRF package on my data and was initially impressed by the
high OOB-AUC. After a while, though, I began to suspect that it might be due
to some sort of bias, which motivated me to test the package on random data
(generated using rnorm).
The design is very simple: 100 observations, 50 in class 0 and 50 in
class 1. I varied the number of variables (the idea being that if there is
bias, the apparent performance should increase with more variables).
By construction there is no signal in the data, so the true unbiased AUC
should not be far from 0.5.
The results are worrisome to me: the OOB-AUC is a lot higher than 0.5, and
with more variables it gets even higher.
Am I misunderstanding anything here?
Below is the R code I used to test:
library(AUCRF)

Nvar  <- 200                                  # number of variables
Label <- as.factor(c(rep(0, 50), rep(1, 50))) # class label
AUC_r <- NULL                                 # one column of OOB-AUC values per fit

for (k in 1:10) {   # control the randomness of generating random data
  set.seed(k)
  Arandom <- matrix(rnorm(Nvar * length(Label)), ncol = Nvar)
  DF <- data.frame(Arandom, Label = Label)
  for (j in 1:20) { # control the randomness of OOB
    if (j %% 10 == 0) cat(k, j, "\n")
    set.seed(j)
    fit <- AUCRF(Label ~ ., data = DF)
    AUC_r <- cbind(AUC_r, fit$AUCcurve$AUC)
  }
}

# mean OOB-AUC over all 200 fits, by number of variables retained
plot(fit$AUCcurve$k, rowMeans(AUC_r), type = "b", pch = 3,
     xlab = "# of Vars", ylab = "OOB-AUC", lwd = 2, col = 2, ylim = c(0.4, 1))
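
The suspicion above (apparent performance rising with the number of noise variables) can be sketched without AUCRF at all. The following is only a minimal base-R illustration under an assumption that is not confirmed anywhere in this thread: that the inflation comes from choosing variables and scoring them on the same observations. The helper names auc() and selection_bias() are hypothetical, and picking the single best variable is a stand-in for, not a reproduction of, what AUCRF does internally.

```r
# AUC of a numeric score for 0/1 labels, via the Mann-Whitney rank identity.
auc <- function(score, label) {
  r  <- rank(score)
  n1 <- sum(label == 1)
  n0 <- sum(label == 0)
  (sum(r[label == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Pure-noise data: pick the variable with the highest AUC on one sample,
# then score that same variable on an independent sample of the same size.
# (Hypothetical helper for illustration; not part of any package.)
selection_bias <- function(nvar, n = 100, reps = 200) {
  label <- rep(c(0, 1), each = n / 2)
  res <- replicate(reps, {
    train <- matrix(rnorm(n * nvar), ncol = nvar)
    test  <- matrix(rnorm(n * nvar), ncol = nvar)
    a     <- apply(train, 2, auc, label = label)
    best  <- which.max(a)
    c(same_data  = max(a),                     # selected and scored on the same data
      fresh_data = auc(test[, best], label))   # scored on data not used for selection
  })
  rowMeans(res)
}

set.seed(1)
sapply(c(10, 50, 200), selection_bias)  # one column per number of variables
```

If the assumption holds, the same_data row should sit above 0.5 and grow with the number of variables, while the fresh_data row should stay near 0.5. Whether this is what drives the OOB-AUC reported by AUCRF is a separate question that this sketch does not settle.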
Shouldn't this question go to the package maintainer before being sent to R-help?
Thanks,
-Jack
[[alternative HTML version deleted]]
And:
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
David Winsemius Alameda, CA, USA