aucRoc in caret package [SEC=UNCLASSIFIED]
7 messages · Li Jin, Max Kuhn, David Winsemius
Using AUC for discrete predictor variables with only two levels doesn't seem very sensible. What are you planning to do with this measure?

-- 
David.

On Jun 1, 2011, at 8:47 PM, <Jin.Li at ga.gov.au> wrote:

> Hi all,
> I used the following code and data to get auc values for two sets of
> predictions:
>   library(caret)
> > table(predicted1, trainy)
>     trainy
>     hard soft
>   1   27    0
>   2   11   99
> > aucRoc(roc(predicted1, trainy))
> [1] 0.5
>
> > table(predicted2, trainy)
>     trainy
>     hard soft
>   1   27    2
>   2   11   97
> > aucRoc(roc(predicted2, trainy))
> [1] 0.8451621
>
> predicted1:
> 1 1 2 2 2 1 2 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 1 2
> 2 2 2 2 1 2 2 2 2 1 1 2 2 2 2 2 1 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2
> 2 2 2 1 2 2 2 2 2 2 2 1 2 2 2 2 2 1 1 1 2 2 1 1 1 2 2 2 2 2 1 1 2 2
> 2 2 2 2 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
>
> predicted2:
> 1 1 2 1 2 1 2 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 1 1 2
> 2 2 2 2 1 2 2 2 2 1 1 2 2 2 2 2 1 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2
> 2 2 2 1 2 2 2 2 2 2 2 1 2 2 2 2 2 1 1 1 2 2 1 1 1 2 2 2 2 2 1 1 2 2
> 2 2 2 2 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
>
> trainy:
> hard hard hard soft soft hard hard hard hard soft soft soft soft
> soft soft hard soft soft soft soft soft soft hard soft soft soft
> soft soft soft soft soft soft hard soft soft soft soft soft hard
> soft soft soft soft hard hard soft soft soft hard soft hard soft
> soft soft soft soft hard soft soft soft soft soft soft soft soft
> hard soft soft soft soft soft hard soft soft soft soft soft soft
> soft hard soft soft soft hard hard hard hard hard soft soft hard
> hard hard soft hard soft soft soft hard hard soft soft soft soft
> soft hard hard hard hard hard hard hard soft soft soft soft soft
> soft soft soft soft soft soft soft soft soft soft soft hard soft
> soft soft soft soft soft soft soft
> Levels: hard soft
>
> > Sys.info()
>   sysname   "Windows"
>   release   "XP"
>   version   "build 2600, Service Pack 3"
>   nodename  "PC-60772"
>   machine   "x86"
>
> I would expect predicted1 to be more accurate than predicted2, but
> the auc values show the opposite. I was wondering whether this is a
> bug or I have done something wrong. Thanks for your help in advance!
>
> Cheers,
>
> Jin
> ____________________________________
> Jin Li, PhD
> Spatial Modeller/Computational Statistician
> Marine & Coastal Environment
> Geoscience Australia
> GPO Box 378, Canberra, ACT 2601, Australia
>
> Ph: 61 (02) 6249 9899; email: jin.li at ga.gov.au
> _______________________________________
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT
Please note that predicted1 and predicted2 are two sets of predictions, not predictors. As you can see, the predictions have only two levels: 1 is for hard and 2 for soft. I need to assess which one is more accurate. Hope this is clear now.

Thanks,
Jin
On Jun 1, 2011, at 9:24 PM, <Jin.Li at ga.gov.au> wrote:

> Please note that predicted1 and predicted2 are two sets of predictions
> instead of predictors. As you can see the predictions with only two
> levels, 1 is for hard and 2 for soft.

Yes, I (very clearly I think) saw that.

> I need to assess which one is more accurate. Hope this is clear now.
> Thanks. Jin

So how big do you want to dig your hole? AUC is not designed to be a score for categorical variables. It's designed for a continuous predictor. The only information in your two-way classification of dichotomous states is in the off-diagonal cells: 11 and 0 versus 11 and 2. Other than that you have total agreement. Not much to work on.
-- 
david.

David Winsemius, MD
West Hartford, CT
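To make the point about two-level scores concrete, here is a minimal sketch in base R (it does not use caret) that computes the rank-based AUC directly from the two confusion tables in the original post. It assumes "soft" is the event and that a score of 2 means "more soft"; the function names `auc_two_level` and `balanced_accuracy` are made up for this illustration. With only one usable cutoff, the AUC collapses to the average of sensitivity and specificity, so it carries no more information than the 2x2 table itself.

```r
# Confusion tables copied from the original post
# (rows = prediction 1/2, columns = true class hard/soft).
tab1 <- matrix(c(27, 11, 0, 99), nrow = 2,
               dimnames = list(predicted = c("1", "2"),
                               trainy = c("hard", "soft")))
tab2 <- matrix(c(27, 11, 2, 97), nrow = 2,
               dimnames = list(predicted = c("1", "2"),
                               trainy = c("hard", "soft")))

# Rank-based (Mann-Whitney) AUC for a two-level score.
# Assumption: "soft" is the event and score 2 ranks above score 1.
# Tied (soft, hard) pairs count one half.
auc_two_level <- function(tab) {
  n_hard <- sum(tab[, "hard"])
  n_soft <- sum(tab[, "soft"])
  concordant <- tab["2", "soft"] * tab["1", "hard"]
  tied <- tab["1", "soft"] * tab["1", "hard"] +
          tab["2", "soft"] * tab["2", "hard"]
  (concordant + 0.5 * tied) / (n_hard * n_soft)
}

# The same quantity from the single available cutoff:
# the mean of sensitivity and specificity.
balanced_accuracy <- function(tab) {
  sens <- tab["2", "soft"] / sum(tab[, "soft"])
  spec <- tab["1", "hard"] / sum(tab[, "hard"])
  (sens + spec) / 2
}

auc1 <- auc_two_level(tab1)  # about 0.855
auc2 <- auc_two_level(tab2)  # about 0.845
```

Under these assumptions the hand computation reproduces the 0.8451621 reported for predicted2 but gives about 0.855 for predicted1, not 0.5. The thread does not establish why the `aucRoc(roc(...))` call returned 0.5 there, so that remains an open question rather than a diagnosis.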
David,

The ROC curve should really be computed with some sort of numeric data (as opposed to classes). It varies the cutoff to get a continuum of sensitivity and specificity values. Using the classes as 1's and 2's implies that the second class is twice the value of the first, which doesn't really make sense.

Try getting the class probabilities for predicted1 and predicted2 and use those instead.

Thanks,

Max
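The cutoff sweep Max describes can be sketched in base R as follows. The thread never shows class probabilities, so `prob_soft` below is simulated, hypothetical data; only the class counts (38 hard, 99 soft) come from the original post. The sketch traces sensitivity against the false positive rate over every cutoff, integrates the curve with the trapezoid rule, and cross-checks the result against the rank form of the AUC.

```r
# Hypothetical class probabilities: simulated, NOT from the thread.
set.seed(1)
truth <- factor(rep(c("hard", "soft"), times = c(38, 99)),
                levels = c("hard", "soft"))
prob_soft <- ifelse(truth == "soft",
                    rbeta(137, 5, 2),   # soft cases tend to score high
                    rbeta(137, 2, 5))   # hard cases tend to score low

# Sweep the cutoff from "call nothing soft" to "call everything soft".
cutoffs <- c(Inf, sort(unique(prob_soft), decreasing = TRUE), -Inf)
sens <- sapply(cutoffs, function(k) mean(prob_soft[truth == "soft"] >= k))
fpr  <- sapply(cutoffs, function(k) mean(prob_soft[truth == "hard"] >= k))

# Trapezoidal area under the (fpr, sens) curve.
auc_trap <- sum(diff(fpr) * (head(sens, -1) + tail(sens, -1)) / 2)

# Cross-check: rank (Mann-Whitney) form of the same area.
r <- rank(prob_soft)
n_soft <- sum(truth == "soft")
n_hard <- sum(truth == "hard")
auc_rank <- (sum(r[truth == "soft"]) - n_soft * (n_soft + 1) / 2) /
            (n_soft * n_hard)
```

With real model output, `prob_soft` would be replaced by the predicted probability of the "soft" class; how to obtain it depends on the model and is not covered in the thread.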
On Jun 1, 2011, at 10:41 PM, Max Kuhn wrote:

> David,
>
> The ROC curve should really be computed with some sort of numeric data
> (as opposed to classes). It varies the cutoff to get a continuum of
> sensitivity and specificity values. Using the classes as 1's and 2's
> implies that the second class is twice the value of the first, which
> doesn't really make sense.
>
> Try getting the class probabilities for predicted1 and predicted2 and
> use those instead.

Yes. You should be addressing this to Jin. I have been trying with little success to explain this.
-- 
David.

David Winsemius, MD
West Hartford, CT
Hi All,

Thanks for the clarification. Now, perhaps I should use kappa instead. Since my predictions are coded as 1 and 2, there are no numeric predictions. To my surprise, when I applied kappa and auc to the data, their values were highly correlated, the only exception being when the predictions are perfect for one or both classes. Are there any other accuracy measures applicable to such predictions with two unbalanced classes?

Thanks,
Jin
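As a rough illustration of measures that work on hard class labels, the sketch below computes overall accuracy, Cohen's kappa, and per-class recall from the two confusion tables in the original post, using base R only. The helper names `cohen_kappa` and `class_summary` are made up for this example, and the choice of measures is one reasonable option for unbalanced classes, not a recommendation taken from the thread.

```r
# Confusion tables copied from the original post
# (rows = prediction 1/2, columns = true class hard/soft).
tab1 <- matrix(c(27, 11, 0, 99), nrow = 2,
               dimnames = list(predicted = c("1", "2"),
                               trainy = c("hard", "soft")))
tab2 <- matrix(c(27, 11, 2, 97), nrow = 2,
               dimnames = list(predicted = c("1", "2"),
                               trainy = c("hard", "soft")))

# Cohen's kappa: observed agreement corrected for the agreement
# expected by chance from the row and column totals.
cohen_kappa <- function(tab) {
  n <- sum(tab)
  p_obs <- sum(diag(tab)) / n
  p_exp <- sum(rowSums(tab) * colSums(tab)) / n^2
  (p_obs - p_exp) / (1 - p_exp)
}

# A few label-based summaries; per-class recall shows how each
# class fares separately, which overall accuracy can hide.
class_summary <- function(tab) {
  c(accuracy    = sum(diag(tab)) / sum(tab),
    kappa       = cohen_kappa(tab),
    recall_hard = tab["1", "hard"] / sum(tab[, "hard"]),
    recall_soft = tab["2", "soft"] / sum(tab[, "soft"]))
}

res <- rbind(predicted1 = class_summary(tab1),
             predicted2 = class_summary(tab2))
print(round(res, 3))
```

On these tables kappa comes out at roughly 0.78 for predicted1 and 0.74 for predicted2, which agrees with the expectation that predicted1 is the more accurate set. The mean of the two recall columns is the balanced accuracy, which for a two-level prediction is the same number as the rank-based AUC, so a close relationship between kappa and that AUC on 2x2 tables is not surprising.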