help with RandomForest classwt option
Hi, Betty:

1. Fortran code
(http://www.stat.berkeley.edu/~breiman/RandomForests/cc_examples/prog.f):

          if(jclasswt.eq.0) then
             do j=1,nclass
                classwt(j)=1
             enddo
          endif
          if(jclasswt.eq.1) then
    c     fill in classwt(j) for each j:
    c     classwt(1)=1.
    c     classwt(2)=10.

You need to set jclasswt = 1 (you can find it by searching through the code), then uncomment the last two lines. That gives you classwt in Fortran. You can use classwt for extremely imbalanced classification problems. Down-sampling is another possible choice, but it is not directly implemented in rf. Check the following paper; it might help:
http://oz.berkeley.edu/users/chenchao/666.pdf

2. As to the wrapper function, the idea is that you create a set of samples by applying some sampling probabilities to implement down-sampling, then build an rf model for each sample. Suppose you call rf in this way for each sample:

    my.rf <- randomForest(...)

Then you can access the OOB scores and the prediction scores through my.rf$votes and my.rf$test$votes, respectively, and average those scores yourself. It is just a simple meta-learning process, but it does exactly what down-sampling plus rf does, even though down-sampling is not implemented.

3. classwt and cutoff are used at different places. classwt is used at two places: calculating the Gini criterion and calculating the final vote from the leaf. cutoff is only used in the final voting. So cutoff won't change the splitting, while classwt can. However, since the current R rf cannot do classwt, you can try cutoff to see if it helps in your case.

4. The fourth option is to use my implementation of rf, but I did not write a manual for it, and it cannot show you the splits yet.

HTH,

weiwei
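For readers following point 2, here is a minimal sketch of such a wrapper in R. It assumes the randomForest package is installed; the function name downsample_rf and its arguments are hypothetical, chosen for illustration only, and equal-size sampling from every class is just one possible down-sampling scheme. It fits one forest per down-sampled training set and averages the vote fractions from my.rf$votes (OOB) and my.rf$test$votes (test set).

```r
library(randomForest)  # assumption: the randomForest package is installed

# Hypothetical helper (not part of randomForest): fit one forest per
# down-sampled training set and average the vote fractions across forests.
downsample_rf <- function(x, y, xtest = NULL, n_models = 10, ntree = 100) {
  stopifnot(is.factor(y), nrow(x) == length(y), all(table(y) > 0))
  n_min    <- min(table(y))              # size of the rarest class
  by_class <- split(seq_along(y), y)     # row indices for each class
  cls      <- levels(y)
  oob_sum  <- matrix(0, nrow(x), length(cls), dimnames = list(NULL, cls))
  oob_cnt  <- numeric(nrow(x))
  test_sum <- NULL
  for (m in seq_len(n_models)) {
    # Down-sample: keep n_min rows from every class, without replacement.
    idx <- unlist(lapply(by_class,
                         function(i) i[sample.int(length(i), n_min)]),
                  use.names = FALSE)
    fit <- randomForest(x[idx, , drop = FALSE], y[idx],
                        xtest = xtest, ntree = ntree)
    # OOB votes exist only for rows in this sample; skip rows that were
    # never out-of-bag in this forest (their vote fractions are not finite).
    v  <- unclass(fit$votes)
    ok <- stats::complete.cases(v)
    oob_sum[idx[ok], ] <- oob_sum[idx[ok], ] + v[ok, , drop = FALSE]
    oob_cnt[idx[ok]]   <- oob_cnt[idx[ok]] + 1
    if (!is.null(xtest)) {
      tv <- unclass(fit$test$votes)
      test_sum <- if (is.null(test_sum)) tv else test_sum + tv
    }
  }
  # Rows that never received an OOB vote come out as NaN in `oob`.
  list(oob  = oob_sum / oob_cnt,
       test = if (is.null(test_sum)) NULL else test_sum / n_models)
}
```

A call might look like res <- downsample_rf(x, y, xtest = newdata, n_models = 25). With a strongly imbalanced y, many majority-class rows are never sampled when n_models is small, so their averaged OOB score stays NaN; raising n_models or scoring those rows through xtest are two ways around that.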
On 1/29/07, Betty Health <betty.health at gmail.com> wrote:
Thank you very much, Weiwei and Jim!

Yeah, I did read the post by Andy, the contributor of this package. It seems that classwt is not implemented yet. For Weiwei's options, I have a few more questions. Thanks!

"1. try to use rf in fortran by following the linky below
http://www.stat.berkeley.edu/~breiman/RandomForests/cc_software.htm"

I read the Fortran code briefly, but I did not find the options for down sampling. So does that mean I need to do the down sampling myself?

Could you explain a little more about "2. make a wrapper function to do the down sampling by yourself"? Do you mean I can do it in R or in Fortran? Some links, please? I haven't done this before.

Yeah, cutoff did change the final classification results. However, from what I tried, it did not influence how the nodes are split. So I would go further with the above 2 options.

Thank you again!

Betty

On 1/28/07, Weiwei Shi <helprhelp at gmail.com> wrote:
Dear Betty:

I could suggest 3 options:

1. try to use rf in fortran by following the linky below
http://www.stat.berkeley.edu/~breiman/RandomForests/cc_software.htm
2. make a wrapper function to do the down sampling by yourself
3. try to use cutoff in randomForest, which might help in your situation.

HTH,

weiwei

On 1/28/07, Betty Health <betty.health at gmail.com> wrote:
Hello there,

I am working on an extremely unbalanced two-class classification problem. I want to use "classwt" together with "down sampling". By checking rfNews() in R, it looks like classwt is not working yet. Then I looked at the software from Salford. I did not find the down sampling option there.

I am wondering if you have any experience dealing with this problem. Do you know any method or software that can handle it?

Thank you very much!!

Betty
______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

--
Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.

"Did you always know?"
"No, I did not. But I believed..."
---Matrix III