help with RandomForest classwt option

Hi, Betty:

1. Fortran code (http://www.stat.berkeley.edu/~breiman/RandomForests/cc_examples/prog.f):

	if(jclasswt.eq.0) then
		do j=1,nclass
			classwt(j)=1
		enddo
	endif
	if(jclasswt.eq.1) then
c		fill in classwt(j) for each j:
c		classwt(1)=1.
c		classwt(2)=10.

You need to set jclasswt = 1 (search the code to find it), then
uncomment the last two lines shown above. That gives you classwt in
the Fortran version. You can use classwt for extremely imbalanced
classification problems. Down-sampling is another possible choice,
but it is not directly implemented in rf. The following paper might
help:
http://oz.berkeley.edu/users/chenchao/666.pdf

2. As to the wrapper function: the idea is to create a set of samples
by applying sampling probabilities that implement down-sampling, then
build an rf model on each sample. Suppose you call rf this way for
each sample:
my.rf <- randomForest(...)

Then you can access the OOB scores and the test-set prediction scores
through my.rf$votes and my.rf$test$votes, respectively (the latter is
present only if you passed xtest).

You can then average those scores yourself. It is like a simple
meta-learning process, but it does exactly what down-sampling plus rf
does, even though down-sampling is not implemented.
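
Here is a minimal sketch of such a wrapper, in case it helps. The
function name downsampled_rf_votes and its n_models argument are made
up for illustration; only randomForest(), its xtest argument and
my.rf$test$votes come from the package. It assumes y is a factor in
which every level occurs at least once, and it averages test-set votes
only, since OOB votes from different samples cover different rows.

```r
library(randomForest)

# Hypothetical helper: fit one forest per balanced down-sample and
# average the test-set vote fractions across the forests.
downsampled_rf_votes <- function(x, y, xtest, n_models = 10, ...) {
  stopifnot(is.factor(y), all(table(y) > 0))
  n_min <- min(table(y))                    # size of the rarest class
  idx_by_class <- split(seq_along(y), y)    # row indices per class
  vote_sum <- 0
  for (b in seq_len(n_models)) {
    # Draw n_min rows from every class, without replacement.
    keep <- unlist(lapply(idx_by_class, function(i) {
      i[sample.int(length(i), n_min)]
    }))
    my.rf <- randomForest(x[keep, , drop = FALSE], y[keep],
                          xtest = xtest, ...)
    vote_sum <- vote_sum + unclass(my.rf$test$votes)
  }
  vote_sum / n_models                       # averaged vote fractions
}

# Usage on a built-in data set (balanced, so purely a smoke test):
set.seed(1)
avg_votes <- downsampled_rf_votes(iris[, 1:4], iris$Species,
                                  xtest = iris[, 1:4],
                                  n_models = 5, ntree = 100)
head(avg_votes)
```

Each row of avg_votes is the averaged vote fraction per class for one
test case; taking the column with the largest value gives the
predicted class.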

3. classwt and cutoff are used in different places. classwt is used in
two places: computing the Gini criterion for the splits and computing
the final vote at each leaf. cutoff is used only in the final voting.
So cutoff won't change the splitting, while classwt can. However,
since the current rf in R cannot do classwt, you can try cutoff to see
whether it helps in your case.
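
To make the difference concrete, here is a small base-R sketch. Both
function names are hypothetical, and the weighted Gini formula (class
counts scaled by their weights before taking proportions) is my
assumption about how the weighting enters, not a copy of the Fortran
or package source. The cutoff rule follows the randomForest help page:
the winning class is the one with the largest ratio of vote proportion
to cutoff.

```r
# Illustrative weighted Gini impurity: class counts are multiplied by
# classwt before the class proportions are computed.
weighted_gini <- function(counts, classwt = rep(1, length(counts))) {
  w <- counts * classwt
  p <- w / sum(w)
  1 - sum(p^2)
}

# Cutoff rule: pick the class with the largest votes / cutoff ratio.
# 'votes' is a matrix with one row per case and one named column per class.
predict_with_cutoff <- function(votes, cutoff) {
  ratio <- sweep(votes, 2, cutoff, "/")
  colnames(votes)[max.col(ratio, ties.method = "first")]
}

counts <- c(neg = 90, pos = 10)
weighted_gini(counts)                       # impurity with equal weights
weighted_gini(counts, classwt = c(1, 10))   # minority class up-weighted

votes <- rbind(c(neg = 0.7, pos = 0.3))
predict_with_cutoff(votes, cutoff = c(0.5, 0.5))   # "neg"
predict_with_cutoff(votes, cutoff = c(0.8, 0.2))   # "pos"
```

The weights change the impurity of a node, so they can change which
split is chosen, whereas the cutoff only relabels the final vote of
trees that are already grown.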

4. The fourth option is to use my implementation of rf, but I have
not written a manual for it, and it cannot show the splits yet.

HTH,

weiwei
