
Decision Tree: Am I Missing Anything?

Hi,

just to add a few points to the discussion:

- rpart() is able to deal with responses with more than two classes. 
Setting method="class" explicitly is not necessary if the response is a 
factor (as in this case).

- If your tree on this data is so huge that it can't even be plotted, I 
wouldn't be surprised if it overfitted the data set. You should check for 
this (e.g., by comparing performance on held-out or cross-validated data) 
and try to avoid unnecessary splits.

- There are various ways to avoid unnecessary splits in J48 trees without 
variable reduction. One could require a larger minimal leaf size (default 
is 2) or use "reduced error pruning"; see WOW("J48") for more options. 
These are set through the control argument, e.g. 
J48(..., control = Weka_control(R = TRUE, M = 10)).

- There are various other ways of fitting decision trees, see for example 
http://CRAN.R-project.org/view=MachineLearning for an overview. In 
particular, you might like the "partykit" package which additionally 
provides the ctree() method and has a unified plotting interface for 
ctree, rpart, and J48.
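
As a rough sketch of the points above: the built-in iris data is only a 
stand-in for your data (my assumption), and the RWeka and partykit parts 
run only if those packages happen to be installed. The cp-based pruning 
rule is one common choice, not the only one.

```r
library(rpart)  # ships with standard R installations

# Species is a factor with three classes, so rpart() picks
# method = "class" on its own.
fit <- rpart(Species ~ ., data = iris)
print(fit$method)

# Cross-validated error (column "xerror") for each complexity value.
printcp(fit)

# Prune back to the cp value with the smallest cross-validated error.
cp_tab  <- fit$cptable
best_cp <- cp_tab[which.min(cp_tab[, "xerror"]), "CP"]
pruned  <- prune(fit, cp = best_cp)

# J48 with reduced error pruning (R) and a larger minimal leaf size (M).
if (requireNamespace("RWeka", quietly = TRUE)) {
  library(RWeka)
  j48_fit <- J48(Species ~ ., data = iris,
                 control = Weka_control(R = TRUE, M = 10))
  print(j48_fit)
}

# Unified plotting through partykit, plus a ctree() fit for comparison.
if (requireNamespace("partykit", quietly = TRUE)) {
  library(partykit)
  plot(as.party(pruned))
  plot(ctree(Species ~ ., data = iris))
}
```

If the pruned tree is much smaller than the original but the 
cross-validated error is about the same, that is a hint that the extra 
splits were not buying you anything.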

hth,
Z
On Thu, 20 Sep 2012, Vik Rubenfeld wrote: