
Decision Tree: Am I Missing Anything?

12 messages · Bhupendrasinh Thakre, Achim Zeileis, mxkuhn +2 more

#
I'm not sure what the problem is, as I was not able to run your data. You might want to use the dput() command to share the data.
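A minimal sketch of the dput() idea, assuming a tiny stand-in data frame (the column names here are hypothetical, not the poster's real data): dput() prints R code that recreates the object, so others can paste it and run your example.

```r
# Hypothetical stand-in for the real data; only the shape matters here
df <- data.frame(BRAND = factor(c("A", "B", "C")), PRI = c(3, 5, 4))

# dput() writes R code that rebuilds the object, including factor levels
dput(df)

# Evaluating that printed code gives back an identical object
rebuilt <- eval(parse(text = capture.output(dput(df))))
identical(rebuilt, df)
```

For a large data set, something like dput(head(respLevel, 20)) keeps the pasted output to a manageable size.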

Now, on the programming side: as we can see, the brand variable has more than two levels, and hence method = "class" is not able to understand what you actually want from it.

Suggestion: for predictions with more than two levels, I would go for Weka, specifically the C4.5 algorithm. The RWeka package gives you access to it from R.
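A minimal sketch of the RWeka route, assuming RWeka and a Java runtime are installed. J48() is RWeka's interface to Weka's C4.5 implementation; the built-in iris data (a three-level Species factor) stands in for the brand data.

```r
# Assumes install.packages("RWeka") has been run and Java is available
library(RWeka)

# Species has three levels; J48 handles multi-class responses directly
fit <- J48(Species ~ ., data = iris)
print(fit)  # shows the tree, the number of leaves, and the tree size

# Predictions on the training data (optimistic; for illustration only)
pred <- predict(fit, newdata = iris)
table(predicted = pred, observed = iris$Species)
```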

Best Regards,

Bhupendrasinh Thakre
Sent from my iPhone
On Sep 20, 2012, at 9:47 PM, Vik Rubenfeld <vikr at mindspring.com> wrote:

            
#
Hi,

just to add a few points to the discussion:

- rpart() is able to deal with responses with more than two classes. 
Setting method="class" explicitly is not necessary if the response is a 
factor (as in this case).

- If your tree on this data is so huge that it can't even be plotted, I 
wouldn't be surprised if it overfitted the data set. You should check for 
this and possibly try to avoid unnecessary splits.

- There are various ways to do so for J48 trees without reducing the
number of variables. One can require a larger minimal leaf size (option
M, default 2) or use "reduced error pruning" (option R); see WOW("J48")
for more options. They are easy to set, e.g.
J48(..., control = Weka_control(R = TRUE, M = 10)).

- There are various other ways of fitting decision trees, see for example 
http://CRAN.R-project.org/view=MachineLearning for an overview. In 
particular, you might like the "partykit" package which additionally 
provides the ctree() method and has a unified plotting interface for 
ctree, rpart, and J48.
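A minimal sketch of the first two points, using the built-in iris data (three-level Species factor) as a stand-in: rpart() chooses classification on its own when the response is a factor, and the cross-validated complexity table offers one way to prune away unnecessary splits. The cp = 0 / minsplit = 2 settings, the helper n_leaves(), and pruning at the minimum xerror are illustrative assumptions, not a prescription from this thread.

```r
library(rpart)  # recommended package shipped with R

# Deliberately grow a large tree; xerror uses random CV folds, so fix the seed
set.seed(1)
fit <- rpart(Species ~ ., data = iris,
             control = rpart.control(cp = 0, minsplit = 2))
fit$method  # "class", chosen automatically for a factor response

# Complexity table: one row per candidate subtree, with cross-validated error
printcp(fit)

# Prune back to the subtree with the lowest cross-validated error
best_cp <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]
pruned <- prune(fit, cp = best_cp)

# Hypothetical helper: count terminal nodes in an rpart tree
n_leaves <- function(tree) sum(tree$frame$var == "<leaf>")
c(full = n_leaves(fit), pruned = n_leaves(pruned))
```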

hth,
Z
On Thu, 20 Sep 2012, Vik Rubenfeld wrote:

            
#
There is also C5.0 in the C50 package. It tends to produce smaller trees than C4.5, and much smaller trees than J48 when there are factor predictors. It also has an optional feature selection ("winnow") step.
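A minimal sketch of switching the winnowing step on, assuming the C50 package is installed and using iris as a stand-in data set. Winnowing is enabled through C5.0Control(winnow = TRUE); whether it actually drops predictors depends on the data.

```r
# Assumes install.packages("C50") has been run
library(C50)

# Default fit versus a fit with the optional winnowing (feature selection) step
plain <- C5.0(Species ~ ., data = iris)
winnowed <- C5.0(Species ~ ., data = iris,
                 control = C5.0Control(winnow = TRUE))

summary(winnowed)  # prints the fitted tree and attribute usage
C5imp(winnowed)    # predictor usage for the winnowed model

pred <- predict(winnowed, newdata = iris)
table(predicted = pred, observed = iris$Species)
```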

Max
On Sep 21, 2012, at 2:18 AM, Achim Zeileis <Achim.Zeileis at uibk.ac.at> wrote:

            
#
Max, I installed C50. I have a question about the syntax. Per the C50 manual:

## Default S3 method:
C5.0(x, y, trials = 1, rules= FALSE,
weights = NULL,
control = C5.0Control(),
costs = NULL, ...)

## S3 method for class 'formula'
C5.0(formula, data, weights, subset,
na.action = na.pass, ...)

I believe I need the method for class 'formula'. But I don't yet see in the manual how to tell C50 that I want to use that method. If I run:

respLevel = read.csv("Resp Level Data.csv")
respLevelTree = C5.0(BRAND_NAME ~ PRI + PROM + REVW + MODE + FORM + FAMI + DRRE + FREC + SPED, data = respLevel)

...I get an error message:

Error in gsub(":", ".", x, fixed = TRUE) : 
  input string 18 is invalid in this locale

What is the correct way to use the C5.0 method for class 'formula'?
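In R, an S3 generic chooses its method from the class of the first argument, so there is no separate switch for requesting the 'formula' method: passing a formula as the first argument is enough. A minimal sketch with a hypothetical generic fit_tree() (not part of C50) that mirrors the default/formula pair shown in the manual excerpt:

```r
# Hypothetical generic for illustration; dispatch happens in UseMethod()
fit_tree <- function(x, ...) UseMethod("fit_tree")

fit_tree.default <- function(x, y, ...) {
  "default method: predictors in x, outcome in y"
}

fit_tree.formula <- function(formula, data, ...) {
  "formula method: outcome ~ predictors, looked up in data"
}

class(Species ~ .)                   # "formula"
fit_tree(Species ~ ., data = iris)   # dispatches to fit_tree.formula
fit_tree(iris[, 1:4], iris$Species)  # a data.frame falls through to fit_tree.default
```

By the same mechanism, a call such as C5.0(BRAND_NAME ~ ..., data = respLevel) already reaches the formula method.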


-Vik
On Sep 21, 2012, at 4:18 AM, mxkuhn wrote:

            
1 day later
#
Bhupendrasinh, thanks again for telling me about RWeka. It made a big difference in a job I was working on this week.

Have a great weekend.


-Vik
#
My pleasure. As part of the R team, we are always here to help each other.

Best Regards,

Bhupendrasinh Thakre
On Sep 22, 2012, at 1:46 PM, Vik Rubenfeld <vikr at mindspring.com> wrote:

            
#
Vik,
On Fri, Sep 21, 2012 at 12:42 PM, Vik Rubenfeld <vikr at mindspring.com> wrote:
You're not doing it wrong.

Can you send me the results of sessionInfo()? I think there are a few
issues with the function on Windows, so a reproducible example would
help solve the issue.