Decision Tree: Am I Missing Anything?
12 messages · Bhupendrasinh Thakre, Achim Zeileis, mxkuhn +2 more
Not very sure what the problem is, as I was not able to run your data. You might want to use the dput() command to present the data. Now on the programming side: we have more than 2 levels for the brands, and hence method = "class" is not able to understand what you actually want from it. Suggestion: for predictions with more than 2 levels I would go for Weka, specifically the C4.5 algorithm. There is also the RWeka package for it. Best Regards, Bhupendrasinh Thakre Sent from my iPhone
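P.S. A minimal sketch of the dput() suggestion, assuming the test.csv from the quoted message below:

test.df <- read.csv("test.csv")
dput(test.df)   # prints the data frame as R code that others can paste and run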
On Sep 20, 2012, at 9:47 PM, Vik Rubenfeld <vikr at mindspring.com> wrote:
I'm working with some data from which a client would like to make a decision tree predicting brand preference based on inputs such as price, speed, etc. After running the decision tree analysis using rpart, it appears that this data is not capable of predicting brand preference. Here's the data set:

BRND      PRI     PROM    FORM    FAMI    DRRE    FREC    MODE    SPED    REVW
Brand 1   0.6989  0.4731  0.7849  0.6989  0.7419  0.6022  0.8817  0.9032  0.6452
Brand 2   0.8621  0.3793  0.8621  0.931   0.7586  0.6897  0.8966  0.9655  0.8276
Brand 3   0.6     0.1     0.6     0.7     0.9     0.7     0.7     0.8     0.6
Brand 4   0.6429  0.25    0.5714  0.5     0.6071  0.5     0.75    0.8214  0.5
Brand 5   0.7586  0.4224  0.7328  0.6638  0.7328  0.6379  0.8621  0.8621  0.6897
Brand 6   0.75    0.0833  0.5833  0.4167  0.5     0.4167  0.75    0.6667  0.5
Brand 7   0.7742  0.4839  0.6129  0.5161  0.8065  0.6452  0.7742  0.9032  0.6129
Brand 8   0.6429  0.2679  0.6964  0.7143  0.875   0.5536  0.8036  0.9464  0.6607
Brand 9   0.575   0.175   0.65    0.55    0.625   0.375   0.825   0.85    0.475
Brand 10  0.8095  0.5238  0.6667  0.6429  0.6667  0.5952  0.8571  0.8095  0.5714
Brand 11  0.6308  0.3     0.6077  0.5846  0.6769  0.5231  0.7462  0.8846  0.6
Brand 12  0.7212  0.3152  0.7152  0.6545  0.6606  0.503   0.8061  0.8909  0.6
Brand 13  0.7419  0.2258  0.6129  0.5806  0.7097  0.6129  0.871   0.9677  0.3226
Brand 14  0.7176  0.2706  0.6353  0.5647  0.6941  0.4471  0.7176  0.9412  0.5176
Brand 15  0.7287  0.3437  0.5995  0.5788  0.8527  0.5478  0.8217  0.8941  0.6227
Brand 16  0.7     0.4     0.6     0.4     1       0.4     0.9     0.9     0.5
Brand 17  0.7193  0.3333  0.6667  0.6667  0.7018  0.5263  0.7719  0.8596  0.7018
Brand 18  0.7778  0.4127  0.6508  0.6349  0.7937  0.6032  0.8571  0.9206  0.619
Brand 19  0.8028  0.2817  0.6197  0.4366  0.7042  0.4366  0.7183  0.9155  0.5634
Brand 20  0.7736  0.2453  0.6226  0.3774  0.5849  0.3019  0.717   0.8679  0.4717
Brand 21  0.8481  0.2152  0.6329  0.4051  0.6329  0.4557  0.6962  0.8481  0.3418
Brand 22  0.75    0.3333  0.6667  0.5     0.6667  0.5833  0.9167  0.9167  0.4167

Here are my R commands:
library(rpart)   # rpart() below requires the package to be attached
test.df = read.csv("test.csv")
head(test.df)
     BRND    PRI   PROM   FORM   FAMI   DRRE   FREC   MODE   SPED   REVW
1 Brand 1 0.6989 0.4731 0.7849 0.6989 0.7419 0.6022 0.8817 0.9032 0.6452
2 Brand 2 0.8621 0.3793 0.8621 0.9310 0.7586 0.6897 0.8966 0.9655 0.8276
3 Brand 3 0.6000 0.1000 0.6000 0.7000 0.9000 0.7000 0.7000 0.8000 0.6000
4 Brand 4 0.6429 0.2500 0.5714 0.5000 0.6071 0.5000 0.7500 0.8214 0.5000
5 Brand 5 0.7586 0.4224 0.7328 0.6638 0.7328 0.6379 0.8621 0.8621 0.6897
6 Brand 6 0.7500 0.0833 0.5833 0.4167 0.5000 0.4167 0.7500 0.6667 0.5000
testTree = rpart(BRND ~ PRI + PROM + FORM + FAMI + DRRE + FREC + MODE + SPED + REVW, method="class", data=test.df)
printcp(testTree)
Classification tree:
rpart(formula = BRND ~ PRI + PROM + FORM + FAMI + DRRE + FREC +
MODE + SPED + REVW, data = test.df, method = "class")
Variables actually used in tree construction:
[1] FORM
Root node error: 21/22 = 0.95455
n= 22
CP nsplit rel error xerror xstd
1 0.047619 0 1.00000 1.0476 0
2 0.010000 1 0.95238 1.0476 0
I note that only one variable (FORM) was actually used in tree construction. When I run a plot using:
plot(testTree)
text(testTree)
...I get a tree with one branch. It looks to me like I'm doing everything right, and this data is just not capable of predicting brand preference. Am I missing anything? Thanks very much in advance for any thoughts! -Vik
Hi,
just to add a few points to the discussion:
- rpart() is able to deal with responses with more than two classes.
Setting method="class" explicitly is not necessary if the response is a
factor (as in this case).
- If your tree on this data is so huge that it can't even be plotted, I
wouldn't be surprised if it overfitted the data set. You should check for
this and possibly try to avoid unnecessary splits.
- There are various ways to do so for J48 trees without variable reduction. One can require a larger minimal leaf size (the default is 2) or use "reduced error pruning"; see WOW("J48") for more options. These can easily be passed as e.g. J48(..., control = Weka_control(R = TRUE, M = 10)), as in the sketch after this list.
- There are various other ways of fitting decision trees, see for example
http://CRAN.R-project.org/view=MachineLearning for an overview. In
particular, you might like the "partykit" package which additionally
provides the ctree() method and has a unified plotting interface for
ctree, rpart, and J48.
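A minimal sketch of the options above, assuming the respondent-level data frame (respLevel) and response (BRAND_NAME) from the message quoted below:

library("RWeka")      # J48(), an interface to Weka's C4.5
library("partykit")   # as.party() and ctree()

## Reduced error pruning (R = TRUE) plus a larger minimal leaf size (M = 10):
j48_fit <- J48(BRAND_NAME ~ ., data = respLevel,
               control = Weka_control(R = TRUE, M = 10))
plot(as.party(j48_fit))   # partykit's unified plotting for the Weka tree

## ctree() fits a conditional inference tree to the same data:
plot(ctree(BRAND_NAME ~ ., data = respLevel))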
hth,
Z
On Thu, 20 Sep 2012, Vik Rubenfeld wrote:
Bhupendrasinh, thanks very much! I ran J48 on a respondent-level data set and got a 61.75% correct classification rate!

Correctly Classified Instances          988               61.75   %
Incorrectly Classified Instances        612               38.25   %
Kappa statistic                           0.5651
Mean absolute error                       0.0432
Root mean squared error                   0.1469
Relative absolute error                  52.7086 %
Root relative squared error              72.6299 %
Coverage of cases (0.95 level)           99.6875 %
Mean rel. region size (0.95 level)       15.4915 %
Total Number of Instances              1600

When I plot it I get an enormous chart. Running:
respLevelTree = J48(BRAND_NAME ~ PRI + PROM + FORM + FAMI + DRRE + FREC + MODE + SPED + REVW, data = respLevel)
respLevelTree
...reports:

J48 pruned tree
------------------

Is there a way to further prune the tree so that I can present a chart that would fit on a single page or two? Thanks very much in advance for any thoughts. -Vik
There is also C5.0 in the C50 package. It tends to have smaller trees than C4.5, and much smaller trees than J48 when there are factor predictors. Also, it has an optional feature selection ("winnow") step that can be used.
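A hedged sketch of those options, reusing the respLevel data frame and BRAND_NAME response assumed earlier in the thread:

library("C50")
## winnow = TRUE switches on the optional feature selection step;
## rules = TRUE would give a rule set instead of a tree:
c5_fit <- C5.0(BRAND_NAME ~ ., data = respLevel,
               control = C5.0Control(winnow = TRUE))
summary(c5_fit)   # prints the tree and its training error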
Max
Max, I installed C50. I have a question about the syntax. Per the C50 manual:
## Default S3 method:
C5.0(x, y, trials = 1, rules= FALSE,
weights = NULL,
control = C5.0Control(),
costs = NULL, ...)
## S3 method for class 'formula'
C5.0(formula, data, weights, subset,
na.action = na.pass, ...)
I believe I need the method for class 'formula'. But I don't yet see in the manual how to tell C50 that I want to use that method. If I run:
respLevel = read.csv("Resp Level Data.csv")
respLevelTree = C5.0(BRAND_NAME ~ PRI + PROM + REVW + MODE + FORM + FAMI + DRRE + FREC + SPED, data = respLevel)
...I get an error message:
Error in gsub(":", ".", x, fixed = TRUE) :
input string 18 is invalid in this locale
What is the correct way to use the C5.0 method for class 'formula'?
-Vik
1 day later
Bhupendrasinh, thanks again for telling me about RWeka. That made a big difference in a job I was working on this week. Have a great weekend. -Vik
My pleasure. As part of the R community, we are always here to help each other. Best Regards, Bhupendrasinh Thakre Sent from my iPhone
Vik,
You're not doing it wrong. Can you send me the results of sessionInfo()? I think there are a few issues with the function on Windows, so a reproducible example would help solve the issue.
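For reference, a minimal sketch: S3 dispatch selects the formula method automatically whenever the first argument is a formula, so no extra switch is needed, and the error most likely comes from the data rather than the call (file and column names assumed from earlier in the thread):

library("C50")
respLevel <- read.csv("Resp Level Data.csv")
## A formula as the first argument dispatches to the formula method:
fit <- C5.0(BRAND_NAME ~ ., data = respLevel)
sessionInfo()   # reports the locale that the gsub() error complains about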
Max