Hi, I'm trying to get the top decision rules from a decision tree. Eventually I would like to do this with R and random forest. There has to be a way to output the decision rules of each leaf node in an easily readable way. I am looking at the randomForest and rpart packages and I don't see anything yet. Mike
Decision Tree and Random Forrest
8 messages · Michael Artz, Bert Gunter, Sarah Goslee +1 more
Nope. Random forests are not decision trees -- they are ensembles (forests) of trees. You need to go back and read up on them so you understand how they work. The Hastie/Tibshirani/Friedman "The Elements of Statistical Learning" has a nice explanation, but I'm sure there are lots of good web resources, too. Cheers, Bert Bert Gunter "The trouble with having an open mind is that people keep coming along and sticking things into it." -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
OK, is there a way to do it with a decision tree? I just need to produce the decision rules. Perhaps I can pick one of the trees used in the random forest. I am already somewhat familiar with random forests with respect to bagging, feature sampling, taking the mode from the leaf nodes, and their being an ensemble technique of many trees. I am just working from the perspective that I need decision rules, and I am working backward from that, and I need to do it in R.
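Editorial sketch for the "pick one of the trees" idea: the randomForest package has getTree(), which returns the split table of a single tree in a fitted forest. The iris data, the seed and ntree = 25 below are stand-ins chosen for illustration and do not come from this thread. Note that each tree is grown on a bootstrap sample with random feature subsets, so the rules of one tree are not the rules of the forest.

```r
library(randomForest)

# Stand-in data and settings (assumptions, not from the thread)
set.seed(1)
rf <- randomForest(Species ~ ., data = iris, ntree = 25)

# Split table of the first tree; labelVar = TRUE gives variable names
# and class labels instead of integer codes
tree1 <- getTree(rf, k = 1, labelVar = TRUE)
head(tree1)

# Terminal nodes are the rows with status == -1
leaf_rows <- tree1[tree1$status == -1, ]

# Forest-level class probabilities are the fraction of trees voting
# for each class
votes <- predict(rf, newdata = iris[c(1, 51, 101), ], type = "prob")
votes
```

Turning the split table into readable IF/THEN rules still requires walking from the root to each terminal node via the "left daughter" and "right daughter" columns.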
Also, that being said, just because random forests are not the same thing as decision trees does not mean that you can't get decision rules from a random forest.
I think you are missing the point of random forests. But if you just want to predict using the forest, there is a predict() method that you can use. Other than that, I certainly don't understand what you mean. Maybe someone else might. Cheers, Bert Bert Gunter "The trouble with having an open mind is that people keep coming along and sticking things into it." -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
Ah yes, I will have to use the predict function, but that will not really get me there. Take an example: I have a model predicting whether or not I will play golf (this is the dependent variable), and there are three independent variables: Humidity (High, Medium, Low), Pending_Chores (Taxes, None, Laundry, Car Maintenance) and Wind (High, Low). I would like rules that apply to any record matching them, such as: IF humidity = High AND pending_chores = None AND wind = High THEN there is a 77% probability that play_golf is YES. I was thinking that random forest would weight the rules somehow across the collection of trees and give a probability. But if that doesn't make sense, can you just tell me how to get the decision rules from one tree, and I will work from that. Mike
It sounds like you want classification or regression trees. rpart does exactly what you describe. Here's an overview: http://www.statmethods.net/advstats/cart.html But there are a lot of other ways to do the same thing in R, for instance: http://www.r-bloggers.com/a-brief-tour-of-the-trees-and-forests/ You can get the same kind of information from random forests, but it's less straightforward. If you want a clear set of rules as in your golf example, then you need rpart or similar. Sarah
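Editorial sketch of the rpart route: path.rpart() returns the sequence of split conditions leading to a given node, and predict(type = "prob") gives the class proportions of the leaf each training case falls into. The kyphosis data ships with rpart and is used only as a stand-in; the names leaves, rule_table and leaf_prob are illustrative.

```r
library(rpart)

# Stand-in classification tree on the kyphosis data shipped with rpart
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

# Node numbers of the terminal nodes (leaves)
leaves <- as.numeric(rownames(fit$frame)[fit$frame$var == "<leaf>"])

# One character vector per leaf: "root" followed by one condition per split
paths <- path.rpart(fit, nodes = leaves, print.it = FALSE)
rules <- vapply(paths, function(p) paste(p[-1], collapse = " & "),
                character(1))
rule_table <- data.frame(node = leaves, rule = unname(rules))

# Class proportions per leaf: every training case in a leaf gets the same
# probabilities, so the per-leaf mean is simply that leaf's value
node_of_obs <- as.numeric(rownames(fit$frame))[fit$where]
prob <- as.data.frame(predict(fit, type = "prob"))
leaf_prob <- aggregate(prob, by = list(node = node_of_obs), FUN = mean)

# One row per leaf: node number, rule text, class probabilities
merge(rule_table, leaf_prob, by = "node")
```

The probabilities are training-set proportions within each leaf, so they are optimistic for small leaves.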
On Thu, 14 Apr 2016, Michael Artz wrote:
Ah yes, I will have to use the predict function, but that will not really get me there. Take an example: I have a model predicting whether or not I will play golf (this is the dependent variable), and there are three independent variables: Humidity (High, Medium, Low), Pending_Chores (Taxes, None, Laundry, Car Maintenance) and Wind (High, Low). I would like rules that apply to any record matching them, such as: IF humidity = High AND pending_chores = None AND wind = High THEN there is a 77% probability that play_golf is YES.
Although I think that this toy example is not overly useful for practical
illustrations, we have included the standard dataset in the "partykit"
package:
## data
data("WeatherPlay", package = "partykit")
I was thinking that random forest would weight the rules somehow across the collection of trees and give a probability. But if that doesn't make sense, can you just tell me how to get the decision rules from one tree, and I will work from that.
Then you can learn one tree on this data, e.g., with rpart() or ctree():
## trees
library("rpart")
rp <- rpart(play ~ ., data = WeatherPlay,
  control = rpart.control(minsplit = 5))
library("partykit")
ct <- ctree(play ~ ., data = WeatherPlay,
  minsplit = 5, mincriterion = 0.1)

## visualize via partykit
pr <- as.party(rp)
plot(pr)
plot(ct)
And the partykit package also includes a function to generate a text
representation of the rules although this is currently not exported:
partykit:::.list.rules.party(pr)
## "outlook %in% c(\"overcast\")"
## 4
## "outlook %in% c(\"sunny\", \"rainy\") & humidity < 82.5"
## 5
## "outlook %in% c(\"sunny\", \"rainy\") & humidity >= 82.5"
partykit:::.list.rules.party(ct)
## 2 3
## "humidity <= 80" "humidity > 80"
If you do not want a text representation but something else you can
compute on, then look at the source code of partykit:::.list.rules.party()
and try to adapt it to your needs.
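Editorial sketch of one way to get something "to compute on": pair the rule text from the unexported partykit:::.list.rules.party() (as used above, so it may change between partykit versions) with per-leaf class proportions from predict(). The object names rule_table and leaf_prob are illustrative, and the fit repeats the rpart call from this message so the block is self-contained.

```r
library("rpart")
library("partykit")

data("WeatherPlay", package = "partykit")
rp <- rpart(play ~ ., data = WeatherPlay,
  control = rpart.control(minsplit = 5))
pr <- as.party(rp)

## rule text, named by terminal node id (unexported helper)
rules <- partykit:::.list.rules.party(pr)
rule_table <- data.frame(node = as.integer(names(rules)),
  rule = unname(rules))

## terminal node and class proportions for each training observation
node <- predict(pr, type = "node")
prob <- predict(pr, type = "prob")

## observations in the same leaf share the same proportions,
## so unique() leaves one row per leaf
leaf_prob <- unique(data.frame(node = as.integer(node), prob,
  row.names = NULL))

## one row per leaf: node id, rule, class proportions
merge(rule_table, leaf_prob, by = "node")
```

With only 14 observations in WeatherPlay, the resulting proportions are illustrative rather than reliable estimates.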