Decision Tree and Random Forest

8 messages · Michael Artz, Bert Gunter, Sarah Goslee +1 more

#
Hi, I'm trying to get the top decision rules from a decision tree.
Eventually I would like to do this with R and random forest.  There has to
be a way to output the decision rules of each leaf node in an easily
readable way. I am looking at the randomForest and rpart packages and I
don't see anything yet.
Mike
#
Nope.

Random forests are not decision trees -- they are ensembles (forests)
of trees. You need to go back and read up on them so you understand
how they work. The Hastie/Tibshirani/Friedman "The Elements of
Statistical Learning" has a nice explanation, but I'm sure there are
lots of good web resources, too.

Cheers,
Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
On Wed, Apr 13, 2016 at 1:40 PM, Michael Artz <michaeleartz at gmail.com> wrote:
#
Ok, is there a way to do it with a decision tree?  I just need to make the
decision rules. Perhaps I can pick one of the trees used with random
forest.  I am already somewhat familiar with random forest with
respect to bagging, feature sampling, taking the mode from the
leaf nodes, and it being an ensemble technique of many trees.  I am just
working from the perspective that I need decision rules, and I am working
backward from that, and I need to do it in R.
On Wed, Apr 13, 2016 at 4:08 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:

#
Also, that being said, just because random forests are not the same thing as
decision trees does not mean that you can't get decision rules from a random
forest.

On Wed, Apr 13, 2016 at 4:11 PM, Michael Artz <michaeleartz at gmail.com>
wrote:

#
I think you are missing the point of random forests. But if you just
want to predict using the forest, there is a predict() method that you
can use. Other than that, I certainly don't understand what you mean.
Maybe someone else might.

Cheers,
Bert


On Wed, Apr 13, 2016 at 2:11 PM, Michael Artz <michaeleartz at gmail.com> wrote:
#
Ah yes, I will have to use the predict function.  But the predict function
will not really get me there.  Take the example where I have a
model predicting whether or not I will play golf (this is the dependent
variable), and there are three independent variables: Humidity (High, Medium,
Low), Pending_Chores (Taxes, None, Laundry, Car Maintenance) and Wind (High,
Low).  I would like rules that apply to any record that follows them, like
(IF humidity = high AND pending_chores = None AND Wind = High THEN there is
a 77% probability that play_golf is YES).  I was thinking that random
forest would somehow weight the rules over the collection of trees and give
a probability.  But if that doesn't make sense, then can you just tell me
how to get the decision rules with one tree and I will work from that.
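As a sketch of that setup (the data below are randomly generated stand-ins, not real observations, so the fitted splits are arbitrary), a single rpart tree can already print the chain of splits leading to each leaf via path.rpart():

```r
## Hypothetical golf data matching the variables described above;
## values are random stand-ins, so the fitted splits are arbitrary.
library(rpart)

set.seed(1)
golf <- data.frame(
  humidity       = sample(c("High", "Medium", "Low"), 200, replace = TRUE),
  pending_chores = sample(c("Taxes", "None", "Laundry", "Car Maintenance"),
                          200, replace = TRUE),
  wind           = sample(c("High", "Low"), 200, replace = TRUE),
  play_golf      = sample(c("YES", "NO"), 200, replace = TRUE),
  stringsAsFactors = TRUE
)

fit <- rpart(play_golf ~ ., data = golf,
             control = rpart.control(minsplit = 10, cp = 0.005))

## print the chain of splits leading to every leaf node
leaves <- as.numeric(rownames(fit$frame)[fit$frame$var == "<leaf>"])
path.rpart(fit, nodes = leaves)
```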

Mike

On Wed, Apr 13, 2016 at 4:30 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:

#
It sounds like you want classification or regression trees. rpart does
exactly what you describe.

Here's an overview:
http://www.statmethods.net/advstats/cart.html

But there are a lot of other ways to do the same thing in R, for instance:
http://www.r-bloggers.com/a-brief-tour-of-the-trees-and-forests/

You can get the same kind of information from random forests, but it's
less straightforward. If you want a clear set of rules as in your golf
example, then you need rpart or similar.
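For example, newer versions of the rpart.plot package (rpart.rules() was added in rpart.plot 3.0, after this thread) print one probability-plus-rule line per leaf, much like the golf example; kyphosis is the example dataset shipped with rpart:

```r
## Sketch: readable per-leaf rules with fitted class probabilities.
## Assumes the rpart and rpart.plot packages (rpart.rules() needs
## rpart.plot >= 3.0); kyphosis ships with rpart.
library(rpart)
library(rpart.plot)

fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

## one row per leaf: fitted probability and the rule that reaches it
rpart.rules(fit)
```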

Sarah
On Wed, Apr 13, 2016 at 6:02 PM, Michael Artz <michaeleartz at gmail.com> wrote:
#
On Thu, 14 Apr 2016, Michael Artz wrote:

Although I think that this toy example is not overly useful for practical 
illustrations, we have included the standard dataset in the "partykit" 
package:

## data
data("WeatherPlay", package = "partykit")

Then you can learn one tree on this data, e.g., with rpart() or ctree():

## trees
library("rpart")
rp <- rpart(play ~ ., data = WeatherPlay,
   control = rpart.control(minsplit = 5))

library("partykit")
ct <- ctree(play ~ ., data = WeatherPlay,
   minsplit = 5, mincriterion = 0.1)

## visualize via partykit
pr <- as.party(rp)
plot(pr)
plot(ct)

And the partykit package also includes a function to generate a text 
representation of the rules, although this is currently not exported:

partykit:::.list.rules.party(pr)
##                                                         2
##                            "outlook %in% c(\"overcast\")"
##                                                         4
##  "outlook %in% c(\"sunny\", \"rainy\") & humidity < 82.5"
##                                                         5
## "outlook %in% c(\"sunny\", \"rainy\") & humidity >= 82.5"

partykit:::.list.rules.party(ct)
##                2                3
## "humidity <= 80"  "humidity > 80"

If you do not want a text representation but something else you can 
compute on, then look at the source code of partykit:::.list.rules.party() 
and try to adapt it to your needs.
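One way to adapt it (a sketch, restating the WeatherPlay/rpart example above so it is self-contained) is to pair each leaf's rule with the fitted class probabilities in that leaf, which yields roughly the IF-THEN-probability table asked for earlier in the thread:

```r
## Sketch: one row per leaf, pairing the rule with the fitted class
## probabilities (uses the unexported partykit:::.list.rules.party()).
library(rpart)
library(partykit)

data("WeatherPlay", package = "partykit")
rp <- rpart(play ~ ., data = WeatherPlay,
            control = rpart.control(minsplit = 5))
pr <- as.party(rp)

rules <- partykit:::.list.rules.party(pr)
node  <- predict(pr, newdata = WeatherPlay, type = "node")
probs <- predict(pr, newdata = WeatherPlay, type = "prob")

## keep one representative row per leaf
tab <- data.frame(rule = rules[as.character(node)], probs,
                  check.names = FALSE)
tab[!duplicated(node), ]
```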