
Random forests

On Tue, 2007-12-18 at 16:27 -0600, Naiara Pinto wrote:
Hi Naiara,

I'm not an expert here, but what you propose with mtry = number of
predictors will give you a procedure known as bagging.

You talk about support for the split and then for the node. Is this just
a typo, or are you interested in two different things?

I'm not aware of how you do the latter in bagging or random forests, as
the whole point is to grow large, unpruned trees. As to the former,
trees are unstable: change the data used to train them just a little and
you can get a very different fitted tree.

Bagging and random forests exploit this to produce a better prediction
machine / classifier by combining n weak trees rather than one best
tree. They do this by adding randomness to the procedure: bootstrap
sampling the training data, and, in the case of random forests, randomly
sampling a small number, mtry, of the available predictors as candidates
at each split when growing each tree. As such there is no correspondence
between the splits of one tree and the splits of another, so trying to
count how many times a certain split is formed by the same predictor
across trees gets you nowhere. So it doesn't make sense (to me; it may
to others) to focus on individual splits in the n trees.
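To make the two sources of randomness concrete, here is a minimal,
stdlib-only Python sketch of the idea (not the randomForest code; the
toy data, and the use of single-split "stumps" in place of full trees,
are my own simplifications for illustration):

```python
import random

def bootstrap_sample(X, y, rng):
    """Draw n rows with replacement from the training data."""
    n = len(X)
    idx = [rng.randrange(n) for _ in range(n)]
    return [X[i] for i in idx], [y[i] for i in idx]

def grow_stump(X, y, mtry, rng):
    """Toy 'tree': one split on the best of mtry randomly chosen
    predictors (a stand-in for growing a full, unpruned tree)."""
    p = len(X[0])
    candidates = rng.sample(range(p), mtry)  # the mtry subsampling

    def sse(ys):
        if not ys:
            return 0.0
        m = sum(ys) / len(ys)
        return sum((v - m) ** 2 for v in ys)

    best = None
    for j in candidates:
        thresh = sum(row[j] for row in X) / len(X)
        left = [y[i] for i, row in enumerate(X) if row[j] <= thresh]
        right = [y[i] for i, row in enumerate(X) if row[j] > thresh]
        score = sse(left) + sse(right)
        if best is None or score < best[0]:
            pred_l = sum(left) / len(left) if left else 0.0
            pred_r = sum(right) / len(right) if right else 0.0
            best = (score, j, thresh, pred_l, pred_r)
    _, j, t, pl, pr = best
    return lambda row: pl if row[j] <= t else pr

def ensemble(X, y, n_trees, mtry, seed=0):
    """Bootstrap the data for each tree; average the predictions."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        Xb, yb = bootstrap_sample(X, y, rng)
        trees.append(grow_stump(Xb, yb, mtry, rng))
    return lambda row: sum(tree(row) for tree in trees) / len(trees)

# Toy data with p = 2 predictors.
X = [[x, (x * 7) % 5] for x in range(20)]
y = [2.0 * row[0] for row in X]

bagging = ensemble(X, y, n_trees=25, mtry=2)  # mtry == p: bagging
forest = ensemble(X, y, n_trees=25, mtry=1)   # mtry < p: random forest
```

With mtry equal to the number of predictors, the predictor subsampling
does nothing and only the bootstrap randomness remains, which is exactly
the bagging case above.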

I don't know what you mean exactly by "support", but if you are trying
to get a measure of how important each of your predictors is in
explaining variance in your response, then take a look at the
importance() function in the randomForest package. This produces a
couple of measures that allow you to determine which predictors
contribute most to reducing node impurity or MSE.
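One of those measures is permutation-based: shuffle one predictor,
re-measure the error, and see how much worse things get. Here is a
stdlib-only Python sketch of that idea (again not the randomForest
implementation, which also uses out-of-bag samples; the toy model and
data are mine):

```python
import random

def mse(model, X, y):
    """Mean squared error of model predictions over (X, y)."""
    return sum((model(row) - t) ** 2 for row, t in zip(X, y)) / len(X)

def permutation_importance(model, X, y, seed=0):
    """For each predictor: shuffle that column, re-measure MSE, and
    report the increase over the baseline error."""
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    scores = []
    p = len(X[0])
    for j in range(p):
        col = [row[j] for row in X]
        rng.shuffle(col)
        Xperm = [row[:j] + [col[i]] + row[j + 1:]
                 for i, row in enumerate(X)]
        scores.append(mse(model, Xperm, y) - baseline)
    return scores

# Toy fitted model that depends only on the first predictor.
model = lambda row: 2.0 * row[0]
X = [[x, (x * 3) % 7] for x in range(30)]
y = [2.0 * row[0] for row in X]

imp = permutation_importance(model, X, y)
# Shuffling predictor 0 hurts the fit; predictor 1 is irrelevant,
# so its score is zero.
```

An important predictor gets a large score because breaking its link to
the response degrades the predictions; an irrelevant one scores near
zero.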

HTH

G