Question about randomForest

2 messages · Matthew Francis, Weidong Gu

#
I've been using the R package randomForest but there is an aspect I
cannot work out the meaning of. After calling the randomForest
function, the returned object contains an element called predicted,
which is the prediction obtained using all the trees (at least that's
my understanding). I've checked that this prediction set has the same
error rate as reported by err.rate.

However, if I send the training data back into the
predict.randomForest function I find I get a different result to the
stored set of predictions. This is true for both classification and
regression. I find the predictions obtained this way also have a much
lower error rate and perform very well (suspiciously well...) on
measures such as AUC.

My understanding is that the two predictions above should be the same.
Since they are not, I must be misunderstanding something.
Any ideas what's going on?
#
Hi Matthew,

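The error rate reported by randomForest is the prediction error based
on out-of-bag (OOB) data, and the stored predictions are OOB
predictions too: each observation is predicted using only the trees
whose bootstrap sample did not include it. Each tree is built on a
bootstrap sample that contains about 63% of the distinct observations
in the original data. When you pass the training data back to predict,
every tree votes on every observation, including the trees that were
trained on it, so the error looks much lower than it really is. The
OOB error is therefore higher than the error on the training data, as
you observed, and it is the more honest estimate.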
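
A minimal sketch of the comparison, in case it helps to see the two
kinds of prediction side by side. The iris data set, the seed and
ntree = 500 are my own assumptions for illustration and are not taken
from your analysis; the variable names are made up as well.

```r
library(randomForest)

set.seed(42)  # arbitrary seed, only so the run is repeatable
rf <- randomForest(Species ~ ., data = iris, ntree = 500)

# Stored predictions: each row is predicted only by the trees that did
# not have that row in their bootstrap sample (out-of-bag).
oob_err <- mean(rf$predicted != iris$Species)

# Last row of err.rate is the error after all trees; "OOB" is the
# overall out-of-bag column.
reported_err <- rf$err.rate[rf$ntree, "OOB"]

# Sending the training data back through the full forest: every tree
# votes, including the ones trained on the row being predicted.
insample_err <- mean(predict(rf, newdata = iris) != iris$Species)

# Calling predict() without newdata gives back the stored OOB predictions.
same_as_stored <- all(predict(rf) == rf$predicted)

# Share of distinct rows in a bootstrap sample, averaged over 200 draws;
# this approaches 1 - 1/e (roughly 0.63) as the data set grows.
n <- nrow(iris)
inbag_frac <- mean(replicate(200,
  length(unique(sample(n, n, replace = TRUE))) / n))

print(c(oob = oob_err, reported = unname(reported_err),
        insample = insample_err, inbag = inbag_frac))
```

To get an honest error estimate from predict, pass it observations
that were held out before fitting, or call it without newdata.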

Weidong

On Sat, Nov 26, 2011 at 3:02 PM, Matthew Francis
<mattjamesfrancis at gmail.com> wrote: