Question about randomForest
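Hi Matthew,

The error rate reported by randomForest is the prediction error estimated on out-of-bag (OOB) data, and the stored predictions are OOB predictions as well. Each tree is built on a bootstrap sample, which contains only about 63% of the distinct observations in the original data, and the OOB prediction for an observation uses only the trees that did not see that observation during training. When you pass the training data back to predict, every observation is run through all the trees, most of which were fitted to it, so that error is optimistically low. The OOB error rate is therefore likely to be higher than the prediction error on the original data, as you observed.

Weidong

On Sat, Nov 26, 2011 at 3:02 PM, Matthew Francis <mattjamesfrancis at gmail.com> wrote: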
I've been using the R package randomForest, but there is one aspect I cannot work out the meaning of. After calling the randomForest function, the returned object contains an element called predicted, which is the prediction obtained using all the trees (at least that's my understanding). I've checked that this prediction set has the error rate reported by err.rate.

However, if I send the training data back into the predict.randomForest function, I get a different result from the stored set of predictions. This is true for both classification and regression. The predictions obtained this way also have a much lower error rate and perform very well (suspiciously well...) on measures such as AUC. My understanding is that the two sets of predictions should be the same. Since they are not, I must be misunderstanding something. Any ideas what's going on?
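The discrepancy described in this thread can be reproduced on a built-in dataset. The following is a minimal sketch, assuming the randomForest package is installed; the iris data, the seed, the number of trees, and all variable names are illustrative choices and do not come from the original posts.

```r
library(randomForest)

set.seed(42)  # arbitrary seed, only for reproducibility
rf <- randomForest(Species ~ ., data = iris, ntree = 500)

# Stored predictions are out-of-bag: each observation is predicted
# only by the trees whose bootstrap sample did not contain it.
oob_err <- mean(rf$predicted != iris$Species)

# The last row of err.rate holds the OOB error after all trees are grown.
reported_err <- unname(rf$err.rate[rf$ntree, "OOB"])

# Passing the training data as newdata uses every tree, including
# those that were fitted to the observation being predicted.
insample_err <- mean(predict(rf, newdata = iris) != iris$Species)

# predict() without newdata returns the stored OOB predictions again.
same_as_stored <- identical(as.character(predict(rf)),
                            as.character(rf$predicted))

# Share of distinct observations in one bootstrap sample,
# which is close to 1 - exp(-1) for moderate n.
n <- nrow(iris)
boot_frac <- mean(replicate(
  2000, length(unique(sample(n, n, replace = TRUE))) / n
))

print(c(oob = oob_err, reported = reported_err, insample = insample_err))
print(boot_frac)
```

In this sketch the error computed from the stored predictions should match the last OOB entry of err.rate, while the error from predict with newdata set to the training data should be no higher. For an honest estimate of generalization error, the OOB figure (or a separate hold-out set) is the one to use.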