randomForest() for regression produces offset predictions
I would expect this regression-towards-the-mean behavior on a new or hold-out dataset, not on the training data. In RF terminology, this means that the prediction returned by predict() here is the in-bag estimate, whereas the out-of-bag estimate is what you want for prediction.

In Joshua's example, rf.rf$predicted is an out-of-bag estimate, but since newdata is given to predict(), the result appears to be the in-bag estimate, which still needs an adjustment like Joshua's (and perhaps a more complex one in some cases). This is a bit confusing, since predict() usually matches what's in model$fitted.values; I imagine that's why the author used "predicted" as the component name instead of the standard "fitted.values". The documentation for predict.randomForest explains: "newdata: a data frame or matrix containing new data. (Note: If not given, the out-of-bag prediction in object is returned.)"
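To make the distinction concrete, here is a minimal sketch along the lines of Joshua's example (the set.seed() call and the small r2() helper are my own additions, not part of his code):

library(randomForest)
data(swiss)
set.seed(1)   # only so the numbers below are reproducible
rf.rf <- randomForest(Infant.Mortality ~ ., data = swiss)
actual <- swiss$Infant.Mortality
# Supplying newdata means every tree votes on every case, including the
# trees that were trained on it -- the optimistic "in-bag" style estimate.
pred.in <- predict(rf.rf, newdata = swiss)
# Omitting newdata returns the out-of-bag predictions (the same vector as
# rf.rf$predicted): each case is predicted only by trees that did not see it.
pred.oob <- predict(rf.rf)
all.equal(pred.oob, rf.rf$predicted)    # should be TRUE
# Coefficient of determination computed from each set of predictions
r2 <- function(p, a) 1 - sum((a - p)^2) / sum((a - mean(a))^2)
r2(pred.in, actual)     # optimistic; this is where the offset shows up
r2(pred.oob, actual)    # honest estimate of out-of-sample fit

On the training data the first R^2 is typically much higher than the second; the out-of-bag figure is the one to report.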
Patrick Burns wrote:
What I see is the predictions being less extreme than the actual values -- predictions for large actual values are smaller than the actual, and predictions for small actual values are larger than the actual. That makes sense to me. The object is to maximize out-of-sample predictive power, not in-sample predictive power. Or am I missing something in what you are saying?

Patrick Burns
patrick at burns-stat.com
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of S Poetry and "A Guide for the Unwilling S User")
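One quick way to see the shrinkage Patrick describes, again with the swiss data (the object names and the set.seed() call here are mine, purely for illustration):

library(randomForest)
data(swiss)
set.seed(1)
rf <- randomForest(Infant.Mortality ~ ., data = swiss)
pred <- predict(rf, swiss)
actual <- swiss$Infant.Mortality
range(pred)      # typically narrower than the range of the observed values
range(actual)
# Regressing actual on predicted typically gives a slope above 1 (and a
# non-zero intercept), i.e. the predictions are pulled in towards the mean.
coef(lm(actual ~ pred))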
Joshua Knowles wrote:

Hi all,

I have observed that when using the randomForest package to do regression, the predicted values of the dependent variable given by a trained forest are not centred and have the wrong slope when plotted against the true values. This means that the R^2 values obtained by squaring the Pearson correlation are better than those obtained by computing the coefficient of determination directly. The R^2 value obtained by squaring the Pearson correlation can, however, be exactly reproduced by the coefficient of determination if the predicted values are first linearly transformed (using lm() to find the required intercept and slope).

Does anyone know why randomForest behaves in this way, producing offset predictions? Does anyone know a fix for the problem? (By the way, the effect is there even if the original dependent variable values are initially transformed to have zero mean and unit variance.)

As an example, here is some simple R code that uses the available swiss dataset to show the effect I am observing. Thanks for any help.

--

#### EXAMPLE OF RANDOM FOREST REGRESSION
library(randomForest)
data(swiss)
swiss
#Build the random forest to predict Infant Mortality
rf.rf<-randomForest(Infant.Mortality ~ ., data=swiss)
#And predict the training set again
pred<-c(predict(rf.rf,swiss))
actual<-swiss$Infant.Mortality
#Plotting predicted against actual values shows the effect (uncomment to see this)
#plot(pred,actual)
#abline(0,1)
# calculate R^2 as Pearson correlation squared
R2one<-cor(pred,actual)^2
# calculate R^2 value as fraction of variance explained
residOpt<-(actual-pred)
residnone<-(actual-mean(actual))
R2two<-1-var(residOpt,na.rm=TRUE)/var(residnone, na.rm=TRUE)
# now fit a line through the predicted and true values and
# use this to normalize the data before calculating R^2
fit<-lm(actual ~ pred)
coef(fit)
pred2<-pred*coef(fit)[2]+coef(fit)[1]
residOpt<-(actual-pred2)
R2three<-1-var(residOpt,na.rm=TRUE)/var(residnone, na.rm=TRUE)
cat("Pearson squared = ",R2one,"\n")
cat("Coeff of determination = ", R2two, "\n")
cat("Coeff of determination after linear fitting = ", R2three, "\n")
## END