
Questions on RandomForest

1 message · Liaw, Andy

Fucang,

Questions like these that are specific to one package are best addressed
directly to the package maintainer(s) first (me in this case), as the
discussion is unlikely to be of general interest to the whole list.

1.  The constituent classifier in randomForest uses the CART algorithm
(suitably modified for randomForest), based on Leo Breiman's Fortran code.
I believe the guts of rpart are written in C by Terry Therneau.

2.  There's no built-in functionality for randomForest (or most other
algorithms, for that matter) to detect "outliers".

3.  The predict() function will need to have the entire forest in memory, in
addition to the test set data.  There's nothing wrong with predicting the
test set in pieces.  I routinely do predictions on test sets with > 800,000
cases, but in pieces of sizes 10,000-50,000.
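For illustration, here is a minimal sketch of predicting in pieces. It assumes a fitted randomForest object and a data frame with the same predictors. The helper name predict_in_chunks() and its chunk_size argument are hypothetical (not part of the randomForest package); the mtcars regression forest is only a toy demonstration.

```r
library(randomForest)

# Hypothetical helper (not part of randomForest): calls predict() on
# consecutive row blocks of at most `chunk_size` rows, so only one block
# of test data is handed to predict() at a time.
predict_in_chunks <- function(fit, newdata, chunk_size = 10000) {
  n <- nrow(newdata)
  # Assign row indices to blocks: rows 1..chunk_size -> block 1, and so on.
  blocks <- split(seq_len(n), ceiling(seq_len(n) / chunk_size))
  preds <- lapply(blocks, function(rows) {
    predict(fit, newdata[rows, , drop = FALSE])
  })
  # unlist() returns a factor if every piece is a factor (classification),
  # or a numeric vector for regression; row names are dropped.
  unlist(preds, use.names = FALSE)
}

# Toy usage: a small regression forest on the built-in mtcars data.
set.seed(1)
fit <- randomForest(mpg ~ ., data = mtcars, ntree = 100)
p_chunked <- predict_in_chunks(fit, mtcars, chunk_size = 10)
p_whole <- predict(fit, mtcars)
```

Because each case is predicted independently, the chunked predictions should match a single predict() call on the whole set; the block size only changes how much test data is processed at once.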

HTH,
Andy