
RandomForest & memory demand

If you have the test set, and don't need the forest for predicting other
data, you can give both the training data and the test data to randomForest()
in the same call (if that fits in memory).  The test set is then run down
each tree as it is grown, and the forest is not kept, so only one tree is
held in memory at a time.  E.g., you would do something like:

my.result <- randomForest(x, y, xtest = xtest)

Then my.result$test will contain a list of results on the test set (e.g.,
the predicted classes).  If you also give ytest, there will be a bit more
output, such as the test-set error rate and confusion matrix.
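
A minimal sketch of the single-call approach, using the built-in iris data
purely as a stand-in for x, y, xtest and ytest (the split and the seed are
arbitrary choices, not part of the original advice).  It assumes the
randomForest package is installed.

```r
# Sketch: grow the forest and predict a held-out set in one call.
# iris is only a stand-in for your own x, y, xtest and ytest.
library(randomForest)

set.seed(42)
idx   <- sample(nrow(iris), 100)   # rows used for training
x     <- iris[idx, -5]             # predictors (training)
y     <- iris$Species[idx]         # class labels (training)
xtest <- iris[-idx, -5]            # predictors (test)
ytest <- iris$Species[-idx]        # optional: test labels

# With xtest supplied, keep.forest defaults to FALSE, so the fitted
# object does not carry the trees (my.result$forest is NULL).
my.result <- randomForest(x, y, xtest = xtest, ytest = ytest)

head(my.result$test$predicted)     # test-set predictions
my.result$test$confusion           # available because ytest was given
```

If you later find you do need to predict new data, you would have to refit
with keep.forest = TRUE, which brings back the cost of storing every tree.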

If you follow Torsten's suggestion, you can use the combine() function to
merge the five forests into one.
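
A rough sketch of the merge step, assuming the suggestion is to grow five
smaller forests (here 100 trees each, an arbitrary choice) on the same
training data.  iris again stands in for the real data, and the
randomForest package is assumed to be installed.

```r
# Sketch: grow five smaller forests separately, then merge them.
library(randomForest)

set.seed(1)
x <- iris[, -5]
y <- iris$Species

# Each call keeps its trees (keep.forest = TRUE is the default here,
# since no xtest is given).
forests <- lapply(1:5, function(i) randomForest(x, y, ntree = 100))

# Namespace-qualify combine(): other packages (e.g. dplyr) also export
# a function with that name.
big <- do.call(randomForest::combine, forests)

big$ntree                               # total number of trees
predict(big, iris[c(1, 51, 101), -5])   # the merged forest predicts as usual
```

Because combine() merges the stored trees, each piece must be grown with
keep.forest = TRUE; the merged object holds all the trees at once.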

The current implementation of the code requires (assuming classification, no
test data, and proximity=FALSE) approximately:

At R level:
- One copy of the training data.
- 6*(2n+1)*ntree integers for storing the forest.

At C level (dynamically allocated):
- (2n + 37)*nclass + 9*n + p*(2+nclass) doubles.
- 5 + (3*p + 22)*n + 5*(p + nclass) integers.

(nclass is the number of classes, n the number of cases in the training data,
and p the number of variables.)
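
As a worked illustration, the counts above can be turned into bytes.  The
helper rf_memory_estimate() is hypothetical (not part of randomForest), and
it assumes 4-byte integers, 8-byte doubles, and a numeric n x p training
matrix; treat the result as a ballpark figure only.

```r
# Hypothetical helper: rough memory estimate (bytes) for randomForest
# classification with no test data and proximity = FALSE, using the
# counts given above. Assumes 4-byte integers, 8-byte doubles, and a
# numeric n x p training matrix.
rf_memory_estimate <- function(n, p, nclass, ntree) {
  int_bytes <- 4
  dbl_bytes <- 8
  # R level: one copy of the training data
  train_data <- n * p * dbl_bytes
  # R level: storage for the forest, 6*(2n+1)*ntree integers
  forest <- 6 * (2 * n + 1) * ntree * int_bytes
  # C level: dynamically allocated doubles and integers
  c_doubles <- ((2 * n + 37) * nclass + 9 * n + p * (2 + nclass)) * dbl_bytes
  c_ints    <- (5 + (3 * p + 22) * n + 5 * (p + nclass)) * int_bytes
  c(train_data = train_data,
    forest     = forest,
    c_work     = c_doubles + c_ints,
    total      = train_data + forest + c_doubles + c_ints)
}

# Example: 100,000 cases, 50 variables, 2 classes, 500 trees (in MB)
rf_memory_estimate(n = 1e5, p = 50, nclass = 2, ntree = 500) / 2^20
```

For inputs like these, the forest term (which grows with n times ntree)
dwarfs the others, which is why not keeping the forest helps so much.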

HTH,
Andy