
randomForest speed improvements

If you have multiple cores, one "poor man's solution" is to run separate
forests in different R sessions, save the RF objects, load them into the
same session and combine() them.  You can do this less clumsily if you
use things like Rmpi or other distributed computing packages.
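
For what it's worth, here is a rough sketch of the same idea done inside one session, using the parallel package that ships with R in place of separate R sessions or Rmpi. The iris data, the worker count and the trees-per-worker figure are placeholders I picked for illustration; only randomForest() and combine() come from the randomForest package itself.

```r
library(randomForest)
library(parallel)

# Illustrative settings (assumptions, not recommendations)
n_workers <- 2
trees_per_worker <- 250

cl <- makeCluster(n_workers)
clusterSetRNGStream(cl, 42)   # give each worker its own random stream

# Each worker grows its own smaller forest on the same data
forests <- parLapply(cl, seq_len(n_workers), function(i, ntree) {
  library(randomForest)
  randomForest(Species ~ ., data = iris, ntree = ntree)
}, ntree = trees_per_worker)
stopCluster(cl)

# Merge the per-worker forests into a single randomForest object
rf_all <- do.call(randomForest::combine, forests)
rf_all$ntree
```

Keep in mind that the combined object no longer carries the out-of-bag error summaries (err.rate, confusion and so on are NULL after combine()), so those have to be recomputed if you need them.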

Another consideration is to increase nodesize (which reduces the sizes
of trees).  The problem with numeric predictors for tree-based
algorithms is that the number of candidate splitting points to evaluate
grows with the number of distinct values, and that cost is paid
_at each node_.  Some algorithms try to save on this by considering only
certain quantiles as split points.  The current RF code doesn't do this.
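
A small sketch of both points, on simulated data that I made up for the purpose (the sample size, nodesize values and decile grid are arbitrary). The first part compares tree sizes under two nodesize settings via treesize(); the second part just counts candidate cutpoints for one numeric predictor, exhaustively versus on a decile grid, to show the scale of the saving. The quantile count is only an illustration of what other algorithms do, not something randomForest does.

```r
library(randomForest)

set.seed(1)
n <- 500
x <- matrix(rnorm(n * 5), ncol = 5)   # five numeric predictors
y <- x[, 1] + rnorm(n)                # simulated regression response

# Larger nodesize stops splitting earlier, so trees have fewer terminal nodes
rf_small_nodes <- randomForest(x, y, ntree = 50, nodesize = 5)
rf_large_nodes <- randomForest(x, y, ntree = 50, nodesize = 50)
mean_size_small <- mean(treesize(rf_small_nodes))
mean_size_large <- mean(treesize(rf_large_nodes))

# Candidate cutpoints for one numeric predictor at the root node:
# an exhaustive search looks between every pair of adjacent distinct values,
# while a quantile-based search looks only at a fixed grid (deciles here)
n_exhaustive <- length(unique(x[, 1])) - 1
n_quantile <- length(unique(quantile(x[, 1], probs = seq(0.1, 0.9, by = 0.1))))

c(mean_size_small = mean_size_small, mean_size_large = mean_size_large)
c(n_exhaustive = n_exhaustive, n_quantile = n_quantile)
```

How much a larger nodesize costs in accuracy depends on the data, so it is worth checking the error on your own problem before settling on a value.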

Andy