randomForest speed improvements
From: Liaw, Andy
Note that that isn't exactly what I recommended. If you look at the example in the help page for combine(), you'll see that it is combining RF objects trained on the same data; i.e., instead of having one RF with 500 trees, you can combine five RFs trained on the same data with 100 trees each into one 500-tree RF. The way you are using combine() is basically using sample size to limit tree size, which you can do by playing with the nodesize argument in randomForest() as I suggested previously. Either way is fine as long as you don't see prediction performance degrading.
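A minimal sketch of the combine() usage described above. It follows the example on the combine() help page and uses the built-in iris data only for illustration: five 100-tree forests are grown on the same data and merged into one 500-tree forest. The norm.votes = FALSE setting mirrors that help-page example. Using do.call() is just a convenience for passing a list of forests.

```r
library(randomForest)

set.seed(42)
# Grow five independent 100-tree forests on the *same* data (iris here,
# chosen only for illustration). Each call is independent of the others,
# so these are the pieces that could be grown on separate cores/machines.
rf_parts <- lapply(1:5, function(i) {
  randomForest(Species ~ ., data = iris, ntree = 100, norm.votes = FALSE)
})

# Merge into one forest. combine() takes the forests as separate
# arguments, so do.call() spreads the list out; the package prefix
# avoids picking up a combine() from another attached package.
rf_all <- do.call(randomForest::combine, rf_parts)

print(rf_all$ntree)  # 500 trees in total
table(predicted = predict(rf_all, iris), actual = iris$Species)
```

Only the tree-growing step costs real time. The merge is cheap, so the speed-up comes from running the five randomForest() calls in parallel rather than from combine() itself.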
I should also mention that another way you can do something similar is by making use of the sampsize argument in randomForest(). For example, if you call randomForest() with sampsize=500, it will randomly draw 500 data points to grow each tree. This way you don't even need to run the RFs separately and combine them. Andy
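A minimal sketch of the sampsize approach, using simulated data. The 5,000-row, 6-column data frame and the outcome rule are made up for illustration and are not the poster's data. A single randomForest() call grows every tree from a random draw of 500 rows. With the default replace = TRUE, that draw is made with replacement. By default, each tree instead uses a bootstrap sample of all rows.

```r
library(randomForest)

set.seed(1)
# Hypothetical stand-in data: 5,000 rows, 6 numeric predictors,
# binary outcome driven by the first two columns plus noise
n <- 5000
X <- as.data.frame(matrix(rnorm(n * 6), ncol = 6))
y <- factor(ifelse(X$V1 + X$V2 + rnorm(n) > 0, "a", "b"))

# Default: each tree is grown on a bootstrap sample of all n rows
t_full <- system.time(
  rf_full <- randomForest(X, y, ntree = 200)
)

# sampsize = 500: each tree is grown on only 500 randomly drawn rows,
# so the trees are smaller and the forest is quicker to grow
t_small <- system.time(
  rf_small <- randomForest(X, y, ntree = 200, sampsize = 500)
)

# Elapsed seconds and final out-of-bag error rate for each forest
c(full = t_full[["elapsed"]], small = t_small[["elapsed"]])
c(full  = rf_full$err.rate[200, "OOB"],
  small = rf_small$err.rate[200, "OOB"])
```

As with the other approaches, check that the out-of-bag error of the smaller-sample forest has not noticeably degraded. The nodesize argument (for example, nodesize = 50) is the other lever mentioned above. It sets the minimum size of terminal nodes, so larger values give smaller trees.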
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of apresley
Sent: Tuesday, January 04, 2011 6:30 PM
To: r-help at r-project.org
Subject: Re: [R] randomForest speed improvements

Andy,

Thanks for the reply. I had no idea I could combine them back; that will actually work pretty well. We can have several "worker threads" load up the RFs on different machines and/or cores, and then reassemble them. Rmpi might be an option down the road, but it would be a bit of overhead for us now.

Using combine(), I was able to drastically reduce the time it takes to build randomForest objects. For example, with about 25,000 rows (6 columns), building one forest takes roughly 5 minutes on my laptop. Building 5 randomForest objects (each on 5,000 rows) and then combining them takes less than 1 minute.

--
Anthony

--
View this message in context: http://r.789695.n4.nabble.com/randomForest-speed-improvements-tp3172523p3174621.html
Sent from the R help mailing list archive at Nabble.com.
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Notice: This e-mail message, together with any attachme...{{dropped:11}}