randomForest speed improvements
Have you tried adjusting:
mtry - the number of variables tried at each split
ntree - the number of trees grown
keep.forest - logical; whether to keep the forest in the output object
Specifically, I found huge improvements in speed by switching keep.forest
to FALSE in the past when I didn't actually need the forest post analysis.
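A minimal sketch of what I mean, on made-up data. The data frame, the row counts, and the particular ntree/mtry values below are placeholders of mine, not values tuned for your problem. One caveat: with keep.forest=FALSE you can't call predict() on the object afterwards, so the sketch passes the test rows in through xtest and reads the predictions from $test$predicted instead.

```r
library(randomForest)

set.seed(1)

# Synthetic stand-in for the real data (hypothetical): six numeric
# predictors and a numeric target, so this is a regression forest.
n <- 2000
x <- as.data.frame(matrix(runif(n * 6), ncol = 6))
y <- rowSums(x) + rnorm(n, sd = 0.1)
train <- 1:1500
test  <- 1501:n

# Baseline: default settings (ntree = 500, forest kept so predict() works).
t_default <- system.time(
  rf_default <- randomForest(x[train, ], y[train])
)
p_default <- predict(rf_default, x[test, ])

# Fewer trees, forest not stored. Test-set predictions are computed
# while the trees are grown, because xtest is supplied.
t_small <- system.time(
  rf_small <- randomForest(x[train, ], y[train],
                           xtest = x[test, ],
                           ntree = 100, mtry = 2,
                           keep.forest = FALSE)
)
p_small <- rf_small$test$predicted

# Elapsed seconds for each fit.
print(c(default = t_default[["elapsed"]], small = t_small[["elapsed"]]))
```

Whether fewer trees costs you accuracy depends on the data, so it is worth comparing p_default and p_small against the held-out target before settling on a value.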
--------------------------------------
Jonathan P. Daily
Technician - USGS Leetown Science Center
11649 Leetown Road
Kearneysville WV, 25430
(304) 724-4480
"Is the room still a room when it's empty? Does the room,
the thing itself have purpose? Or do we, what's the word... imbue it."
- Jubal Early, Firefly
r-help-bounces at r-project.org wrote on 01/03/2011 02:59:29 PM:
[R] randomForest speed improvements
apresley to: r-help
01/03/2011 03:03 PM
Sent by: r-help-bounces at r-project.org

Hi there,

We're trying to use randomForest to do some predictions. The test-harness
for our code is pretty straightforward:
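library(randomForest)
# Columns 1-6 are the predictors; column 8 is the target (sales).
data202 <- read.csv("random.csv", header = TRUE)
# Training set: first 50,000 rows.
x <- data202[1:50000, 1:6]
y <- data202[1:50000, 8]
y <- y[, drop = TRUE]  # drops unused levels if y is a factor
# Test set: next 10,000 rows.
x2 <- data202[50001:60000, 1:6]
y2 <- data202[50001:60000, 8]
y2 <- y2[, drop = TRUE]
RFobject <- randomForest(x, y, na.action = na.roughfix)
p <- predict(RFobject, x2)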
In this case, the CSV contains 10 columns, of which 1-6 are numeric
(day of week, week of month, etc.) and column 8 is the target
(sales, a numeric value).
randomForest does fine with the data; our issue is how long it takes. In
this case, about 5,000 rows of data seems to take just a few seconds, but
going to 50,000 rows doesn't take 10x the time, it takes perhaps 30 or 40
minutes.

We've downloaded and tried RT-Rank, which is a multi-threaded version of
RandomForest, and this seems to produce the same (or slightly better)
predictions, but also gets done fairly quickly.

What can we do to improve the speed of this data computation? The system
we're on is a dual quad-core Intel CPU @ 2.33 GHz, and with 16GB of RAM ...
we're using the "stock" R RPM for CentOS 5.5.

Thanks!

-- Anthony

--
View this message in context: http://r.789695.n4.nabble.com/randomForest-speed-improvements-tp3172523p3172523.html
Sent from the R help mailing list archive at Nabble.com.