randomForests predict problem

4 messages · Torsten Hothorn, Yves Brostaux

#
Hello everybody,

I'm testing the randomForest package for some simulations and am having 
trouble predicting new values. Fitting the random forest works fine, but 
each time I try to predict values with the newly created object, I get an 
error message. I thought it was because of NA values in the data frame, but 
I removed them and still got the same error. What am I doing wrong?

 > library(mlbench)
 > library(randomForest)
 > data(Soybean)
 > test <- sample(1:683, 150, replace=F)
 > sb.rf <- randomForest(Class~., data=Soybean[-test,])
 > sb.rf.pred <- predict(sb.rf, Soybean[test,])
Error in matrix(t1$countts, nr = nclass, nc = ntest) :
         No data to replace in matrix(...)

I did the same thing with rpart and it all worked fine:
 > library(rpart)
 > sb.rp <- rpart(Class~., data=Soybean[-test,])
 > sb.rp.pred <- predict(sb.rp, Soybean[test,], type="class")

Thank you all for any advice you can give me.
#
Try:

R> test <- sample(1:683, 150, replace=FALSE)
R>
R> st <- Soybean[test,]
R>
R> sb.rf <- randomForest(Class~., data=Soybean, subset=-test)
R> sb.rf.pred <- predict(sb.rf, data=st)
R>
R> sb.rf.pred[1:10]
 [1] diaporthe-stem-canker diaporthe-stem-canker diaporthe-stem-canker
 [4] diaporthe-stem-canker diaporthe-stem-canker diaporthe-stem-canker
 [7] diaporthe-stem-canker charcoal-rot          charcoal-rot
[10] charcoal-rot
19 Levels: 2-4-d-injury alternarialeaf-spot anthracnose ... rhizoctonia-root-rot


Torsten
#
Well, thank you for your answer, but this does not do the right thing, 
which is to predict the Class value for the test set Soybean[test,]. Instead 
it returns predictions for the data used to fit the forest (ignoring all 
rows with NAs); the 'data' argument is simply ignored because the right name 
for this argument is 'newdata', which still gives the same error when named.

 > length(sb.rf.pred)
[1] 445
 > dim(Soybean[test,])
[1] 150  36
 > dim(Soybean[-test,])
[1] 533  36
 > sb.rf.pred <- predict(sb.rf, newdata=st)
Error in matrix(t1$countts, nr = nclass, nc = ntest) :
         No data to replace in matrix(...)
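
To illustrate why a misnamed 'data' argument is silently ignored rather than rejected, here is a minimal sketch using a toy S3 class. The class "toy" and its predict method are hypothetical and only mimic the general shape of a predict method that takes 'newdata' and '...'; they are not the randomForest implementation.

```r
# Hypothetical toy model: predict() falls back to the stored training
# predictions when 'newdata' is missing.
predict.toy <- function(object, newdata, ...) {
  if (missing(newdata)) {
    return(object$fitted)            # predictions for the training data
  }
  rep(object$label, nrow(newdata))   # one prediction per new row
}

fit <- structure(list(fitted = c("a", "a", "a"), label = "a"), class = "toy")
new <- data.frame(x = 1:2)

# 'data' matches no formal argument, so it is absorbed by '...'
# and 'newdata' stays missing: we get the 3 training predictions.
length(predict(fit, data = new))

# With the correct argument name we get one prediction per new row.
length(predict(fit, newdata = new))
```

This is consistent with the length of 445 seen above: it reflects the rows used for fitting, not the 150 test rows.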
#
Oops, sorry, 'newdata' needs to be specified:

R> library(mlbench)
R> library(randomForest)
R> data(Soybean)
R>
R> test <- sample(1:683, 150, replace=FALSE)
R>
R> sl <- Soybean[-test,]
R> st <- Soybean[test,]
R>
R> sb.rf <- randomForest(Class~., data=Soybean, subset=-test)
R> st <- st[complete.cases(st),]
R> dim(st)
[1] 115  36
R> sb.rf.pred <- predict(sb.rf, newdata=st)
R> length(sb.rf.pred)
[1] 115
R> sb.rf.pred[1:10]
 [1] brown-spot          alternarialeaf-spot brown-spot
 [4] alternarialeaf-spot brown-spot          anthracnose
 [7] bacterial-pustule   bacterial-blight    alternarialeaf-spot
[10] frog-eye-leaf-spot
19 Levels: 2-4-d-injury alternarialeaf-spot anthracnose ... rhizoctonia-root-rot

Anyway, it looks like an NA problem: predict works once the rows with NAs 
are dropped from the test set.

Torsten
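
Dropping incomplete rows shortens the prediction vector (115 instead of 150 above), which makes it awkward to line predictions up with the original test set. One possible workaround, sketched here under assumptions, is to predict only for complete rows and fill the rest with NA. The helper name predict_complete is hypothetical, it relies only on the model having a predict method that accepts 'newdata', and it returns a character vector rather than a factor. Behaviour of predict with NAs may differ between randomForest versions.

```r
# Hypothetical helper: predict for complete rows only and return a
# vector aligned with the rows of 'newdata' (NA where a row had NAs).
predict_complete <- function(fit, newdata, ...) {
  ok  <- complete.cases(newdata)               # TRUE for rows without NA
  out <- rep(NA_character_, nrow(newdata))     # placeholder for every row
  if (any(ok)) {
    pred <- predict(fit, newdata = newdata[ok, , drop = FALSE], ...)
    out[ok] <- as.character(pred)              # fill in the usable rows
  }
  out
}
```

With the objects from this thread, a call such as predict_complete(sb.rf, Soybean[test,]) would be expected to return one entry per test row, with NA for the rows that complete.cases rejects.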