Random Forest, Variable Mismatch

2 messages · Lorenzo Isella, Peter Langfelder

#
Dear All,
I am a bit puzzled.
I am developing a random forest model.
The data is large and involves hundreds of predictors, but the code I
have written is relatively simple.
After training my random forest model, I apply it to a new data set to
make predictions, as you can see below


response_validation <- predict(rf,newdata=mydata,
                                type="response")

but I get this error message

Error in predict.randomForest(rf, newdata = mydata, type = "response") :
   variables in the training data missing in newdata

I am confused because I checked that there is no missing data in either
my training or my test data set, and the data types of the columns in
the test and training data sets are identical.
Bottom line: I have no idea how to debug this (it is almost as if
the error message should not exist).
Any suggestion is welcome.
Cheers

Lorenzo
#
On Sat, Feb 15, 2014 at 8:43 AM, Lorenzo Isella
<lorenzo.isella at gmail.com> wrote:
This error is thrown when the column names in the original and new data
do not agree. Make sure the column names in your original (training)
data and in the new data 'mydata' are the same.
Matching column types are not enough; the column names must match too.
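
A minimal sketch of how such a name mismatch could be located. The data
frames below are toy stand-ins, and the column names 'age', 'income' and
'y' are hypothetical, not taken from the original data. The names are
compared as exact strings here, so a difference in case counts as a
mismatch.

```r
# Toy stand-ins for the training data and the new data (hypothetical columns).
train  <- data.frame(age = c(31, 45), income = c(2.1, 3.4), y = c(0, 1))
mydata <- data.frame(Age = c(28, 52), income = c(1.9, 4.0))

# Predictors used in training: every column except the response 'y'.
predictors <- setdiff(names(train), "y")

# Training predictors that have no identically named column in the new data.
missing_in_new <- setdiff(predictors, names(mydata))
print(missing_in_new)

# Columns in the new data that the training data does not know about;
# these often reveal the misspelled or re-cased counterpart.
extra_in_new <- setdiff(names(mydata), predictors)
print(extra_in_new)
```

With a fitted model at hand, the same comparison could presumably be run
against the variable names stored in the model (for example
rownames(rf$importance)) instead of names(train); that is an assumption
about the fitted object, not something shown in this thread.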

HTH,

Peter