error in random forest
Thank you very much. I'll jump into the data and verify the consistency between the training and testing variables and their levels.
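A minimal sketch of such a consistency check, for anyone tracing the same error. The data frames `train` and `test` below are hypothetical toy data, not the real x.OM; the idea is simply to compare, factor by factor, the levels that occur in the test set against the levels of the training set.

```r
# Hypothetical toy data standing in for the real training and test sets.
train <- data.frame(f = factor(c("a", "b", "c")), x = c(1.0, 2.0, 3.0))
test  <- data.frame(f = factor(c("a", "b", "d")), x = c(4.0, 5.0, 6.0))

# Names of the factor columns in the training set.
factor_cols <- names(train)[sapply(train, is.factor)]

# For each factor, list the levels that occur in the test set
# but are not levels of the same factor in the training set.
new_levels <- lapply(factor_cols, function(v) {
  setdiff(levels(factor(test[[v]])), levels(train[[v]]))
})
names(new_levels) <- factor_cols

# Keep only the factors that actually have a mismatch.
new_levels <- new_levels[lengths(new_levels) > 0]
print(new_levels)
```

Any factor that shows up in `new_levels` is a candidate for the "New factor levels not present in the training data" error.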
On Fri, Mar 7, 2008 at 5:14 PM, <Bill.Venables at csiro.au> wrote:
The error message is pretty clear, really. To spell it out a bit more, what you have done is as follows.

Your training set has factor variables in it. Suppose one of them is "f". In the training set it has 5 levels, say. Your test set also has a factor "f", as it must, but it appears that in the test set it has 6 levels, or more, or levels that do not agree with those for "f" in the training set. This mismatch means that the predict method for randomForest cannot use this test set.

What you have to do is make sure that the factor levels agree for every factor in both the test and training sets. One way to do this is to put the test and training sets together, with rbind(...) say, and then separate them again.

But even this will still leave you with a problem, because your training set will then have some factor levels empty which are not empty in the test set. The error will most likely be more subtle, though.

You really need to sort this out yourself. It is not particularly an R problem, but a confusion over data. To be useful, your training set needs to cover the field for all levels of every factor. Think about it.

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Nagu
Sent: Saturday, 8 March 2008 5:37 AM
To: r-help at r-project.org; r-help at stat.math.ethz.ch
Subject: [R] error in random forest

Hi,

I get the following error when I try to predict the probabilities of a test sample:

Error in predict.randomForest(fit.EBA.OM.rf.50, x.OM, type = "prob") :
  New factor levels not present in the training data

I have about 630 predictor variables in the dataset x.OM (25 factor variables; the rest are continuous). Any ideas on how to trace it?

Thank you,
Nagu
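To illustrate the two points made in the reply above (aligning levels, and the empty-level problem left behind by rbind(...)), here is a small sketch on hypothetical toy data. The data frames `train` and `test` and the column `f_aligned` are made up for the example and are not from the original post.

```r
# Hypothetical toy data: the test set contains a level "d" never seen in training.
train <- data.frame(f = factor(c("a", "b", "c")))
test  <- data.frame(f = factor(c("a", "b", "d")))

# Option 1: re-code the test factor on the training levels.
# Values that were never seen in training become NA and must be handled separately.
test$f_aligned <- factor(as.character(test$f), levels = levels(train$f))
unseen <- is.na(test$f_aligned) & !is.na(test$f)
print(test[unseen, ])

# Option 2: the rbind(...) approach. Both parts now share the union of levels...
combined <- rbind(train, test["f"])
print(levels(combined$f))

# ...but the training part has an empty level ("d" has count zero), so the
# fitted model has no information about it.
train2 <- combined[seq_len(nrow(train)), , drop = FALSE]
print(table(train2$f))
```

Either way, rows whose level never occurs in the training data cannot be predicted sensibly, which is why the training set needs to cover every level.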