In trying to use randomForest, I got the following error message:

Error in data.matrix(x) : non-numeric data type in frame

Am I correct that this means that randomForest has not been written in R to handle categorical predictor variables? Is there a way around this? I am working with two categorical variables (out of 4 predictor variables) with more than 2 levels that do not have any particular order to them.

The instructions for the original random forest program by Breiman indicate that it handles categorical predictor variables, so I am surprised that the R version does not.

Michael

Michael B. Griffith, Ph.D.
Research Ecologist
USEPA, NCEA (MS A-110)
26 W. Martin Luther King Dr.
Cincinnati, OH 45268
telephone: 513 569-7034
e-mail: griffith.michael at epa.gov
handling ca
2 messages · Griffith.Michael at epamail.epa.gov, Gavin Simpson
On Wed, 2008-08-06 at 13:51 -0400, Griffith.Michael at epamail.epa.gov wrote:
In trying to use randomForest, I got the following error message:

Error in data.matrix(x) : non-numeric data type in frame

Am I correct that this means that randomForest has not been written in R to handle categorical predictor variables? Is there a way around this? I am working with two categorical variables (out of 4 predictor variables) with more than 2 levels that do not have any particular order to them.

The instructions for the original random forest program by Breiman indicate that it handles categorical predictor variables, so I am surprised that the R version does not.
No, randomForest handles factors, as this simple example shows:
dat <- data.frame(matrix(rnorm(1000), ncol = 10))
dat$fac <- gl(4, 25)
head(dat)
           X1         X2         X3         X4          X5         X6
1 -0.15048037  1.2497460 -0.7728316 -0.3286552  1.59056488 -1.2579715
2 -0.67688208 -2.0189794 -0.3154595  0.5998583 -1.89438803 -0.9737503
3  1.02637837  0.3724476 -0.3145720  1.4510331  1.78757305  0.4365752
4  0.08031081  0.6534088 -0.6211070  0.1432012 -0.51041876 -1.0198103
5  0.09208803  0.6273971  0.7333440  0.4362220 -0.03848859  0.6260701
6 -1.41415813 -1.1515418 -0.7457416  1.5853533 -1.17111942  2.5486069
          X7         X8         X9        X10 fac
1  0.7698208 -1.8697214 -1.1568065  0.8459625   1
2 -0.2782257  0.1361337 -1.1308822  0.6001056   1
3 -1.0053869  0.5940746 -0.1833341  2.0251286   1
4 -0.9806460 -1.5225105 -1.8038346  0.2879445   1
5 -0.3767947 -1.8172355  1.1956810  1.2158483   1
6 -0.9316282  2.1180183 -0.6357269 -1.3134966   1
dat$fac
 [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2
[38] 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[75] 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
Levels: 1 2 3 4
library(randomForest)
dat$resp <- rpois(100, 2)
dat$resp <- rnorm(100, 2)  # overwrites the Poisson response above; the fit uses this one
forest <- randomForest(resp ~ ., data = dat)
As all you give is an error message, we have very little to go on, but the first thing I would check is that your data are as you think they should be. Factors can end up as character vectors when data are read into R, so that'd be my first port of call. What does

str(mydata)

return, where mydata is the object that holds your data?

Note that the error comes from data.matrix. If you consult the help page for that function, you will see that it is the preferred way of converting a data frame to a matrix while preserving the numeric representation of factors. Clearly there is something in your data that is neither a factor nor numeric; otherwise this standard R function would not have given an error.

HTH

G
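The check suggested above could look roughly like the following sketch. The data frame and its column names (resp, depth, habitat, substrate) are invented for illustration and are not the poster's data. It uses base R only and stops short of calling randomForest, since whether the original error reproduces probably depends on the R and randomForest versions in use.

```r
# Hypothetical data: 'habitat' arrives as character, 'substrate' as a factor.
mydata <- data.frame(
  resp      = c(2.1, 3.4, 1.9, 2.8, 3.0, 2.2),
  depth     = c(0.5, 1.2, 0.8, 1.5, 0.9, 1.1),
  habitat   = c("pool", "riffle", "run", "pool", "riffle", "run"),
  substrate = factor(c("sand", "gravel", "cobble", "sand", "gravel", "cobble")),
  stringsAsFactors = FALSE
)

# Inspect the column types; character columns show up as 'chr'.
str(mydata)

# Flag the columns that are character rather than factor or numeric.
is_chr <- vapply(mydata, is.character, logical(1))
names(mydata)[is_chr]

# Convert only those columns to (unordered) factors.
mydata[is_chr] <- lapply(mydata[is_chr], factor)

# Every column should now be either numeric or a factor.
ok <- vapply(mydata, function(col) is.numeric(col) || is.factor(col), logical(1))
stopifnot(all(ok))
```

After the conversion, str(mydata) should report habitat as a factor with 3 levels, and the data frame can be passed to a model formula in the same way as dat in the example earlier in the thread.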
_______________________________________________ R-sig-ecology mailing list R-sig-ecology at r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%