handling ca

2 messages · Griffith.Michael at epamail.epa.gov, Gavin Simpson

#
In trying to use randomForest, I got the following error message:

Error in data.matrix(x) : non-numeric data type in frame

Am I correct that this means that randomForest has not been written in R
to handle categorical predictor variables?  Is there a way around this?
I am working with two categorical variables (out of 4 predictor
variables) with more than 2 levels that do not have any particular order
to them.  The instructions for the original random forest program by
Breiman indicate that it handles categorical predictor variables, so I
am surprised that the R version does not.

Michael

Michael B. Griffith, Ph.D.
Research Ecologist

USEPA, NCEA (MS A-110)
26 W. Martin Luther King Dr.
Cincinnati, OH  45268

telephone:  513 569-7034
e-mail:  griffith.michael at epa.gov
#
On Wed, 2008-08-06 at 13:51 -0400, Griffith.Michael at epamail.epa.gov
wrote:
No, randomForest handles factors, as this simple example shows:
           X1         X2         X3         X4          X5         X6
1 -0.15048037  1.2497460 -0.7728316 -0.3286552  1.59056488 -1.2579715
2 -0.67688208 -2.0189794 -0.3154595  0.5998583 -1.89438803 -0.9737503
3  1.02637837  0.3724476 -0.3145720  1.4510331  1.78757305  0.4365752
4  0.08031081  0.6534088 -0.6211070  0.1432012 -0.51041876 -1.0198103
5  0.09208803  0.6273971  0.7333440  0.4362220 -0.03848859  0.6260701
6 -1.41415813 -1.1515418 -0.7457416  1.5853533 -1.17111942  2.5486069
          X7         X8         X9        X10 fac
1  0.7698208 -1.8697214 -1.1568065  0.8459625   1
2 -0.2782257  0.1361337 -1.1308822  0.6001056   1
3 -1.0053869  0.5940746 -0.1833341  2.0251286   1
4 -0.9806460 -1.5225105 -1.8038346  0.2879445   1
5 -0.3767947 -1.8172355  1.1956810  1.2158483   1
6 -0.9316282  2.1180183 -0.6357269 -1.3134966   1
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2
 [38] 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 [75] 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
Levels: 1 2 3 4
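
The commands behind the output above are not part of the archived message. The following is only a sketch of how data of the same shape (ten numeric columns plus a four-level factor) could be built and passed to randomForest. The seed, the response column y, and the object name fit are assumptions made for illustration, so the numbers will not match those printed above.

```r
# Sketch only: dat mimics the shape of the data shown above.
set.seed(1)  # arbitrary seed (assumption)
dat <- data.frame(matrix(rnorm(1000), ncol = 10), fac = gl(4, 25))
head(dat)    # columns X1..X10 and fac
dat$fac      # a factor with levels 1 2 3 4, 25 observations each

# Hypothetical numeric response, added purely so there is something to fit
dat$y <- rnorm(nrow(dat))

# Fit only if the randomForest package is installed
if (requireNamespace("randomForest", quietly = TRUE)) {
  fit <- randomForest::randomForest(y ~ ., data = dat)
  print(fit)
}
```

With fac stored as a factor, the fit should run without the data.matrix error.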
As all you give is an error message we have very little to go on, but
the first thing I would check is that your data are what you think they
are. Factors can be converted to characters when data are read into R,
so that would be my first port of call. What does:

str(mydata)

return, where mydata is the object holding your data?
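
A minimal sketch of that check, assuming a small made-up data frame in place of the real mydata (the column names y, depth and habitat are hypothetical). It shows what a character column looks like in str() and one way to turn such columns back into factors:

```r
# Hypothetical stand-in for the poster's data
mydata <- data.frame(
  y       = c(1.2, 3.4, 2.2, 5.1),
  depth   = c(0.5, 1.0, 1.5, 2.0),
  habitat = c("riffle", "pool", "run", "pool"),
  stringsAsFactors = FALSE
)

str(mydata)  # habitat is listed as chr, not Factor

# Find character columns and convert each one to a factor
is_chr <- vapply(mydata, is.character, logical(1))
names(mydata)[is_chr]
mydata[is_chr] <- lapply(mydata[is_chr], factor)

str(mydata)  # habitat is now a Factor with 3 levels
```

Columns that str() reports as chr are the likely candidates for the non-numeric data complained about in the error.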

Note the error comes from data.matrix. If you consult the help page for
that function you will see that this is the preferred way of converting
a data frame to a matrix while preserving the numeric representation
of factors. Clearly there is something in your data that is neither a
factor nor numeric, otherwise this standard R function would not have
given an error.
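
As a small illustration of that behaviour, here is a sketch with a made-up data frame (df, x and f are hypothetical names). It shows data.matrix replacing a factor by its internal integer codes:

```r
# One numeric column and one factor column
df <- data.frame(x = c(0.1, 0.2, 0.3), f = factor(c("b", "a", "b")))

levels(df$f)   # levels are sorted, so "a" has code 1 and "b" has code 2

m <- data.matrix(df)
m              # column f now holds the codes of the factor levels
is.numeric(m)  # the result is a purely numeric matrix
```

The codes follow the order of levels(df$f), not the order in which the values first appear in the data.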

HTH

G