
Random Forest Reading N/A's, I don't see them

10 messages · R. Michael Weylandt, Lost in R, jim holtman +2 more

#
After checking the original data in Excel for blanks and running summary(cm3)
to identify any null values in my data, I'm unable to identify any instances.
Yet when I attempt to use the data in Random Forest, I get the following
error. Is there something that Random Forest is reading as null which is not
actually null? Is there a better way to check for this?
> system.time(
+ rf1 <- randomForest(as.matrix(cm3[,c(2:length(colnames(cm3)))]),
+ cm3[,1],data=cm3,ntree=50)
+ )
Error in randomForest.default(as.matrix(cm3[, c(2:length(colnames(cm3)))]),  :
  NA/NaN/Inf in foreign function call (arg 1)
In addition: Warning message:
In storage.mode(x) <- "double" : NAs introduced by coercion
Timing stopped at: 1.33 0.01 1.35


Thanks in advance,
Mike

--
View this message in context: http://r.789695.n4.nabble.com/Random-Forest-Reading-N-A-s-I-don-t-see-them-tp4201546p4201546.html
Sent from the R help mailing list archive at Nabble.com.
#
Use str() on your object and attach the result. For even faster help, use dput() on a *small* sample of your data to make the problem reproducible.

My guess is that there are characters or, less likely, factors lurking about...
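
The checks suggested above can be sketched in base R as follows. This is only an illustration: the data frame `df` and its columns are made-up stand-ins for cm3, not the poster's data.

```r
# Hypothetical stand-in for cm3: two columns were read as text
# because the raw values contain "," and "%".
df <- data.frame(
  Good1Bad0 = c(1L, 0L, 1L),
  Amount    = c("1,200", "350", "4,100"),
  Rate      = c("12%", "7%", "9%"),
  stringsAsFactors = FALSE
)

# Compact overview of every column's type.
str(df)

# First class of each column, as a named character vector.
col_classes <- sapply(df, function(col) class(col)[1])
print(col_classes)

# Names of the columns that are neither numeric nor logical.
is_num <- sapply(df, function(col) is.numeric(col) || is.logical(col))
non_numeric <- names(df)[!is_num]
print(non_numeric)

# Reproducible snippet that can be pasted into a mailing-list post.
dput(head(df, 3))
```

Columns listed in `non_numeric` are the ones most likely to produce NAs when something later forces the data to double.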

Michael
On Dec 15, 2011, at 2:39 PM, Lost in R <michael.hartye at principiscapital.com> wrote:

            
#
Thanks Michael -  That was a help. I got rid of the "," in my numbers and the
"%", which were making many of the numeric variables FACTORS. It appears that
I made all of those revisions, but I am still getting the same error.
Attached is the str() output. If anyone could shed some light, it would be
much appreciated.
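
For readers who prefer to do this cleanup inside R rather than in Excel, a minimal sketch follows. The vectors `amount` and `rate` and the helper `to_number` are hypothetical names used only for illustration.

```r
# Hypothetical columns that were read in as factors because of "," and "%".
amount <- factor(c("1,200", "350", "4,100"))
rate   <- factor(c("12%", "7%", "n/a"))

# Go through as.character() first: as.numeric() applied directly to a
# factor returns the internal level codes, not the printed values.
to_number <- function(x) {
  cleaned <- gsub("[,%]", "", as.character(x))
  suppressWarnings(as.numeric(cleaned))
}

amount_num <- to_number(amount)
rate_num   <- to_number(rate)   # "12%" becomes 12, not 0.12

# Entries that were present originally but still failed to convert.
bad <- as.character(rate)[is.na(rate_num) & !is.na(rate)]
print(bad)
```

Inspecting `bad` for each column is one way to find the remaining values that still turn into NA after the obvious characters have been removed.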



Thanks,
Mike

http://r.789695.n4.nabble.com/file/n4201899/Str%28%29.docx Str%28%29.docx 

#
I've also attached here a sample of my data in Excel. I'm thinking it must be
a problem with a character, but can't figure it out. Is there a list
somewhere of characters to avoid in R?

Thanks,
Mike

http://r.789695.n4.nabble.com/file/n4205479/Sample_Data_Set.csv
Sample_Data_Set.csv 

#
What exactly is your problem with this file?  The file that you sent
had 10 lines of what appeared to be data and 4489 lines with just
commas, which would read in as NAs.  When you do an 'str' you get:
'data.frame':   4498 obs. of  195 variables:
 $ Good_Bad                   : Factor w/ 3 levels "","BAD","GOOD": 3 3 3 3 2 2 2 3 3 1 ...
 $ Good1Bad0                  : int  1 1 1 1 0 0 0 1 1 NA ...
 $ PercUltColl                : num  1 1 1 0.98 0.09 0.01 0.19 1 1 NA ...
 $ GoodMerchant.              : int  1 1 1 1 0 0 0 1 1 NA ...
 $ Fundid

so there are 4498 lines of data in the file, but you probably only
want the first 10.  Is this what your problem is?
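
One possible way to drop such comma-only rows after reading the file is sketched below. The inline CSV text and the column subset are invented for illustration; only the column names are borrowed from the str() output above.

```r
# Hypothetical CSV: two real rows followed by rows that contain only commas.
csv_text <- "Good_Bad,Good1Bad0,PercUltColl
GOOD,1,0.98
BAD,0,0.09
,,
,,
"

# Treat empty fields as NA so that comma-only rows become all-NA rows.
dat <- read.csv(text = csv_text, na.strings = c("", "NA"),
                stringsAsFactors = FALSE)

# A row is considered empty when every one of its fields is NA.
empty_row <- rowSums(!is.na(dat)) == 0
dat_clean <- dat[!empty_row, , drop = FALSE]

print(nrow(dat))
print(nrow(dat_clean))
```

With a real file, the `text =` argument would be replaced by the file path.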

On Fri, Dec 16, 2011 at 12:20 PM, Lost in R
<michael.hartye at principiscapital.com> wrote:

  
    
#
On Dec 16, 2011, at 12:20 PM, Lost in R wrote:

            
It? What is "it"?
We are not looking at this with Nabble. This is a mailing list. You
are asked to attach context. That is something that can easily be
done in Nabble, so your failure to do so is seen by most viewers of
this list as one of:

cause <-  c("privileged attitude", "clueless about mailing lists",  
"persistent failure to read Posting Guide")
David Winsemius, MD
West Hartford, CT
#
On Dec 15, 2011, at 2:39 PM, Lost in R wrote:

            
rf1 <- randomForest(as.matrix(cm3[,c(2:length(colnames(cm3)))]),
# Are you aware of the effect of using as.matrix(..) on the storage mode?
# That was the x argument.
                    cm3[,1],
# The y variable.
                    data=cm3,ntree=50)
# That's odd. You already offered the data objects.  I wonder what the
# function will do with that?
I can see two potential sources of such an error.
David Winsemius, MD
West Hartford, CT
#
Try randomForest with a small dataset to see how it works:
  > d <- data.frame(stringsAsFactors=FALSE,
  +                 Num=(1:10)%%9,
  +                 Fac=factor(rep(LETTERS[1:2],each=5)),
  +                 Char=rep(letters[24:26],len=10))
  > randomForest(x=d[,"Char",drop=FALSE], y=d$Num)
  Error in randomForest.default(x = d[, "Char", drop = FALSE], y = d$Num) : 
    NA/NaN/Inf in foreign function call (arg 1)
  In addition: Warning message:
  In data.matrix(x) : NAs introduced by coercion
  > randomForest(x=d[,"Fac",drop=FALSE], y=d$Num)

  Call:
   randomForest(x = d[, "Fac", drop = FALSE], y = d$Num) 
                 Type of random forest: regression
                       Number of trees: 500
  No. of variables tried at each split: 1

            Mean of squared residuals: 9.573558
                      % Var explained: -40.58

It appears to die if any predictors are character vectors:
it will not convert them to factors (as most modelling functions
do).
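
If the character columns are meant to be categorical, one workaround is to convert them to factors before calling randomForest. The sketch below uses only base R and a cut-down version of the data frame `d` from above; it is an assumption about what the poster wants, not a confirmed fix for cm3.

```r
# Cut-down version of the example data: one numeric and one character column.
d <- data.frame(stringsAsFactors = FALSE,
                Num  = (1:10) %% 9,
                Char = rep(letters[24:26], length.out = 10))

# Convert every character column to a factor, leaving other columns alone.
is_chr <- sapply(d, is.character)
d[is_chr] <- lapply(d[is_chr], factor)

str(d)

# The converted column could then be passed as a predictor, e.g.
#   randomForest(x = d[, "Char", drop = FALSE], y = d$Num)
# (not run here, since it needs the randomForest package).
```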

as.matrix(data.frame) creates a character matrix if not all columns
are numeric or logical, so I suspect you are running into the
no-character-data limitation.  Try leaving off the as.matrix and
pass in the data.frame that it expects:
   randomForest(x=cm3[,-1,drop=FALSE], y=cm3[,1])
(There is no need or use for the data= argument if you use the x=,y=
interface.  It is only there for the formula interface.)
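
The as.matrix behaviour described above can be checked in base R without randomForest. The small data frame here is made up, and the sketch only mimics the storage-mode coercion that the original warning message mentions.

```r
# Made-up data frame with one numeric and one character column.
d <- data.frame(stringsAsFactors = FALSE,
                Num  = c(1.5, 2, 3),
                Char = c("x", "y", "z"))

# One non-numeric column turns the whole matrix into character.
m <- as.matrix(d)
print(typeof(m))

# An all-numeric subset stays numeric.
m_only_num <- as.matrix(d[, "Num", drop = FALSE])
print(typeof(m_only_num))

# Forcing the character matrix to double raises a coercion warning;
# capture its message instead of letting it print.
m_num <- m
coerced <- tryCatch(
  {
    storage.mode(m_num) <- "double"
    "no warning"
  },
  warning = function(w) conditionMessage(w)
)
print(coerced)
```

The letters in `Char` cannot be parsed as numbers, so the coercion yields NA values, which is consistent with the NA/NaN/Inf error reported at the top of the thread.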

If you dislike the no-character-data limitation discuss it with
the person at the address given by maintainer("randomForest").

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com