
How to deal with missing values when using Random Forest

3 messages · kevin123, David Winsemius, Weidong Gu

#
I am using the randomForest package to train and test a model.
I aim to predict LengthOfStay.days:
+ importance=TRUE,
+ keep.forest=TRUE
+ )
 

*This is a small portion of the data frame:*

*data(training)*

LengthOfStay.days CharlsonIndex.numeric DSFS.months
1                  0                   0.0         8.5
6                  0                   0.0         3.5
7                  0                   0.0         0.5
8                  0                   0.0         0.5
9                  0                   0.0         1.5
11                 0                   1.5         NaN



*Error message*

Error in na.fail.default(list(LengthOfStay.days = c(0, 0, 0, 0, 0, 0,  : 
  missing values in object,

I would greatly appreciate any help

Thanks

Kevin


--
View this message in context: http://r.789695.n4.nabble.com/How-to-deal-with-missing-values-when-using-Random-Forrest-tp4421254p4421254.html
Sent from the R help mailing list archive at Nabble.com.
#
On Feb 25, 2012, at 6:24 PM, kevin123 wrote:

            
What part of that error message is unclear? Have you looked at the
randomForest help page? It tells you that the default na.action is na.fail.
It would seem that the way forward is to remove the cases with missing
values or to impute values.
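
As a rough sketch of the first route (dropping incomplete cases), the rows shown in the original post can be rebuilt by hand and filtered with base R. The data frame below is typed in from the six rows quoted above, with the original row names omitted; it is only a stand-in for the poster's full `training` object, and the name `complete` is made up for illustration.

```r
# Hand-typed copy of the six rows shown in the post (an assumption:
# the real `training` data frame is much larger).
training <- data.frame(
  LengthOfStay.days     = c(0, 0, 0, 0, 0, 0),
  CharlsonIndex.numeric = c(0, 0, 0, 0, 0, 1.5),
  DSFS.months           = c(8.5, 3.5, 0.5, 0.5, 1.5, NaN)
)

# is.na() is TRUE for NaN as well as NA, so this counts both per column.
colSums(is.na(training))

# Keep only the rows with no missing values in any column.
complete <- na.omit(training)
nrow(complete)
```

Passing `na.action = na.omit` to the model call should have the same effect without creating a separate object, at the cost of silently discarding rows.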
#
Hi,

You can set na.action=na.roughfix, which fills NAs with the column
median (numeric variables) or the most frequent level (factors).

Another option is to impute missing values using rfImpute, then run
randomForest on the complete data set.
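
A minimal sketch of both options follows. It uses the built-in `airquality` data set as a stand-in, since the poster's full data and model call are not shown; the response `Temp`, the seed, and the `ntree`/`iter` values are arbitrary choices for illustration. Note that rfImpute needs a response with no missing values.

```r
library(randomForest)

set.seed(42)  # arbitrary seed, only to make the sketch repeatable

# airquality has NAs in Ozone and Solar.R; Temp (the response here) has none.
# Option 1: rough fix at fit time (median for numeric columns,
# most frequent level for factors).
fit_roughfix <- randomForest(Temp ~ ., data = airquality,
                             na.action = na.roughfix,
                             ntree = 200, importance = TRUE)

# Option 2: proximity-based imputation first, then fit on the completed data.
imputed <- rfImpute(Temp ~ ., data = airquality, iter = 5, ntree = 200)
fit_imputed <- randomForest(Temp ~ ., data = imputed,
                            ntree = 200, importance = TRUE)

print(fit_roughfix)
print(fit_imputed)
```

na.roughfix is the cheaper of the two; rfImpute starts from the same rough fix and then refines it using the forest's proximity matrix, so it takes longer on large data.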

Weidong Gu