Message-ID: <CAAjnpdjhFDAWrD9Pssewhss046vdFYdgPg7Crk7kYwoLqKBu6w@mail.gmail.com>
Date: 2019-01-12T17:55:30Z
From: Witold E Wolski
Subject: randomForest out of bag prediction

Hello,

I am not sure what the predict.randomForest function is doing, and I
am confused.

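I would expect these two function calls to return the same predictions:
```{r}
library(randomForest)
library(dplyr)  # provides %>% and select()

diachp.rf <- randomForest(quality ~ ., data = data, ntree = 50, importance = TRUE)

# predict() called without newdata
ypred_oob <- predict(diachp.rf)
# predict() called on the training predictors (response column removed)
dataX <- data %>% select(-quality)
ypred <- predict(diachp.rf, dataX)

ypred_oob == ypred
```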
As I understand it, both are out-of-bag predictions, but ypred and
ypred_oob are actually very different.

> table(ypred_oob , data$quality)

ypred_oob    0    1
        0 1324  346
        1  493 2837
> table(ypred , data$quality)

ypred    0    1
    0 1817    0
    1    0 3183
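
The accuracy implied by each table can be checked by hand. The sketch
below re-enters the two confusion tables from the output above as plain
matrices; the names oob_tab, full_tab and accuracy are hypothetical
helpers introduced only for this illustration.

```r
# Confusion tables copied from the output above (rows: predicted, columns: quality)
oob_tab <- matrix(c(1324, 493, 346, 2837), nrow = 2,
                  dimnames = list(ypred_oob = c("0", "1"), quality = c("0", "1")))
full_tab <- matrix(c(1817, 0, 0, 3183), nrow = 2,
                   dimnames = list(ypred = c("0", "1"), quality = c("0", "1")))

# Accuracy = correctly classified (diagonal) / all observations
accuracy <- function(tab) sum(diag(tab)) / sum(tab)

accuracy(oob_tab)   # (1324 + 2837) / 5000
accuracy(full_tab)  # (1817 + 3183) / 5000
```

Both tables cover the same 5000 observations; only the off-diagonal
counts differ.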

What I find even more disturbing is the 100% accuracy for ypred.
Would you agree that this is rather unexpected?
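
A minimal sketch that reproduces the same comparison on public data may
help narrow this down. It assumes the built-in iris data set in place of
my data, and the object names (rf, pred_oob, pred_fit) are made up for
the example. According to the randomForest documentation for the newdata
argument, the out-of-bag prediction stored in the object is returned when
newdata is not given, so the first check compares predict(rf) against
rf$predicted. When newdata is supplied, my understanding is that every
tree votes on every row, including the trees that were grown on that row.

```r
library(randomForest)

set.seed(1)  # arbitrary seed, only to make the sketch repeatable
rf <- randomForest(Species ~ ., data = iris, ntree = 50)

# No newdata: predictions stored in the fitted object
pred_oob <- predict(rf)

# newdata supplied: the training predictors without the response column
pred_fit <- predict(rf, iris[, names(iris) != "Species"])

# Is predict(rf) the same as the stored out-of-bag predictions?
all(pred_oob == rf$predicted)

# Accuracy of each set of predictions on the training data
mean(pred_oob == iris$Species)
mean(pred_fit == iris$Species)
```

If the second accuracy is the higher one here as well, then the pattern
does not depend on my data set.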

regards
Witek
-- 
Witold Eryk Wolski