
Random Forests Variable Importance Question

Paul,

To build on what Andy said:
The measures of importance RF provides are just alternative ways of
getting at the same thing: Variable Importance.
For example, MeanDecreaseAccuracy is one of those alternatives. As
Andy said, it does not make sense to look at the absolute importance
value. In a hypothetical case where all importance values seem "high"
but are equal - that means that all variables have the same
importance. In another case, where all importance values seem "low"
but are equal - that means exactly the same thing, that all variables
have the same importance. The point is: the absolute value of
importance is not very helpful. One needs to compute relative
importance values instead.
I learned to use it like this (similar to what Andy said):

1. Take the RF output for each variable (MeanDecreaseAccuracy, for
example). If the RF object is called "rftest", then I take the vector
as.data.frame(rftest$importance)[1].
2. I divide each variable's (raw) importance by its respective SD
(as.data.frame(rftest$importanceSD)[1])
3. Any resulting values that are less than zero are set to zero, as
Andy mentioned.
4. I take each value, multiply it by 100 and divide it by the sum of
all the values from step 3.

This way I get the relative importance of each predictor, and all
importances sum to 100.
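
The four steps above can be sketched in base R roughly as follows.
This is only an illustration, not code from the randomForest package:
the helper name relative_importance is hypothetical, the input numbers
are made up, and treating a zero SD as zero importance is an
assumption of the sketch.

```r
# Hypothetical helper (not part of randomForest): turns raw importances
# and their SDs into relative importances that sum to 100.
relative_importance <- function(imp, imp_sd) {
  # Step 2: divide each raw importance by its SD.
  # Assumption: a zero SD is treated as zero importance to avoid
  # dividing by zero.
  scaled <- ifelse(imp_sd > 0, imp / imp_sd, 0)
  # Step 3: values below zero are set to zero.
  scaled[scaled < 0] <- 0
  # Step 4: multiply by 100 and divide by the sum of the step-3 values.
  total <- sum(scaled)
  if (total == 0) return(scaled)  # nothing positive to rescale
  100 * scaled / total
}

# Step 1 stand-in: made-up raw importances and SDs for three predictors.
imp    <- c(x1 = 6, x2 = 2, x3 = -1)
imp_sd <- c(x1 = 2, x2 = 2, x3 = 1)

rel <- relative_importance(imp, imp_sd)
print(rel)
```

With a real forest fitted with importance = TRUE, the inputs would
come from rftest$importance and rftest$importanceSD. Picking the
column by name, e.g. rftest$importance[, "MeanDecreaseAccuracy"] for a
classification forest, is probably safer than picking it by position,
since the column layout differs between classification and regression.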