
Random Forests Variable Importance Question

I'll take a shot.

Let me try to explain the 3rd measure first.  An RF model tries to predict an outcome variable (the classes) from a group of potential predictor variables (the "x").  If a predictor variable is "important" in making the prediction accurate, then messing with it (e.g., giving it random values) should have a larger impact on how well the prediction can be made than messing with a variable that contributes little.  The variable importance measure tries to capture this.  (If you throw a wrench into the trunk of a car, it probably doesn't affect how the car drives.  However, if you throw the wrench into the engine compartment, that _may_ be a different story.)
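
To make the "messing with it" idea concrete, here is a minimal sketch in plain Python.  It is not the randomForest code: the real measure is computed on out-of-bag cases inside the forest, whereas this just shuffles one column at a time on a single data set and records the drop in accuracy.  The names `permutation_importance` and `toy_predict`, and the toy data, are made up for illustration.

```python
import random

def accuracy(predict, X, y):
    """Fraction of rows where predict(row) equals the true label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)

def permutation_importance(predict, X, y, seed=0):
    """Drop in accuracy when each column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        rng.shuffle(column)  # break the link between column j and y
        X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
        importances.append(baseline - accuracy(predict, X_perm, y))
    return importances

# Toy data (assumed): the class depends only on column 0; column 1 is noise.
data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [int(row[0] > 0.5) for row in X]

# Hypothetical stand-in for a fitted model; it only looks at column 0.
def toy_predict(row):
    return int(row[0] > 0.5)

imp = permutation_importance(toy_predict, X, y)
print(imp)  # column 0 shows a large drop, column 1 shows none
```

Column 0 is the engine compartment here, and column 1 is the trunk.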

I don't know about others, but I only look at the relative importance of the variables, rather than trying to interpret the numbers (raw or scaled).  Any number below 0 should be treated the same as 0 (if I recall, Breiman & Cutler's code truncates the values at 0).  Any variable with an importance value smaller than the absolute value of the minimum is probably not worth a closer look.
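
That rule of thumb can be sketched as follows.  The function name `screen_importances` and the importance values are invented for illustration, not output from any real fit.

```python
def screen_importances(importances):
    """Rank variables and flag those above the |minimum| noise floor."""
    # The absolute value of the smallest importance serves as a rough noise floor.
    floor = abs(min(importances.values()))
    # Treat negative importances the same as 0.
    clamped = {name: max(value, 0.0) for name, value in importances.items()}
    # Only the ordering is interpreted, not the numbers themselves.
    ranked = sorted(clamped, key=clamped.get, reverse=True)
    # Variables smaller than the floor are probably not worth a closer look.
    worth_a_look = [name for name in ranked if importances[name] >= floor]
    return clamped, ranked, worth_a_look

# Hypothetical importance values.
raw = {"x1": 12.4, "x2": 3.1, "x3": 0.4, "x4": -0.9, "x5": 0.0}
clamped, ranked, worth_a_look = screen_importances(raw)
print(ranked[:3])
print(worth_a_look)
```

Here the minimum is -0.9, so anything below 0.9 (including x3 at 0.4) is treated as indistinguishable from noise.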

The first two measures (you must be predicting an outcome variable with two classes) are the analogous measures that address each of the two classes specifically, rather than over all of the data.
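
A rough sketch of a class-specific version of the same idea: shuffle a predictor as before, but score the drop in accuracy only on the cases of one class.  Again this is a simplified illustration under assumed toy data, not the out-of-bag computation that the randomForest code performs; `class_specific_importance` and `toy_predict` are hypothetical names.

```python
import random

def class_specific_importance(predict, X, y, column, target_class, seed=0):
    """Accuracy drop on the rows of one class after shuffling one column."""
    rng = random.Random(seed)
    values = [row[column] for row in X]
    rng.shuffle(values)  # shuffle across all rows, then score a single class
    X_perm = [row[:column] + [v] + row[column + 1:] for row, v in zip(X, values)]
    idx = [i for i, label in enumerate(y) if label == target_class]
    before = sum(predict(X[i]) == y[i] for i in idx) / len(idx)
    after = sum(predict(X_perm[i]) == y[i] for i in idx) / len(idx)
    return before - after

# Toy data (assumed): class 1 is the rarer class and depends only on column 0.
data_rng = random.Random(7)
X = [[data_rng.random(), data_rng.random()] for _ in range(300)]
y = [int(row[0] > 0.8) for row in X]

# Hypothetical stand-in for a fitted model.
def toy_predict(row):
    return int(row[0] > 0.8)

imp_class0 = class_specific_importance(toy_predict, X, y, column=0, target_class=0)
imp_class1 = class_specific_importance(toy_predict, X, y, column=0, target_class=1)
print(imp_class0, imp_class1)
```

The same predictor can matter much more for one class than for the other, which is what the per-class columns let you see.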

Andy


From: Paul Fisch