Random Forests: Predictor importance for Regression Trees
Hello!

I think I am relatively clear on how predictor importance (the first one) is calculated by Random Forests for a classification tree.

Importance of predictor P1 when the response variable is categorical:

1. For out-of-bag (oob) cases, randomly permute their values on predictor P1 and then put them down the tree.
2. For a given tree, subtract the number of votes for the correct class in the predictor-P1-permuted oob dataset from the number of votes for the correct class in the untouched oob dataset. If P1 is important, this number will be large.
3. The average of this number over all trees in the forest is the raw importance score for predictor P1.

I am wondering what step 2 above looks like if the response variable is continuous rather than categorical, in other words, for a regression tree. Could you please correct me if what I wrote below is wrong? Thank you very much!

Importance of predictor P1 when the response variable is continuous:

1. For out-of-bag (oob) cases, randomly permute their values on predictor P1 and then put them down the tree.
2. For a given tree, calculate the mean of the squared differences between observed y and predicted y for (a) the untouched oob dataset and (b) the predictor-P1-permuted oob dataset. Subtract (a) from (b).
3. The average of this number over all trees in the forest is the raw importance score for predictor P1.
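
To make the three regression steps concrete, here is a minimal Python sketch of the procedure as I have described it. It is only an illustration of my understanding, not the code of any Random Forests implementation. The names `mse`, `tree_importance`, and `raw_importance` are hypothetical, each "tree" is assumed to be a pair of a prediction function and the indices of its oob cases, and the toy trees at the bottom are stand-in functions rather than fitted regression trees.

```python
import random

def mse(y_true, y_pred):
    """Mean squared difference between observed and predicted y."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

def tree_importance(predict, X_oob, y_oob, col, rng):
    """Steps 1 and 2 for one tree: (b) permuted-oob MSE minus (a) untouched-oob MSE."""
    untouched = mse(y_oob, [predict(row) for row in X_oob])
    # Step 1: permute only the values of predictor `col` among the oob cases.
    values = [row[col] for row in X_oob]
    rng.shuffle(values)
    X_perm = []
    for row, v in zip(X_oob, values):
        new_row = list(row)  # copy, so the original data are not modified
        new_row[col] = v
        X_perm.append(new_row)
    permuted = mse(y_oob, [predict(row) for row in X_perm])
    # Step 2: subtract (a) from (b).
    return permuted - untouched

def raw_importance(trees, X, y, col, seed=0):
    """Step 3: average the per-tree difference over all trees.

    `trees` is assumed to be a list of (predict, oob_indices) pairs.
    """
    rng = random.Random(seed)
    diffs = []
    for predict, oob in trees:
        X_oob = [X[i] for i in oob]
        y_oob = [y[i] for i in oob]
        diffs.append(tree_importance(predict, X_oob, y_oob, col, rng))
    return sum(diffs) / len(diffs)

# Toy data: y depends only on the first predictor; the second is irrelevant.
X = [[float(i), float((i * 7) % 5)] for i in range(20)]
y = [3.0 * row[0] for row in X]

# Stand-in "trees" (an assumption for illustration): each predicts 3 * x0
# and has its own set of oob cases.
def stand_in_tree(row):
    return 3.0 * row[0]

trees = [
    (stand_in_tree, list(range(0, 20, 2))),
    (stand_in_tree, list(range(1, 20, 2))),
    (stand_in_tree, list(range(0, 20, 3))),
]

imp_p1 = raw_importance(trees, X, y, col=0)  # positive: permuting x0 hurts predictions
imp_p2 = raw_importance(trees, X, y, col=1)  # exactly 0.0: the stand-in trees ignore x1
```

With this definition, a predictor that the trees never use gets a raw score of zero, and a predictor whose permutation degrades the oob predictions gets a positive score, which is the behaviour I would expect from the regression version of the measure.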
Dimitri Liakhovitski MarketTools, Inc. Dimitri.Liakhovitski at markettools.com