I would like to summarize. Would you please confirm that my summary is
correct? Thank you very much!
Determining R^2 in Random Forests (for a Regression Forest):
1. For each individual case, record a mean prediction on the dependent
variable y across all trees for which the case is OOB (Out-of-Bag);
2. For each individual case, calculate a residual: residual = observed
y - mean predicted y (from step 1)
3. Calculate mean square residual MSE: MSE = sum of all squared
individual residuals (from step 2) / n
4. Because MSE/var(y) represents the proportion of the variance of y
that is due to error, R^2 = 1 - MSE/var(y).
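The four steps above can be sketched as follows. This is a toy illustration in plain Python, not the randomForest code: the function name oob_r2, the per-tree predictions, and the in-bag flags are all made up, and var(y) is taken with divisor n, which is an assumption the thread does not settle.

```python
from statistics import mean, pvariance

# Hypothetical toy data: 4 cases, 3 trees.
y = [1.0, 2.0, 3.0, 4.0]
tree_preds = [            # tree_preds[t][i] = prediction of tree t for case i
    [1.5, 2.5, 2.5, 3.5],
    [1.0, 1.0, 3.0, 5.0],
    [2.0, 2.0, 4.0, 4.0],
]
in_bag = [                # in_bag[t][i] = True if case i was used to grow tree t
    [True, False, True, False],
    [False, True, False, True],
    [False, False, True, True],
]

def oob_r2(y, tree_preds, in_bag):
    """Return (pseudo R^2, OOB MSE); assumes every case is OOB at least once."""
    n = len(y)
    # Step 1: mean prediction over the trees for which each case is OOB.
    oob_pred = [
        mean(p[i] for p, bag in zip(tree_preds, in_bag) if not bag[i])
        for i in range(n)
    ]
    # Step 2: residual = observed y - mean OOB prediction.
    residuals = [y[i] - oob_pred[i] for i in range(n)]
    # Step 3: MSE = sum of squared residuals / n.
    mse = sum(r * r for r in residuals) / n
    # Step 4: R^2 = 1 - MSE / var(y); divisor n is an assumption here.
    return 1 - mse / pvariance(y), mse

r2, mse = oob_r2(y, tree_preds, in_bag)
print(mse, r2)
```

With these made-up numbers the OOB MSE is 0.140625 and var(y) is 1.25, so the pseudo R^2 comes out at 0.8875.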
If it's correct, my last question would be:
I am getting as many R^2 as the number of trees because each time the
residuals are recalculated using all trees built so far, correct?
Thank you very much!
Dimitri
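A sketch of the "one R^2 per tree" reading in the question above, where the OOB residuals are recomputed from all trees built so far. It is plain Python on made-up data, not the randomForest code; the function name cumulative_oob_r2 is hypothetical, and skipping cases that have not yet been OOB for any tree is an assumption of this sketch, as is the divisor n in var(y).

```python
from statistics import pvariance

# Hypothetical toy data: 4 cases, 3 trees.
y = [1.0, 2.0, 3.0, 4.0]
tree_preds = [            # tree_preds[t][i] = prediction of tree t for case i
    [1.5, 2.5, 2.5, 3.5],
    [1.0, 1.0, 3.0, 5.0],
    [2.0, 2.0, 4.0, 4.0],
]
in_bag = [                # in_bag[t][i] = True if case i was used to grow tree t
    [True, False, True, False],
    [False, True, False, True],
    [False, False, True, True],
]

def cumulative_oob_r2(y, tree_preds, in_bag):
    """Return one pseudo R^2 per tree, each based on all trees built so far."""
    n = len(y)
    var_y = pvariance(y)   # divisor n: an assumption of this sketch
    pred_sum = [0.0] * n   # running sum of OOB predictions per case
    n_oob = [0] * n        # number of trees so far for which the case is OOB
    r2_path = []
    for preds, bag in zip(tree_preds, in_bag):
        for i in range(n):
            if not bag[i]:
                pred_sum[i] += preds[i]
                n_oob[i] += 1
        # Assumption: cases not yet OOB for any tree are left out of the MSE.
        seen = [i for i in range(n) if n_oob[i] > 0]
        mse = sum((y[i] - pred_sum[i] / n_oob[i]) ** 2 for i in seen) / len(seen)
        r2_path.append(1 - mse / var_y)
    return r2_path

r2_path = cumulative_oob_r2(y, tree_preds, in_bag)
print(r2_path)
```

On this toy data the path has three entries, roughly 0.8, 0.9 and 0.8875, and the last entry is the R^2 based on the whole forest.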
On Mon, Apr 13, 2009 at 6:22 PM, Liaw, Andy
<andy_liaw at merck.com> wrote:
Apologies: that should have been sum(residual^2)!
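With that correction, the quantities discussed in this thread can be written out as below. The notation is introduced here for illustration and does not come from the randomForest documentation: T_i is the set of trees for which case i is OOB, and \hat{y}_{t,i} is the prediction of tree t for case i.

```latex
\[
\hat{y}_i^{\mathrm{OOB}} = \frac{1}{|T_i|} \sum_{t \in T_i} \hat{y}_{t,i},
\qquad
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \bigl( y_i - \hat{y}_i^{\mathrm{OOB}} \bigr)^2,
\qquad
R^2 = 1 - \frac{\mathrm{MSE}}{\mathrm{Var}(y)}
\]
```

Because the residuals are squared inside the sum, MSE is already in the same squared units as Var(y), so it is not squared again before dividing.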
-----Original Message-----
From: Dimitri Liakhovitski [mailto:ld7631 at gmail.com]
Sent: Monday, April 13, 2009 4:35 PM
To: Liaw, Andy
Cc: R-Help List
Subject: Re: [R] Random Forests: Question about R^2
Andy,
thank you very much!
One clarification question:
If MSE = sum(residuals) / n, then
in the formula (1 - mse / Var(y)) - shouldn't one square mse before
dividing by variance?
Dimitri
On Mon, Apr 13, 2009 at 10:52 AM, Liaw, Andy
<andy_liaw at merck.com> wrote:
MSE is the mean of the squared residuals. For the training data, the OOB
estimate is used (i.e., residual = data - OOB prediction, MSE =
sum(residuals) / n, OOB prediction is the mean of predictions from all
trees for which the case is OOB). It is _not_ the average OOB MSE of the
trees in the forest.
I hope there's no question about how the pseudo R^2 is computed on a
test set? If you understand how that's done, I assume the confusion is
only how the OOB MSE is formed.
Best,
Andy
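To illustrate the distinction drawn in the message above, here is a toy comparison of the two quantities: the MSE of the aggregated OOB predictions versus the average of each tree's own OOB MSE. It is plain Python with made-up numbers, not the randomForest code, and both function names are hypothetical.

```python
from statistics import mean

# Hypothetical toy data: 4 cases, 3 trees.
y = [1.0, 2.0, 3.0, 4.0]
tree_preds = [            # tree_preds[t][i] = prediction of tree t for case i
    [1.5, 2.5, 2.5, 3.5],
    [1.0, 1.0, 3.0, 5.0],
    [2.0, 2.0, 4.0, 4.0],
]
in_bag = [                # in_bag[t][i] = True if case i was used to grow tree t
    [True, False, True, False],
    [False, True, False, True],
    [False, False, True, True],
]

def forest_oob_mse(y, tree_preds, in_bag):
    """MSE of the aggregated OOB prediction (mean over OOB trees per case)."""
    total = 0.0
    for i in range(len(y)):
        oob = [p[i] for p, bag in zip(tree_preds, in_bag) if not bag[i]]
        total += (y[i] - mean(oob)) ** 2
    return total / len(y)

def mean_per_tree_oob_mse(y, tree_preds, in_bag):
    """Average over trees of each tree's MSE on its own OOB cases."""
    per_tree = []
    for p, bag in zip(tree_preds, in_bag):
        sq = [(y[i] - p[i]) ** 2 for i in range(len(y)) if not bag[i]]
        per_tree.append(mean(sq))
    return mean(per_tree)

forest_mse = forest_oob_mse(y, tree_preds, in_bag)
tree_avg_mse = mean_per_tree_oob_mse(y, tree_preds, in_bag)
print(forest_mse, tree_avg_mse)
```

On this toy data the two numbers differ: 0.140625 for the aggregated OOB prediction against 0.25 for the per-tree average.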
From: Dimitri Liakhovitski
Dear Random Forests gurus,
I have a question about R^2 provided by randomForest (for regression).
I have not succeeded in finding this information.
In the help file for randomForest under "Value" it says:
rsq: (regression only) - "pseudo R-squared": 1 - mse / Var(y).
Could someone please explain in somewhat more detail how it
is calculated?
Is "mse" mean squared error for prediction?
Is "mse" an average of mse's for all trees run on out-of-bag
holdout samples?
In other words - is this R^2 based on out-of-bag samples?
Thank you very much for clarification!
--
Dimitri Liakhovitski
MarketTools, Inc.
Dimitri.Liakhovitski at markettools.com