Random Forests: Question about R^2
I would like to summarize. Would you please confirm that my summary is correct? Thank you very much!

Determining R^2 in Random Forests (for a Regression Forest):
1. For each individual case, record a mean prediction on the dependent variable y across all trees for which the case is OOB (Out-of-Bag);
2. For each individual case, calculate a residual: residual = observed y - mean predicted y (from step 1);
3. Calculate the mean squared residual MSE: MSE = sum of all squared residuals (from step 2) / n;
4. Because MSE/var(y) represents the proportion of y variance that is due to error, R^2 = 1 - MSE/var(y).

If this is correct, my last question would be: I am getting as many R^2 values as the number of trees because each time the residuals are recalculated using all trees built so far, correct?

Thank you very much!
Dimitri
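For readers who want to check the arithmetic, here is a minimal sketch of the calculation discussed in this thread, written in plain Python rather than R. It is not the randomForest source code. The toy data, the function names, and the per-tree prediction and OOB tables are all hypothetical. Two assumptions are worth flagging: var(y) is taken with divisor n (R's var() divides by n - 1, so values may differ slightly), and cases that have not yet been OOB for any tree are skipped when averaging.

```python
# Toy illustration of the OOB pseudo R^2 described in this thread.
# All data and names below are hypothetical; this is not randomForest's code.

def oob_mse_rsq(y, tree_preds, oob_mask, n_trees=None):
    """Return (mse, rsq) from OOB predictions using the first n_trees trees.

    tree_preds[t][i] is tree t's prediction for case i;
    oob_mask[t][i] is True when case i was out-of-bag for tree t.
    """
    if n_trees is None:
        n_trees = len(tree_preds)
    sq_resid = []
    for i, y_i in enumerate(y):
        # Step 1: mean prediction over the trees for which case i is OOB.
        preds = [tree_preds[t][i] for t in range(n_trees) if oob_mask[t][i]]
        if not preds:
            continue  # assumption: skip cases never OOB so far
        oob_pred = sum(preds) / len(preds)
        # Step 2: residual = observed - OOB prediction, then squared.
        sq_resid.append((y_i - oob_pred) ** 2)
    # Step 3: mean of the squared residuals.
    mse = sum(sq_resid) / len(sq_resid)
    # Step 4: pseudo R^2 = 1 - MSE / var(y); divisor n is an assumption.
    mean_y = sum(y) / len(y)
    var_y = sum((v - mean_y) ** 2 for v in y) / len(y)
    return mse, 1.0 - mse / var_y


def mean_per_tree_oob_mse(y, tree_preds, oob_mask):
    """Average of each tree's own OOB MSE (NOT what is described above)."""
    per_tree = []
    for preds, mask in zip(tree_preds, oob_mask):
        sq = [(y_i - p) ** 2 for y_i, p, m in zip(y, preds, mask) if m]
        per_tree.append(sum(sq) / len(sq))
    return sum(per_tree) / len(per_tree)


y = [1.0, 2.0, 3.0, 4.0]
tree_preds = [
    [1.5, 2.5, 2.5, 3.5],
    [1.0, 2.0, 3.5, 4.5],
    [2.0, 1.5, 3.0, 4.0],
]
oob_mask = [
    [True, False, True, False],
    [False, True, True, False],
    [True, True, False, True],
]

mse, rsq = oob_mse_rsq(y, tree_preds, oob_mask)
# One R^2 per forest size: recomputed from all trees built so far.
rsq_by_trees = [
    oob_mse_rsq(y, tree_preds, oob_mask, k)[1]
    for k in range(1, len(tree_preds) + 1)
]
avg_tree_mse = mean_per_tree_oob_mse(y, tree_preds, oob_mask)

print("OOB MSE:", mse)
print("pseudo R^2:", rsq)
print("R^2 after 1..k trees:", rsq_by_trees)
print("mean of per-tree OOB MSEs (different quantity):", avg_tree_mse)
```

On this toy data the OOB MSE of the averaged predictions differs from the mean of the per-tree OOB MSEs, which is the distinction drawn in the reply quoted below. The list rsq_by_trees has one entry per tree, matching the observation that there are as many R^2 values as trees.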
On Mon, Apr 13, 2009 at 6:22 PM, Liaw, Andy <andy_liaw at merck.com> wrote:
Apologies: that should have been sum(residual^2)!
-----Original Message-----
From: Dimitri Liakhovitski [mailto:ld7631 at gmail.com]
Sent: Monday, April 13, 2009 4:35 PM
To: Liaw, Andy
Cc: R-Help List
Subject: Re: [R] Random Forests: Question about R^2

Andy, thank you very much! One clarification question: If MSE = sum(residuals) / n, then in the formula (1 - mse / Var(y)), shouldn't one square mse before dividing by variance?

Dimitri

On Mon, Apr 13, 2009 at 10:52 AM, Liaw, Andy <andy_liaw at merck.com> wrote:
MSE is the mean squared residuals. For the training data, the OOB estimate is used (i.e., residual = data - OOB prediction, MSE = sum(residuals) / n, OOB prediction is the mean of predictions from all trees for which the case is OOB). It is _not_ the average OOB MSE of trees in the forest.

I hope there's no question about how the pseudo R^2 is computed on a test set? If you understand how that's done, I assume the confusion is only how the OOB MSE is formed.

Best,
Andy

From: Dimitri Liakhovitski
Dear Random Forests gurus,

I have a question about the R^2 provided by randomForest (for regression). I have not been able to find this information. In the help file for randomForest under "Value" it says:

rsq: (regression only) - "pseudo R-squared": 1 - mse / Var(y).

Could someone please explain in somewhat more detail how exactly R^2 is calculated? Is "mse" the mean squared error of prediction? Is "mse" an average of the MSEs for all trees run on out-of-bag holdout samples? In other words, is this R^2 based on out-of-bag samples?

Thank you very much for the clarification!

--
Dimitri Liakhovitski
MarketTools, Inc.
Dimitri.Liakhovitski at markettools.com
Notice: This e-mail message, together with any attachments, contains information of Merck & Co., Inc. (One Merck Drive, Whitehouse Station, New Jersey, USA 08889), and/or its affiliates (which may be known outside the United States as Merck Frosst, Merck Sharp & Dohme or MSD and in Japan, as Banyu - direct contact information for affiliates is available at http://www.merck.com/contact/contacts.html) that may be confidential, proprietary copyrighted and/or legally privileged. It is intended solely for the use of the individual or entity named on this message. If you are not the intended recipient, and have received this message in error, please notify us immediately by reply e-mail and then delete it from your system.