
caret: Error when using rpart and CV != LOOCV

Dominik,

There are a number of formulations of this statistic (see the
Kvålseth[*] reference below).

I tend to think of R^2 as the proportion of variance explained by the
model[**]. With the "traditional" formula, it is possible to get
negative values (and if there are extreme outliers in the
predictions, those negative values can be very large in magnitude). I
used a formulation that always lies in [0, 1] for that reason. It is
called "R^2" after all!
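For reference, here are two common formulations written out, with observed values y_i, predictions \hat{y}_i and n samples (my notation). I am assuming that the bounded formulation meant here is the squared correlation between observed and predicted values; treat that identification as an assumption.

```latex
R^2_{\text{trad}} = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},
\qquad
R^2_{\text{cor}} = \left( \frac{\sum_{i=1}^{n} (y_i - \bar{y})(\hat{y}_i - \bar{\hat{y}})}
{\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2 \, \sum_{i=1}^{n} (\hat{y}_i - \bar{\hat{y}})^2}} \right)^2
```

The first is unbounded below because the residual sum of squares can exceed the total sum of squares. The second is a squared correlation, so it lies in [0, 1]. It also equals 1 for any predictions that are an exact linear function of the observations, even biased or sign-reversed ones.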

Here is an example:
[1] 0.9887525
+                          pred = simPredicted))
      RMSE   Rsquared
0.09538273 0.98860908
[1] -0.6884905
[1] 0.3669257
+                          pred = simPredicted))
     RMSE  Rsquared
 1.066900 -0.425169

The second case is somewhat extreme, but it does happen in practice.
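The effect in the output above can be sketched in Python with small, made-up deterministic data. This is not the simulated data from the original R session. The helper names r2_traditional and r2_squared_correlation are hypothetical. The squared-correlation version is my assumption about which bounded formulation is meant. A single extreme outlier drives the traditional value far below zero, while the squared correlation drops but stays within [0, 1].

```python
from statistics import mean

def r2_traditional(obs, pred):
    # 1 - SSE/SST: unbounded below, so poor predictions give negative values
    m = mean(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - m) ** 2 for o in obs)
    return 1.0 - sse / sst

def r2_squared_correlation(obs, pred):
    # Squared Pearson correlation of observed vs. predicted: always in [0, 1]
    mo, mp = mean(obs), mean(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    ss_o = sum((o - mo) ** 2 for o in obs)
    ss_p = sum((p - mp) ** 2 for p in pred)
    return cov ** 2 / (ss_o * ss_p)

# Made-up data: predictions within +/-0.1 of the observed values ...
obs = [float(i) for i in range(1, 11)]
pred_good = [o + (0.1 if i % 2 == 0 else -0.1) for i, o in enumerate(obs)]
# ... and the same predictions with one extreme outlier in the last position
pred_bad = pred_good[:-1] + [50.0]

for label, pred in [("close fit", pred_good), ("one outlier", pred_bad)]:
    print(label,
          round(r2_traditional(obs, pred), 4),
          round(r2_squared_correlation(obs, pred), 4))
```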

Max


[*] Kvålseth, T. (1985). Cautionary note about R^2. The American
Statistician, 39(4), 279–285.
[**] This is a very controversial statement when non-linear models are
used. I'd rather use RMSE, but many scientists I work with still think
in terms of R^2 regardless of the model. The randomForest function
also computes this statistic, but calls it "% Var explained" instead
of explicitly labeling it as "R^2". This statistic has generated
heated debates, and I hope that I will not have to wear a scarlet R in
Nashville in a few weeks.
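The "% Var explained" figure can be sketched as follows. As I understand it, randomForest computes this figure from the out-of-bag predictions as 1 - MSE/Var(y). Treat that definition, and in particular the variance denominator of n used here, as an assumption rather than the package's documented code. The helper percent_var_explained is hypothetical. Under this definition the figure is just the traditional formula expressed as a percentage, so it can also be negative.

```python
from statistics import mean, pvariance

def percent_var_explained(obs, oob_pred):
    """Hypothetical helper mimicking a '% Var explained' figure.

    Assumes the definition 100 * (1 - MSE / Var(y)) with the variance
    computed using denominator n, which makes it identical to
    100 * (1 - SSE / SST), i.e. the traditional R^2 in percent.
    """
    mse = mean((o - p) ** 2 for o, p in zip(obs, oob_pred))
    return 100.0 * (1.0 - mse / pvariance(obs))

obs = [1.0, 2.0, 3.0, 4.0, 5.0]
print(percent_var_explained(obs, obs))                        # 100.0: perfect predictions
print(percent_var_explained(obs, [3.0] * 5))                  # 0.0: always predicting the mean
print(percent_var_explained(obs, [5.0, 4.0, 3.0, 2.0, 1.0]))  # -300.0: worse than the mean
```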
On Thu, May 17, 2012 at 1:35 PM, Dominik Bruhn <dominik at dbruhn.de> wrote: