r2 for lm() with zero intercept
4 messages · Glenn.Newnham at csiro.au, Berwin A Turlach
G'day Glenn, On Tue, 2 Dec 2008 12:53:44 +1100
<Glenn.Newnham at csiro.au> wrote:
I'm a little confused about the R2 and adjusted R2 values reported by lm() when I try to fix an intercept. When using +0 or -1 in the formula I have found that the standard error generally increases (as I would expect) but the R2 also increases (which seems counter intuitive).
?summary.lm
In particular the part:
r.squared: R^2, the 'fraction of variance explained by the model',
R^2 = 1 - Sum(R[i]^2) / Sum((y[i]- y*)^2),
where y* is the mean of y[i] if there is an intercept and
zero otherwise.
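A minimal sketch of the documented formula, to make the two cases concrete. The simulated data and the helper name `r2_doc` are assumptions made for illustration; they are not part of the original question. The sketch recomputes R^2 by hand with y* equal to the mean of y when there is an intercept and zero otherwise, then compares the result with what summary.lm reports.

```r
# Simulated data (illustrative only): a clearly non-zero intercept.
set.seed(1)
x <- runif(50, 1, 10)
y <- 5 + 2 * x + rnorm(50)

fit_int  <- lm(y ~ x)      # model with an intercept
fit_zero <- lm(y ~ x - 1)  # model forced through the origin

# R^2 as described in ?summary.lm:
# y* is mean(y) if the model has an intercept, and 0 otherwise.
# (r2_doc is a hypothetical helper, not an R function.)
r2_doc <- function(fit, y) {
  has_int <- attr(terms(fit), "intercept") == 1
  ystar <- if (has_int) mean(y) else 0
  1 - sum(residuals(fit)^2) / sum((y - ystar)^2)
}

r2_int  <- r2_doc(fit_int, y)
r2_zero <- r2_doc(fit_zero, y)

# Both hand-computed values should agree with summary.lm.
all.equal(r2_int,  summary(fit_int)$r.squared)
all.equal(r2_zero, summary(fit_zero)$r.squared)

# The two baselines differ: the sum of squares about zero is never
# smaller than the sum of squares about the mean.
c(about_mean = sum((y - mean(y))^2), about_zero = sum(y^2))
```

Because the denominator for the no-intercept model is the sum of squares about zero, it is usually much larger than the sum of squares about the mean, which is why R^2 can go up even though the residual standard error also goes up.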
I do realise that many will say I shouldn't be fixing the intercept anyway
Quite true; except if there are very good reasons. I have seen regression through the origin being misused to obtain a large R^2 and a significant coefficient when there were none.

Cheers,

        Berwin

=========================== Full address =============================
Berwin A Turlach                            Tel.: +65 6516 4416 (secr)
Dept of Statistics and Applied Probability        +65 6516 6650 (self)
Faculty of Science                          FAX : +65 6872 3919
National University of Singapore
6 Science Drive 2, Blk S16, Level 7          e-mail: statba at nus.edu.sg
Singapore 117546                    http://www.stat.nus.edu.sg/~statba
Thanks Berwin

Obviously the code is functioning properly then, but do you consider this the best way of computing R^2 for a zero intercept? I just checked what excel and genstat do in this situation and the R^2 they come up with reduces for a zero intercept rather than increases. This seems more logical to me since fixing the intercept leads to a model that, at least in appearance, explains less of the variance in the data.

Cheers,
Glenn

-----Original Message-----
From: Berwin A Turlach [mailto:berwin at maths.uwa.edu.au]
Sent: Tuesday, 2 December 2008 4:17 PM
To: Newnham, Glenn (CSE, Clayton)
Cc: r-help at r-project.org
Subject: Re: [R] r2 for lm() with zero intercept
G'day Glenn, On Tue, 2 Dec 2008 17:02:26 +1100
<Glenn.Newnham at csiro.au> wrote:
Obviously the code is functioning properly then, but do you consider this the best way of computing R^2 for a zero intercept?
The way R does it. What else would I say? ;-) That formula compares the fitted model to the null model in which all covariates are removed: with an intercept the null model is the mean of y, and without one it is zero.
I just checked what excel and genstat do in this situation and the R^2 they come up with reduces for a zero intercept rather than increases.
I am not surprised that excel does the wrong thing; just one more example where it misbehaves. But I am surprised to hear that genstat does the same, I thought that genstat was developed by statisticians....
This seems more logical to me since fixing the intercept leads to a model that, at least in appearance, explains less of the variance in the data.
If you use the formula for models with an intercept on models without an intercept, then you may end up with a negative R^2. Try to explain to a user that a quantity called R-squared is negative.

My only gripe is that if one does something like:

        fm <- lm(Y ~ A/x - 1, data)

where A is a factor, R will treat this as a model without an intercept term, although it implicitly has an intercept and I just chose this formula because I was interested in the parameterisation it implies. But you can always be subversive and say something like

        attributes(fm$terms)$intercept <- 1

before calling summary(fm) if you want to use the formula for models with an intercept. Of course, in future versions (and past ones, if you use an older version), it would be advisable to study summary.lm first to see whether this trick will work; in R 2.8.0 it does.

Cheers,

        Berwin
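Both points in this message can be sketched on simulated data. The data frame `dat`, its columns and all object names below are assumptions made for illustration. The first part applies the with-intercept formula to a through-the-origin fit and gets a negative value. The second part applies the terms trick to a copy of the fit, using `attr()`, which sets the same attribute as the `attributes()` line in the message. Whether the trick works depends on how summary.lm is written in your version of R, so check it before relying on the result.

```r
# Simulated data (illustrative only): three groups, large means, weak slope.
set.seed(2)
n <- 60
dat <- data.frame(
  A = factor(rep(c("a", "b", "c"), each = n / 3)),
  x = runif(n, 1, 10)
)
group_mean <- c(a = 10, b = 12, c = 14)
dat$Y <- unname(group_mean[as.character(dat$A)]) + 0.1 * dat$x + rnorm(n)

# Part 1: the with-intercept formula applied to a no-intercept fit.
fit0 <- lm(Y ~ x - 1, data = dat)
r2_naive <- 1 - sum(residuals(fit0)^2) / sum((dat$Y - mean(dat$Y))^2)
r2_naive  # negative here: the line through the origin fits worse than mean(Y)

# Part 2: a model that has no explicit intercept term but contains one
# implicitly, because the columns for the levels of A sum to a constant.
fm <- lm(Y ~ A/x - 1, data = dat)
r2_default <- summary(fm)$r.squared  # baseline: sum of squares about zero

# The trick, applied to a copy so that fm itself stays untouched.
fm_int <- fm
attr(fm_int$terms, "intercept") <- 1
r2_trick <- summary(fm_int)$r.squared  # baseline: sum of squares about mean(Y)

# Reference: the same fit, parameterised with an explicit intercept.
r2_ref <- summary(lm(Y ~ A * x, data = dat))$r.squared

c(default = r2_default, trick = r2_trick, reference = r2_ref)
```

The reference model `Y ~ A * x` spans the same column space as `Y ~ A/x - 1`, so if the trick works, `r2_trick` should agree with `r2_ref`.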