r2 for lm() with zero intercept - R-help

Mon, Dec 1, 2008 5:53 PM #

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20081202/e9119e8e/attachment.pl>

Berwin A Turlach

Mon, Dec 1, 2008 9:17 PM #

G'day Glenn,

On Tue, 2 Dec 2008 12:53:44 +1100

<Glenn.Newnham at csiro.au> wrote:

?summary.lm

In particular the part:

r.squared: R^2, the 'fraction of variance explained by the model',

              R^2 = 1 - Sum(R[i]^2) / Sum((y[i]- y*)^2),

          where y* is the mean of y[i] if there is an intercept and
          zero otherwise.
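
A minimal sketch of that formula may make the difference concrete. The simulated data and the helper name r2_by_hand are illustrative assumptions, not part of summary.lm; the sketch only checks that the hand calculation matches what summary() reports with and without an intercept:

```r
# Simulated data; any numeric x and y would do (illustrative assumption).
set.seed(1)
x <- runif(50, 1, 10)
y <- 3 + 2 * x + rnorm(50)

# Hypothetical helper: R^2 = 1 - Sum(R[i]^2) / Sum((y[i] - y*)^2),
# with y* = mean(y) if the model has an intercept and 0 otherwise.
r2_by_hand <- function(fit, y) {
  has_int <- attr(terms(fit), "intercept") == 1
  ystar <- if (has_int) mean(y) else 0
  1 - sum(residuals(fit)^2) / sum((y - ystar)^2)
}

fit_int   <- lm(y ~ x)       # with intercept
fit_noint <- lm(y ~ x - 1)   # through the origin

# Each pair should agree up to floating-point error.
c(by_hand = r2_by_hand(fit_int, y),   summary = summary(fit_int)$r.squared)
c(by_hand = r2_by_hand(fit_noint, y), summary = summary(fit_noint)$r.squared)
```

Because the denominator for the no-intercept fit is the uncentred sum of squares of y, it is never smaller than the centred one, which is why R^2 can go up when the intercept is dropped.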

Quite true, except if there are very good reasons.  I have seen
regression through the origin being misused to obtain a large R^2 and
a significant coefficient when there were none.

Cheers,

	Berwin

=========================== Full address =============================
Berwin A Turlach                            Tel.: +65 6516 4416 (secr)
Dept of Statistics and Applied Probability        +65 6516 6650 (self)
Faculty of Science                          FAX : +65 6872 3919       
National University of Singapore     
6 Science Drive 2, Blk S16, Level 7          e-mail: statba at nus.edu.sg
Singapore 117546                    http://www.stat.nus.edu.sg/~statba

Glenn.Newnham at csiro.au

Mon, Dec 1, 2008 10:02 PM #

Thanks Berwin
Obviously the code is functioning properly then, but do you consider this the best way of computing R^2 for a zero intercept? I just checked what Excel and GenStat do in this situation, and the R^2 they report decreases for a zero intercept rather than increasing. This seems more logical to me, since fixing the intercept leads to a model that, at least in appearance, explains less of the variance in the data.

Cheers, Glenn


Berwin A Turlach

Mon, Dec 1, 2008 11:08 PM #

G'day Glenn,

On Tue, 2 Dec 2008 17:02:26 +1100

<Glenn.Newnham at csiro.au> wrote:

The way R does.  What else would I say. ;-)

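That formula compares the variation left unexplained by the model with
that of the null model in which all covariates are removed.  For a model
without an intercept, the null model predicts zero for every observation.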

I am not surprised that Excel does the wrong thing; just one more
example where it misbehaves.  But I am surprised to hear that GenStat
does the same; I thought that GenStat was developed by
statisticians...

If you use the formula for models with an intercept on models without
an intercept then it may happen that you end up with a negative R^2.
Try to explain that to a user; that a quantity that is called R-squared
is negative.
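
A small sketch of how that can happen, using made-up data (the vectors x and y below are purely illustrative assumptions) where a line through the origin fits worse than the plain mean of y:

```r
# Illustrative data (assumed): y decreases with x and is far from
# passing through the origin.
x <- 1:10
y <- 10 - x

fit0 <- lm(y ~ x - 1)            # regression through the origin
rss  <- sum(residuals(fit0)^2)

# Formula for models with an intercept, misapplied to the no-intercept fit.
r2_centred <- 1 - rss / sum((y - mean(y))^2)

# Formula with y* = 0, as described in ?summary.lm for this case.
r2_uncentred <- 1 - rss / sum(y^2)

r2_centred      # negative
r2_uncentred    # between 0 and 1
```

The centred version goes negative whenever the residual sum of squares of the no-intercept fit exceeds the sum of squares about the mean, that is, whenever the fit is worse than predicting mean(y) everywhere.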

My only gripe is that if one does something like:

	fm <- lm(Y ~ A/x - 1, data)

where A is a factor, R will treat this as a model without an intercept
term, although it implicitly has an intercept; I just chose this
formula because I was interested in the parameterisation it implies.

But, you can always be subversive and say something like

	attributes(fm$terms)$intercept <- 1

before calling summary(fm) if you want to use the formula for models
with an intercept.  Of course, in future versions (and in past ones, if
you use an older one), it would be advisable to study summary.lm first
to see if this trick will work; in R 2.8.0 it does.
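
A hedged sketch of the trick on simulated data follows. The data frame dat and its columns are assumptions made up for illustration, and the whole thing relies on summary.lm still reading the "intercept" attribute of the terms object, so check that for your version of R. The sketch sets the attribute with attr(), which is meant to have the same effect as the attributes() assignment above:

```r
# Simulated data with a two-level factor A (illustrative assumption).
set.seed(42)
dat <- data.frame(A = gl(2, 20, labels = c("a", "b")),
                  x = runif(40, 0, 5))
dat$Y <- ifelse(dat$A == "a", 1, 4) +
         ifelse(dat$A == "a", 2, 0.5) * dat$x +
         rnorm(40, sd = 0.5)

# Separate intercept and slope for each level of A, no global intercept.
fm <- lm(Y ~ A/x - 1, data = dat)
r2_default <- summary(fm)$r.squared      # treated as a no-intercept model

# Work on a copy so the original fit is left untouched.
fm_trick <- fm
attr(fm_trick$terms, "intercept") <- 1
r2_trick <- summary(fm_trick)$r.squared

# Same fitted values, written with an explicit intercept, for comparison.
r2_ref <- summary(lm(Y ~ A * x, data = dat))$r.squared

c(default = r2_default, trick = r2_trick, reference = r2_ref)
```

If the trick works in your version, the second and third values should agree, since Y ~ A/x - 1 and Y ~ A * x are two parameterisations of the same fit.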

Cheers,

	Berwin