Prev 131913 / 398506 Next

lm/model.matrix confusion (? bug)

Mark Difford

Wed, Dec 12, 2007 6:46 AM

Hi Gavin, Berwin,

Thanks for your detailed replies.  I'll make a general reply, if you don't
mind.

To reiterate, my main point is that if model.matrix() can be used in this
way, then lm() shouldn't add an intercept.

And the documentation for lm() indicates. albeit indirectly, that
model.matrix can be used in this way.  It calls for a formula, or something
that can be coerced to one.  And the following meets that criterion:
as.formula(Gas ~ model.matrix (~ Insul/Temp-1, data=whiteside)), and this
specifies no intercept.

On the question of why I want to mess about in such a labarynthine way. 
Well my email was largely expository.  With a straight call to lm(), I
wouldn't bother with model.matrix.

So, it really was about getting at lm coefficients inside a function (when
you have to get the terms &c. from somewhere else), and trying to understand
properly how things work, and why they work the way they do, and even if
they should work the way they do.

For instance:---

    if (ols) {
        obj <- x[[1]]
        mt <- terms(obj)
        mf <- model.frame(obj)
        y <- model.response(mf)
        X <- model.matrix(mt, mf, contrasts = obj$contrasts)
        if (attr(mt, "intercept") == 1)                    ## This is my
hack to overcome the double-intercept problem
            { olscf <- summary(lm(y ~ X))$coefficients }
        else {
            olscf <- summary(lm(y ~ X - 1))$coefficients
        }
        rownames(olscf) <- rownames(coef(obj))

Thanks again for your input.

Regards,
Mark.

Berwin A Turlach wrote:

G'day Mark,

On Wed, 12 Dec 2007 02:05:54 -0800 (PST)
Mark Difford <mark_difford at yahoo.co.uk> wrote:

In order to get the same coefficients as we get from the following

[...]

we need to do the following (if we use model.matrix to specify the
model)

By why would you want to do this?

##
summary ( lm(Gas ~ model.matrix(~ Insul/Temp - 1) - 1, data =
whiteside) )

That is, we need to take out "two intercepts."  Is this "correct"?

Yes.

Shouldn't lm check to see if an intercept has been requested as part
of the model formula?

No, it does not.  In the Details section of lm's help page you will
find the following:

     A formula has an implied intercept term.  To remove this use
     either 'y ~ x - 1' or 'y ~ 0 + x'.  See 'formula' for more details
     of allowed formulae.

Thus, except if you explicitly ask for a constant term not be included,
lm will add a constant term (a column of ones) additionally to what
ever you have specified on the right hand side of the formula.

If I do
##
summary(lm(as.formula(Gas ~ model.matrix (~ Insul/Temp-1,
data=whiteside)), data=whiteside))

we get a strange model.

Well, you get a model in which not all parameters are identifiable, and
a particular parameter that is not identifiable is estimated by NA.  I
am not sure what is strange about this.

But the formula part of this model qualifies
as a valid formula
##
as.formula(Gas ~ model.matrix (~ Insul/Temp-1, data=whiteside))

Debatable, the above command only shows that it can be coerced into a
valid formula. :)

just as if I were to write: lm(Gas ~ Insul/Temp - 1, data=whiteside)

But we know that the _correct_ formula is the following

##
as.formula(Gas ~ model.matrix (~ Insul/Temp-1, data=whiteside) -1)

Why is this formula any more correct than the other one?  Both specify
exactly the same model.  It is just that one does it in an
overparameterised way.

(Sorry, this is getting really long) --- So, my question/confusion
comes down to wanting to know why lm() doesn't check to see if an
intercept has been specified when the model has been specified using
model.matrix.

Because lm() is documented not to check this.  If you do not want to
have an intercept in the model you have to specifically ask it for.

Also, comparing the output of 
	summary( lm(Gas ~ Insul/Temp - 1, data = whiteside) )
and
	summary( lm(Gas ~ Insul/Temp, data = whiteside ) )

you can see that lm() does not check whether there is an implicit
intercept in the model.  Compare the (Adjusted) R-squared values
returned; one case is using the formula for models with no intercept
the other one the formula for models with intercept.  Similar story
with the reported F-statistics.  

Cheers,

	Berwin

=========================== Full address =============================
Berwin A Turlach                            Tel.: +65 6515 4416 (secr)
Dept of Statistics and Applied Probability        +65 6515 6650 (self)
Faculty of Science                          FAX : +65 6872 3919       
National University of Singapore     
6 Science Drive 2, Blk S16, Level 7          e-mail: statba at nus.edu.sg
Singapore 117546                    http://www.stat.nus.edu.sg/~statba

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

View this message in context: http://www.nabble.com/lm-model.matrix-confusion-%28--bug%29-tp14292188p14296513.html
Sent from the R help mailing list archive at Nabble.com.

Thread (7 messages)

Mark Difford lm/model.matrix confusion (? bug) Dec 12 Mark Difford lm/model.matrix confusion (? bug) Dec 12 Berwin A Turlach lm/model.matrix confusion (? bug) Dec 12 Gavin Simpson lm/model.matrix confusion (? bug) Dec 12 Mark Difford lm/model.matrix confusion (? bug) Dec 12 Gavin Simpson lm/model.matrix confusion (? bug) Dec 12 Mark Difford lm/model.matrix confusion (? bug) Dec 12