lm/model.matrix confusion (? bug)
On Wed, 2007-12-12 at 02:05 -0800, Mark Difford wrote:
Dear List-members, Hopefully someone will help through my confusion: In order to get the same coefficients as we get from the following ## require (MASS) summary ( lm(Gas ~ Insul/Temp - 1, data = whiteside) ) ...................... we need to do the following (if we use model.matrix to specify the model) ## summary ( lm(Gas ~ model.matrix(~ Insul/Temp - 1) - 1, data = whiteside) ) ...................... That is, we need to take out "two intercepts." Is this "correct"?
Yes - if you insist on doing things that way
Shouldn't lm check to see if an intercept has been requested as part of the model formula?
It does, (i.e. whether the user specified -1 in the formula argument), but you specified it inside model.matrix() so the formula parsing code never sees it. One wouldn't want lm to need to know about all functions that have/could ever be written in R, past or future, to know how to divine that here you wanted "-1" to mean "remove intercept please Mr. R Parser" and not "subtract 1 from this result please Mr. R Parser". You are really abusing the reason for having the formula interface. The whole point of it is to make it easy for user to specify a model, from which R generates the model matrix for you. Why use the formula at all if you have already produced your model matrix? See ?lm.fit for a fast way of fitting linear models (used within lm() ) if you have all the bits in place (i.e. you already have a model matrix) and are happy to take care of other details yourself. <snip />
(Sorry, this is getting really long) --- So, my question/confusion comes down to wanting to know why lm() doesn't check to see if an intercept has been specified when the model has been specified using model.matrix.
I guess because you can't programme around every possible usage that a user might dream up to try. If it did, lm would be an absolute pig and run like one. It is slow enough as it is with all the user-friendly stuff in there to help. (By slow enough, I mean it is perfectly speedy in routine use, but you wouldn't want to use it in a permutation test-like environment if you were after speed - you'd be better off working with lm.fit directly) The model.matrix bit is returning a matrix (without an entry for an intercept), which is fine on the rhs of a formula. As all you've done here is (effectively) create the following model:
form <- formula(paste("Gas ~",
+ paste(colnames(model.matrix (~ Insul/Temp-1, + data=whiteside)), collapse = " + ")))
form
Gas ~ InsulBefore + InsulAfter + InsulBefore:Temp + InsulAfter:Temp I.e., that is what your convoluted formula is being interpreted as, you then still need to remove the intercept that R will automagically add to the model when it subsequently creates the model matrix. HTH G
Regards, Mark.
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% Dr. Gavin Simpson [t] +44 (0)20 7679 0522 ECRC, UCL Geography, [f] +44 (0)20 7679 0565 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/ UK. WC1E 6BT. [w] http://www.freshwaters.org.uk %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%