
suggested addition to model.matrix

3 messages · Spencer Graves, John Fox, William Dunlap

#
Hello, All:


       What's the simplest way to convert a data.frame into a model.matrix?


       One way is given by the following example, modified from the 
examples in help(model.matrix):


dd <- data.frame(a = gl(3,4), b = gl(4,1,12))
ab <- model.matrix(~ a + b, dd)   # explicit formula
ab0 <- model.matrix(~., dd)       # "~." uses every column of dd
all.equal(ab, ab0)                # TRUE


       What do you think about replacing "model.matrix(~ a + b, dd)" in 
the current help(model.matrix) with this 3-line expansion?


       I suggest this because I spent a few hours today trying to 
convert a data.frame into a model.matrix before finding this.


       Also, what do you think about adding something like the following 
to the stats package:


model.matrix.data.frame <- function(object, ...){
     # use every column of the data frame as a predictor (formula ~.)
     model.matrix(~., object, ...)
}


       And then extend the above example as follows:

ab. <- model.matrix(dd)
all.equal(ab, ab.)
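For anyone who wants to try the proposal, here is a self-contained sketch. `model.matrix.data.frame` is only the method suggested above (it is not part of the stats package); it is defined in the global environment so that S3 dispatch finds it, and its `...` pass arguments such as `contrasts.arg` through to `model.matrix.default`.

```r
library(stats)

# Hypothetical S3 method proposed above; NOT part of the stats package.
# Once defined, model.matrix(<data.frame>) dispatches here and uses
# the formula ~ . (every column as a predictor, no response).
model.matrix.data.frame <- function(object, ...) {
  model.matrix(~ ., object, ...)
}

dd <- data.frame(a = gl(3, 4), b = gl(4, 1, 12))

ab  <- model.matrix(~ a + b, dd)  # explicit formula
ab. <- model.matrix(dd)           # dispatches to the method above
print(all.equal(ab, ab.))

# Extra arguments are forwarded to model.matrix.default,
# e.g. sum-to-zero contrasts for the factor a.
ab.sum <- model.matrix(dd, contrasts.arg = list(a = "contr.sum"))
print(colnames(ab.sum))
```

Because the method only supplies a default formula, every other argument of `model.matrix.default` remains available to the caller.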


       Thanks,
       Spencer Graves
#
Dear Spencer,

I don't think that the problem of "converting a data frame into a model matrix" is well-defined, because there isn't a unique mapping from one to the other. 

In your example, you build the model matrix for the additive formula ~ a + b from the data frame dd containing a and b, using "treatment" contrasts, but there are other possible formulas (e.g., ~ a*b) and contrasts [e.g., model.matrix(~ a + b, dd, contrasts=list(a=contr.sum, b=contr.helmert))].

So I think that the current approach is sensible -- to require both a data frame and a formula.
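A small sketch of the point: the same data frame dd yields model matrices of different widths and codings depending on the formula and contrasts chosen. The argument's full name is `contrasts.arg`; `contrasts=` in the example above reaches it through R's partial argument matching. The object names below are illustrative only.

```r
dd <- data.frame(a = gl(3, 4), b = gl(4, 1, 12))

# Same data frame, three different model matrices.
m.add   <- model.matrix(~ a + b, dd)  # additive, default treatment contrasts
m.int   <- model.matrix(~ a * b, dd)  # adds the a:b interaction columns
m.other <- model.matrix(~ a + b, dd,
                        contrasts.arg = list(a = contr.sum, b = contr.helmert))

# Number of columns of each matrix.
print(sapply(list(additive = m.add, interaction = m.int, other = m.other), ncol))

# Same dimensions as m.add, but a different coding of the factor columns.
print(isTRUE(all.equal(m.add, m.other, check.attributes = FALSE)))
```

With 3 levels of a and 4 of b, the additive formula gives 1 + 2 + 3 = 6 columns and the interaction formula adds 2 * 3 more, for 12.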

Best,
 John
#
In addition, there is a formula method for data.frame that
assumes the first column is the dependent variable.
 > z <- data.frame(X1=1:6,X2=letters[1:3],Y=log(1:6))
 > formula(z)
 X1 ~ X2 + Y
 > colnames(model.matrix(formula(z), z))
 [1] "(Intercept)" "X2b"         "X2c"         "Y"

Spencer's request is that the default formula given to model.matrix have
no dependent variable.
 > colnames(model.matrix(~., z))
 [1] "(Intercept)" "X1"          "X2b"         "X2c"         "Y"

In my opinion, formula.data.frame is a mistake, but we don't need two
incompatible mistakes.
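The incompatibility is easy to check with the data frame z from above: formula(z) makes X1 the response, so it drops out of the model matrix, whereas ~ . keeps it as a predictor. A minimal sketch (the names `f`, `cols.formula` and `cols.dot` are just for illustration):

```r
z <- data.frame(X1 = 1:6, X2 = letters[1:3], Y = log(1:6))

f <- formula(z)  # formula.data.frame: the first column becomes the response

cols.formula <- colnames(model.matrix(f, z))
cols.dot     <- colnames(model.matrix(~ ., z))

# Columns present only under ~ . (the would-be response):
print(setdiff(cols.dot, cols.formula))
```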


Bill Dunlap
TIBCO Software
wdunlap tibco.com
On Mon, Oct 3, 2016 at 9:46 PM, Fox, John <jfox at mcmaster.ca> wrote: