GEE and AIC
Le 30.05.2008 02:18, tgarland at ucr.edu a ?crit :
This is not a problem for contrasts of GLS-type methods. It can be a pain in the a** to code a whole bunch of categorical variables as dummy variables and then compute contrasts (depending on your software), but it is not a problem from the perspective of the math/stats.
You're right to add "depending on your software" because it's very easy
with R (saving the user's a**), but few people seem to know it. The
function model.matrix is the tool here. Suppose you have a factor with
four levels and two repetitions for each level:
R> (x <- gl(4, 2))
[1] 1 1 2 2 3 3 4 4
Levels: 1 2 3 4
Then you just specify to model.matrix that x is a predictor in the usual
R formula notation:
R> model.matrix(~ x)
(Intercept) x2 x3 x4
1 1 0 0 0
2 1 0 0 0
3 1 1 0 0
4 1 1 0 0
5 1 0 1 0
6 1 0 1 0
7 1 0 0 1
8 1 0 0 1
attr(,"assign")
[1] 0 1 1 1
attr(,"contrasts")
attr(,"contrasts")$x
[1] "contr.treatment"
This is actually used to prepare the "model matrix" in a regression,
hence the column of 1's added for the intercept. If you just want the
dummy variables:
R> model.matrix(~ x)[, -1]
x2 x3 x4
1 0 0 0
2 0 0 0
3 1 0 0
4 1 0 0
5 0 1 0
6 0 1 0
7 0 0 1
8 0 0 1
If you've got several such variables you can do model.matrix( ~ x+y+z),
possibly adding interactions too. If they are in a data.frame, then use
the option 'data'.
If you don't like the current coding, you can change it in the options
(see ?C):
R> options(contrasts = c("contr.sum", "contr.poly"))
R> model.matrix(~ x)[, -1]
x1 x2 x3
1 1 0 0
2 1 0 0
3 0 1 0
4 0 1 0
5 0 0 1
6 0 0 1
7 -1 -1 -1
8 -1 -1 -1
model.matrix() is called internally by most regression functions (lm,
glm, gee, gls, lme, ...) so the user doesn't have to bother about it,
but it's nice to see how it works.
Emmanuel Paradis
IRD, Jakarta, Indonesia
http://ape.mpl.ird.fr/