
lme4 and Variable level detection

1 message · Douglas Bates

On Sat, Feb 28, 2009 at 9:00 AM, Jeroen Ooms <j.c.l.ooms at uu.nl> wrote:

Questions such as this may be answered more quickly if you send them
to the R-SIG-Mixed-Models mailing list, which I am cc:ing on this
reply.

In some ways, exposure to software like HLM or MLWin can be more of a
hindrance than a help when learning about mixed models.  In the
presentation of the model and in the software itself these packages
emphasize "levels" of random effects, leading to the impression that we
can only associate random effects with factors that are nested.  This
is a misconception.  There are many cases where it is eminently
sensible to associate random effects with factors that are completely
crossed ('subject' and 'item' are a prime example) or partially
crossed.  The archetypal example used in multilevel modeling,
achievement scores on students nested in classes nested in schools
nested in ..., becomes partially crossed when we track students over
time and they move from class to class or school to school.
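
The distinction between nested, completely crossed and partially
crossed factors can be checked directly from the data.  The following
is a minimal sketch in plain Python, not anything taken from lme4; the
function name classify_factors and the toy data are hypothetical and
chosen only for illustration.

```python
from collections import defaultdict


def classify_factors(a, b):
    """Classify how grouping factor `a` relates to grouping factor `b`.

    `a` and `b` are equal-length sequences of labels, one per observation.
    Returns "nested", "completely crossed" or "partially crossed".
    """
    if len(a) != len(b):
        raise ValueError("factors must have the same length")

    # For each level of a, collect the levels of b observed with it.
    b_seen_with = defaultdict(set)
    for x, y in zip(a, b):
        b_seen_with[x].add(y)

    # Nested: every level of a occurs within exactly one level of b.
    if all(len(s) == 1 for s in b_seen_with.values()):
        return "nested"

    # Completely crossed: every (a, b) combination is observed.
    observed_pairs = sum(len(s) for s in b_seen_with.values())
    if observed_pairs == len(b_seen_with) * len(set(b)):
        return "completely crossed"

    return "partially crossed"


# Students who never change class: student is nested in class.
print(classify_factors(["a", "b", "c", "d"], ["c1", "c1", "c2", "c2"]))

# Every subject responds to every item: completely crossed.
print(classify_factors(["s1", "s1", "s2", "s2"], ["i1", "i2", "i1", "i2"]))

# Students tracked over time, one of whom changes school: partially crossed.
print(classify_factors(["a", "a", "b", "b", "c", "c"],
                       ["x", "y", "x", "x", "y", "y"]))
```

Note that the third data set differs from a nested one only in that
student "a" appears in two schools, which is exactly what happens
when students are followed over time.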

I imagine that the reason for defining the model in terms of nested
factors for random effects is computational.  If you insist that the
random effects must always be defined with respect to nested factors
then you can employ methods that take advantage of this, with
considerable simplification in the storage and computational burden.
The lme4 package adopts a different approach based on sparse matrix
storage and decomposition methods.  It turns out that these methods
are competitive with the best methods for models based on nested
factors, in the cases to which they apply, and these methods allow for
fitting much more general models.
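
To give a rough idea of why sparse storage is attractive here, the
sketch below builds the indicator columns for several grouping
factors and records only the positions of the nonzero entries.  It
illustrates the storage idea only and says nothing about how lme4
itself stores or decomposes its matrices; the function name
indicator_entries and the toy design are hypothetical.

```python
def indicator_entries(*factors):
    """Sparse description of the indicator columns of several grouping factors.

    Each factor is a sequence of labels, one per observation.  Returns
    (n_rows, n_cols, entries), where entries lists the (row, col)
    positions that hold a 1; every other cell is 0.
    """
    n_rows = len(factors[0])
    entries = []
    offset = 0  # first column belonging to the current factor
    for f in factors:
        if len(f) != n_rows:
            raise ValueError("factors must have the same length")
        levels = sorted(set(f))
        col_of = {lev: offset + j for j, lev in enumerate(levels)}
        entries.extend((i, col_of[lev]) for i, lev in enumerate(f))
        offset += len(levels)
    return n_rows, offset, entries


# Toy design: 3 subjects completely crossed with 4 items.
subject = [s for s in range(3) for _ in range(4)]
item = [i for _ in range(3) for i in range(4)]

n_rows, n_cols, entries = indicator_entries(subject, item)
print("dense cells:", n_rows * n_cols)
print("nonzero cells:", len(entries))
```

Each observation contributes exactly one nonzero per grouping
factor, whether the factors are nested or crossed, so the number of
stored entries grows with the number of observations rather than
with the full size of the matrix.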

An unfortunate side-effect of the emphasis on levels in MLWin and HLM
is the perception that other covariates must be characterized by the
level at which they vary, even if these covariates only determine
fixed-effects parameters.  This is quite untrue and misleading.  The
only constraint on the covariates and the model matrix for the
fixed-effects parameters is that the model matrix must be of full
column rank.  In models that define random effects for slopes, or in
general for the coefficients associated with a covariate, the
constraint is that the covariate cannot be constant within each level
of the grouping factor of the random effect.  For example, we cannot
estimate a random effect for the coefficients for sex (M/F) within
subject (assuming we do not have transgender people in the study).
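
Both constraints can be checked mechanically.  The following is a
minimal sketch in plain Python under the assumption that the model
matrix is available as a list of rows; the function names
(column_rank, has_full_column_rank, constant_within_every_group) and
the toy data are hypothetical and are not part of lme4.

```python
from collections import defaultdict
from fractions import Fraction


def column_rank(rows):
    """Rank of a matrix given as a list of rows, by exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for c in range(n_cols):
        # Find a row at or below position `rank` with a nonzero entry in column c.
        pivot = next((r for r in range(rank, len(m)) if m[r][c] != 0), None)
        if pivot is None:
            continue  # column c depends on the columns already processed
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][c] / m[rank][c]
            m[r] = [x - factor * y for x, y in zip(m[r], m[rank])]
        rank += 1
    return rank


def has_full_column_rank(rows):
    """True if the columns of the model matrix are linearly independent."""
    return bool(rows) and column_rank(rows) == len(rows[0])


def constant_within_every_group(covariate, group):
    """True if `covariate` takes a single value within each level of `group`."""
    values = defaultdict(set)
    for x, g in zip(covariate, group):
        values[g].add(x)
    return all(len(v) == 1 for v in values.values())


# Intercept, a school indicator and a student-level score: full column rank,
# regardless of the "level" at which each covariate varies.
X_ok = [[1, 0, 2], [1, 1, 3], [1, 0, 5], [1, 1, 1]]
print(has_full_column_rank(X_ok))

# Intercept plus indicators for both of two groups: the last two columns
# sum to the first, so the matrix is rank deficient.
X_bad = [[1, 1, 0], [1, 0, 1], [1, 1, 0], [1, 0, 1]]
print(has_full_column_rank(X_bad))

subject = ["s1", "s1", "s2", "s2"]
sex = ["M", "M", "F", "F"]   # constant within subject: no random effect for sex
time = [0, 1, 0, 1]          # varies within subject: a random slope is possible
print(constant_within_every_group(sex, subject))
print(constant_within_every_group(time, subject))
```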

My advice would be to avoid phrasing the model in terms of levels of
random effects.  Although I realize that those with a background of
using MLWin or HLM may find this more comfortable, I think it would be
propagating bad practices and misconceptions.