Meaning of /, :, and %in% in lmer

Douglas Bates · 2008-04-19T18:58:21Z

On 4/18/08, Claus Wilke wrote:
> > The short answer is that (1|A/B) is expanded to (1|A) + (1|A:B) so you
> > can choose whatever form makes sense to you.
> Thanks, that was what I needed to hear.
>
> > There are different circumstances where a notation like (1|A/B) would
> > be used. Some are reasonable choices and some are artifacts of
> > artificial ways of assigning labels to factor levels. Rather than my
> > trying to guess what kind of application you hav

On 4/18/08, Claus Wilke <cwilke at mail.utexas.edu> wrote:

The labeling question is related to the levels of the strain factor.
To me the sensible way to label strains is to give each unique strain
a unique label.  In fact, I would go so far as to say that is the only
sensible way.  So suppose the ancestral strains are called "A" and "B"
and there were 8 strains derived from "A" and 12 strains derived from
"B".  Then I would give them labels like "A01" up to "A08" and "B01" up
to "B12".  Many people feel the strains from ancestor A should be
labeled 1 up to 8 and those from ancestor B labeled 1 up to 12 and
then incorporate the information that strain is nested within ancestor
somewhere in the model description.  To me this makes no sense.  If
strain 1 from ancestor A is not related in any way to strain 1 from
ancestor B, why call them both "1"?
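
A minimal sketch in base R of the two labeling schemes described above,
assuming the 8 + 12 design from the example.  The variable names
(ancestor, strain_num, strain) are made up for illustration and do not
come from any real data set.

```r
# Hypothetical design: 8 strains derived from ancestor "A" and
# 12 strains derived from ancestor "B", one row per strain.
ancestor <- rep(c("A", "B"), times = c(8, 12))

# Ambiguous labeling: strain numbers restart at 1 within each ancestor.
strain_num <- c(1:8, 1:12)

# Unique labeling: prefix the zero-padded number with the ancestor.
strain <- factor(sprintf("%s%02d", ancestor, strain_num))

nlevels(factor(strain_num))      # 12 labels shared among 20 strains
nlevels(strain)                  # 20, one label per strain
levels(strain)[c(1, 8, 9, 20)]   # "A01" "A08" "B01" "B12"
```

With the unique labels the nesting within ancestor can still be read off
the label, but nothing else in the model has to carry that information.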

If the strains are labeled so that each unique strain has a unique
label then the model can be written as
  fitness ~ ancestor + (1|strain)
or as
  fitness ~ ancestor + (1|ancestor:strain)
whichever one makes sense to you.  If the levels of strain reflect an
implicit nesting (that is, you need to know that strain 1 from
ancestor A is not the same as strain 1 from ancestor B, even though
they are given the same level of strain) then you must write the model
in the second form but only because the labels of strain are ambiguous
and the expression ancestor:strain is required to disambiguate the
levels.
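
The point about the two forms can be checked without fitting anything.
The sketch below uses base R's interaction() as a stand-in for the
grouping factor that ancestor:strain defines.  The toy layout (2
ancestors, 3 strains each, 2 replicates per strain) and all column names
are assumptions made for illustration.

```r
# Hypothetical layout: 2 ancestors, 3 strains each, 2 replicates per strain.
d <- data.frame(
  ancestor   = factor(rep(c("A", "B"), each = 6)),
  strain_num = factor(rep(rep(1:3, each = 2), times = 2))  # ambiguous labels
)
d$strain <- factor(paste0(d$ancestor, d$strain_num))       # unique labels

# interaction() builds one level per ancestor-by-strain combination;
# drop = TRUE discards combinations that never occur in the data.
g_unique    <- interaction(d$ancestor, d$strain, drop = TRUE)
g_ambiguous <- interaction(d$ancestor, d$strain_num, drop = TRUE)

nlevels(d$strain)       # 6
nlevels(g_unique)       # 6: the same groups as strain alone
nlevels(d$strain_num)   # 3: strain 1 of A and strain 1 of B are merged
nlevels(g_ambiguous)    # 6: the interaction separates them again

# With unique labels, each strain corresponds to exactly one
# ancestor:strain level and vice versa, so the groupings coincide.
one_to_one <- all(rowSums(table(d$strain, g_unique) > 0) == 1) &&
  all(colSums(table(d$strain, g_unique) > 0) == 1)
one_to_one              # TRUE
```

So with unique labels (1|strain) and (1|ancestor:strain) describe the
same grouping of the observations, while with the ambiguous labels only
the interaction recovers the six distinct strains.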

Harald Baayen's recent book on "Analyzing Linguistic Data" has a good
discussion of some of the issues in determining significance of
fixed-effects terms in a mixed-effects model.  I like some of the
explanations in his chapter 7.

To tell the truth I expect that the standard approach is reasonably
accurate for cases where the only random effects term in the model is
of the form  (1|strain); it's in the more complex models that the
simple approximations get off track.  The sort of data that Harald and
many others in psychometric areas consider is cross-classified
according to subject and item and the standard approaches get bogged
down there.
