Meaning of /, :, and %in% in lmer
On 4/18/08, Claus Wilke <cwilke at mail.utexas.edu> wrote:
> The short answer is that (1|A/B) is expanded to (1|A) + (1|A:B) so you can choose whatever form makes sense to you.
Thanks, that was what I needed to hear.
> There are different circumstances where a notation like (1|A/B) would be used. Some are reasonable choices and some are artifacts of artificial ways of assigning labels to factor levels. Rather than my trying to guess what kind of application you have in mind, could you describe a situation where you would want to fit an lmer model with terms like that?
It's a virology experiment. We have two ancestral strains. From each of those
we have derived several new strains, and then have made multiple fitness
measurements on the new strains. We want to know whether the ancestral strain
has an effect on the fitness of the derived strains. The model I'm using for
that is
fitness ~ ancestor + (1|ancestor:strain),
because strains are nested within ancestors. If I were using
fitness ~ ancestor + (1|ancestor/strain),
then ancestor would get both a fixed and a random effect, which doesn't make
sense.
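As a rough illustration of the expansion discussed above, base R's terms() can show what the / operator does in an ordinary formula. This is only a sketch: it uses a plain formula rather than a (1|...) term, on the assumption that the nesting operator expands the same way in both places, as the short answer quoted above says. The names fitness, ancestor and strain are just symbols here; no data is needed.

```r
# Expand the nesting operator in a plain formula (base R only).
nested <- attr(terms(fitness ~ ancestor/strain), "term.labels")
print(nested)

# The explicit form written out by hand.
explicit <- attr(terms(fitness ~ ancestor + ancestor:strain), "term.labels")

# TRUE if the two forms yield the same set of terms.
print(identical(nested, explicit))
```

Read this way, (1|ancestor/strain) adds a (1|ancestor) term next to the fixed effect of ancestor, which is the duplication described above.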
The labeling question is related to the levels of the strain factor. To me the sensible way to label strains is to give each unique strain a unique label. In fact, I would go so far as to say that is the only sensible way. So suppose the ancestral strains are called "A" and "B" and there were 8 strains derived from "A" and 12 strains derived from "B". Then I would give them labels like "A01" up to "A08" and "B01" up to "B12".

Many people feel the strains from ancestor A should be labeled 1 up to 8 and those from ancestor B labeled 1 up to 12, and then incorporate the information that strain is nested within ancestor somewhere in the model description. To me this makes no sense. If strain 1 from ancestor A is not related in any way to strain 1 from ancestor B, why call them both "1"?

If the strains are labeled so that each unique strain has a unique label then the model can be written as fitness ~ ancestor + (1|strain) or as fitness ~ ancestor + (1|ancestor:strain), whichever one makes sense to you. If the levels of strain reflect an implicit nesting (that is, you need to know that strain 1 from ancestor A is not the same as strain 1 from ancestor B, even though they are given the same level of strain) then you must write the model in the second form, but only because the labels of strain are ambiguous and the expression ancestor:strain is required to disambiguate the levels.
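The labeling point can be checked with a small base R sketch. The design below follows the example (8 strains from "A", 12 from "B"); the three measurements per strain and all variable names are assumptions made up for the illustration, and interaction() stands in for what ancestor:strain does to the grouping factor.

```r
# Hypothetical layout: 8 strains from A, 12 from B, 3 measurements each
# (the replicate count is an assumption, not from the experiment).
ancestor <- factor(rep(c("A", "B"), times = c(8 * 3, 12 * 3)))
idx <- c(rep(1:8, each = 3), rep(1:12, each = 3))

# Ambiguous labels: "1" is used under both ancestors.
strain_amb <- factor(idx)
# Unique labels: "A01".."A08" and "B01".."B12".
strain_unq <- factor(sprintf("%s%02d", as.character(ancestor), idx))

n_amb <- nlevels(strain_amb)   # groups seen by (1|strain), ambiguous labels
n_unq <- nlevels(strain_unq)   # groups seen by (1|strain), unique labels

# Crossing with ancestor, keeping only combinations that occur.
g_amb <- interaction(ancestor, strain_amb, drop = TRUE)
g_unq <- interaction(ancestor, strain_unq, drop = TRUE)

# Each crossed group should map to exactly one uniquely labeled strain.
one_to_one <- all(rowSums(table(g_amb, strain_unq) > 0) == 1)

print(c(n_amb = n_amb, n_unq = n_unq,
        crossed_amb = nlevels(g_amb), crossed_unq = nlevels(g_unq)))
print(one_to_one)
```

With unique labels the crossing adds nothing, so (1|strain) and (1|ancestor:strain) define the same groups. With the ambiguous labels, strain alone merges unrelated strains and only the crossed factor recovers the 20 distinct ones.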
I have a second question, related to the hypothesis testing of whether the fixed ancestor effect is significant. I've read all the threads about why it is problematic to do an F test to calculate a p value, and that it is better to use Markov chain Monte Carlo. My question is: Is there a proper reference I can cite to substantiate the claim that the standard (i.e., SAS) way of calculating significance in this case is problematic, or do I have to refer to the mailing list archive?
Harald Baayen's recent book on "Analyzing Linguistic Data" has a good discussion of some of the issues in determining significance of fixed-effects terms in a mixed-effects model. I like some of the explanations in his chapter 7. To tell the truth I expect that the standard approach is reasonably accurate for cases where the only random effects term in the model is of the form (1|strain); it's in the more complex models that the simple approximations get off track. The sort of data that Harald and many others in psychometric areas consider is cross-classified according to subject and item and the standard approaches get bogged down there.
Thanks a lot,
Claus

--
Claus Wilke
Section of Integrative Biology
and Center for Computational Biology and Bioinformatics
University of Texas at Austin
1 University Station C0930
Austin, TX 78712
cwilke at mail.utexas.edu
512 471 6028