How to use mixed-effects models on multinomial data
On Thu, May 28, 2009 at 9:24 AM, Jonathan Baron <baron at psych.upenn.edu> wrote:
I had already replied to Linda Mortensen, but Emmanuel Charpentier's reply gives me the courage to say to the whole list roughly what I said before, plus a little more.
The assumption that 0-1, 1-2, ... 4-5 are equally spaced measures of the underlying variable of interest may indeed be incorrect, but so may the assumption that the difference between 200-300 msec reaction time is equivalent to the difference between 300-400 msec (etc.). Failure of the assumptions will lead to some additional error, but, as argued by Dawes and Corrigan (Psych. Bull., 1974), not much. (And you can look at the residuals as a function of the predictions to see how bad the situation is.) In general, in my experience (for what that is worth), you lose far less power by assuming equal spacing than you lose by using a more "conservative" model that treats the dependent measure as ordinal only.
I'm glad to see you write that, Jonathan. I don't have a lot of experience modeling ordinal response data but my impression is that there is more to lose by resorting to comparatively exotic models for an ordinal response than by modeling it with a Gaussian "noise" term. In cases like this where there are six levels, 0 to 5, I think your suggestion of beginning with a linear mixed-effects model and checking the residuals for undesirable behavior is a good start.
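As a rough sketch of that starting point (not anyone's actual analysis): the data below are simulated, and the names score, predictor1, subject and item are made up for illustration. It assumes the lme4 package is installed, fits a linear mixed-effects model with crossed random intercepts to a 0-5 score, and plots residuals against fitted values.

```r
library(lme4)

set.seed(1)
n_subj <- 30
n_item <- 20

# Hypothetical fully crossed design: every subject sees every item.
d <- expand.grid(subject = factor(seq_len(n_subj)),
                 item    = factor(seq_len(n_item)))
d$predictor1 <- rnorm(nrow(d))

# Simulated subject and item effects on a latent continuous scale.
subj_eff <- rnorm(n_subj, sd = 0.5)
item_eff <- rnorm(n_item, sd = 0.3)
latent <- 2.5 + 0.6 * d$predictor1 +
  subj_eff[as.integer(d$subject)] +
  item_eff[as.integer(d$item)] +
  rnorm(nrow(d))

# Observed response: the latent value rounded and clipped to 0..5.
d$score <- pmin(pmax(round(latent), 0), 5)

# Treat the six-level score as if it were Gaussian.
fit <- lmer(score ~ predictor1 + (1 | subject) + (1 | item), data = d)

# Residuals as a function of the predictions: look for curvature or
# fanning, especially near the floor (0) and ceiling (5).
plot(fitted(fit), resid(fit),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
```

With a discrete response the plot will show diagonal bands, one per observed score; that by itself is expected. What matters is whether the residuals drift systematically away from zero across the range of fitted values.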
Occasionally you may have a theoretical reason for NOT treating the dependent measure as equally spaced (e.g., when doing conjoint analysis), or for treating it as equally spaced (e.g., when testing additive factors in reaction time). In the former sort of case, it might be appropriate to fit a model to each subject using some other method, then look at the coefficients across subjects. (This is what I did routinely before lmer.)
Jon
On 05/28/09 14:35, Emmanuel Charpentier wrote:
On Wednesday, 27 May 2009 at 18:08 +0200, Linda Mortensen wrote:
Dear list members,
In the past, I have used the lmer function to model data sets with crossed random effects (i.e., of subjects and items) and with either a continuous response variable (reaction times) or a binary response variable (correct vs. incorrect response). For the reaction time data, I use the formula:
lmer(response ~ predictor1 * predictor2 .... + (1 + predictor1 * predictor2 .... | subject) + (1 + predictor1 * predictor2 .... | item), data)
I think that the second random effect term should be (0 + ...), since there is already an intercept in the first one.
I don't think so. It is quite legitimate to have random effects of the form (1|subject) + (1|item) and the formula above is a generalization of this. An additive random effect for each subject is not confounded with an additive random effect for each item. I would be more concerned about the number of random effects per subject and per item when you have a complex formula like 1 + predictor1 * predictor2 on the left-hand side of the random-effects term. If predictor1 and predictor2 are both numeric predictors this might be justified but I would look at it carefully.
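To put numbers on that concern, here is a small base-R sketch. The helper count_re_params and the toy designs are invented for illustration. It counts the columns that 1 + predictor1 * predictor2 expands to, which is the number of random effects per subject (or per item), and the number of variances and covariances in the unstructured covariance matrix that a term like (1 + predictor1 * predictor2 | subject) asks lmer to estimate.

```r
# Hypothetical helper: expand the random-effects formula on a toy design
# and count random effects (k) and variance-covariance parameters.
count_re_params <- function(design) {
  k <- ncol(model.matrix(~ 1 + predictor1 * predictor2, design))
  c(random_effects = k, varcov_params = k * (k + 1) / 2)
}

# Both predictors numeric: intercept, two slopes, one interaction.
numeric_design <- expand.grid(predictor1 = c(0, 1, 2),
                              predictor2 = c(0, 1, 2))

# Both predictors three-level factors: each contributes two contrasts,
# and the interaction contributes four more columns.
factor_design <- expand.grid(predictor1 = factor(c("a", "b", "c")),
                             predictor2 = factor(c("x", "y", "z")))

num_counts <- count_re_params(numeric_design)
fac_counts <- count_re_params(factor_design)

print(num_counts)  # 4 random effects, 10 variance-covariance parameters
print(fac_counts)  # 9 random effects, 45 variance-covariance parameters
```

These counts apply to each grouping factor separately, so the formula in the original question doubles them: once for subject and once for item.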
I'm currently working on a data set for which the response variable is
number of correct items with accuracy ranging from 0 to 5. So, here the response variable is not binomial but multinomial.
This approximation may be too rough with only 5 items, though. Furthermore, depending on your beliefs about the cognitive model involved in giving a "correct" response, the distance between 0 and 1 correct response(s) may be close to or very different from the distance between 4 and 5 correct responses, which is exactly what a proportional-odds model (polr) tries to account for.
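For comparison, a minimal sketch of the proportional-odds alternative using polr from the MASS package. The data are simulated and the variable names are made up. Note that polr fits fixed effects only, so this sketch ignores the subject and item random effects discussed above; it only illustrates how the spacing between score levels is estimated rather than assumed.

```r
library(MASS)

set.seed(2)
n <- 500
d <- data.frame(x = rnorm(n))

# Latent logistic response, cut into six ordered categories 0..5.
latent <- 0.8 * d$x + rlogis(n)
d$score <- cut(latent,
               breaks = c(-Inf, -2, -1, 0, 1, 2, Inf),
               labels = 0:5,
               ordered_result = TRUE)

# Proportional-odds logistic regression; Hess = TRUE keeps the Hessian
# so that summary(fit) can report standard errors.
fit <- polr(score ~ x, data = d, Hess = TRUE)

# Estimated cutpoints between adjacent categories (0|1, 1|2, ..., 4|5).
print(fit$zeta)

# Gaps between successive cutpoints: roughly equal gaps would support
# the equal-spacing assumption, clearly unequal gaps would not.
print(diff(fit$zeta))
```

Here the simulated cutpoints are equally spaced by construction, so the estimated gaps should come out similar; with real data they are the quantity to inspect.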
-- Jonathan Baron, Professor of Psychology, University of Pennsylvania Home page: http://www.sas.upenn.edu/~baron Editor: Judgment and Decision Making (http://journal.sjdm.org)
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models