ordered multinomial mixed model
On 27 May 2011 17:03, Thomas Mang <thomasmang.ng at googlemail.com> wrote:
Hi, Consider the need for a mixed-effects model with an ordered multinomial response variable. To the best of my knowledge, lme4 would not provide such a thing.
You are right; the lme4 package does not provide functions to fit ordinal mixed models - at least not without modifications. However, there are other CRAN (e.g., ordinal and MCMCglmm) and R-Forge (e.g., ordinal2) packages that does so.
However, I was wondering if such a model can be built using a generalized mixed model with Bernoulli response variables. Here is my thinking, with the question if this is possible as outlined below:
This is a good idea - so good in fact that others have had it before. Recently Ken Knoblauch posted the polmer function to this list based on this idea and using glmer (https://stat.ethz.ch/pipermail/r-sig-mixed-models/2011q1/005069.html). I am not sure if polmer is exactly the translation of your description into an R function, but it seems to me that it is fairly close. The only source in the literature to propose this approach that I am aware of is Winship, C. and R. D. Mare (1984). Regression models with ordinal variables. American Sociological Review 49 (4), pp. 512-525. - a very nice paper by the way.
Code the possible response level factors in a binary fashion. For example,
let's assume that there are three levels, denote them "a", "b" and "c" (in
that order), and let the binary representation be:
a = 0
b = 10
c = 11
The idea is that the 0 / 1s represent individual Bernoulli random variables
which describe if one advances one item up in the ordered response levels
list (1) or stops traversing (0). So the first draw tells whether the
response variable is "a" (X = 0) or it is greater than "a" (X = 1);
conditional on the response is greater than "a", a second Bernoulli random
draw tells if the response is "b" (Y = 0), or greater than "b" (where only
"c" is left if Y = 1).
Now translate that into probabilities:
Let the probability of the response random variable, denoted as Z, being "a"
be written as:
P(Z == a) = p_A.
and the complementary probability that Z is greater than "a" is, of course,
P(Z > a) = 1 - p_A.
Similar to the binary coding above, the probability for Z == b or ?Z == c
can be written using a conditional random variable:
P(Z == b) = P(Z == b | Z > a) * P (Z > a) = p_B * (1 - p_A).
and
P(Z == c) = P(Z == c | Z > a) * P(Z > a) = (1 - p_B) * (1 - p_A).
Hence I have just written down the probabilities for each of the three
ordered factor levels using (conditional and unconditional) Bernoulli random
variables.
Of course, each of these probabilities can also be expressed conditional on
covariates, giving rise to a regression model.
I am now wondering if above multinomial model can be realized as mixed model
by appropriately preparing the input table: For a single response measure,
the contribution to the likelihood has above been written down as one ('a')
or product of two ("b", "c") Bernoulli probabilities. Using a Binomial
regression model I should be able to yield these probabilities exactly by
using one input table row if the response value is "a" and by coding a 0 in
the response vector, while I can use two input table rows if the response is
"b" / "c" where in the first row the response value coded as 1 (as in my
binary coding above), and in the second row it is coded as 0 if the response
measure is "b", and as 1 otherwise.
Now to the covariates: Assuming within above framework the effect of
covariates is identical across all factor levels, I would simply duplicate
the covariates row if the response is "b" / "c". Conceptionally this would
be somewhat similar to a proportional odds model, though the maths behind it
is of course deviating. But consider the case where I believe that
covariates do not impose the same effect across all factor levels. Then I
could, globally, introduce an interaction between a binary factor and a
continuous covariate, where in the case of evaluating a row checking for "a"
as response the factor takes on one level, and in a row checking for "b" /
"c" it takes on the other level. That would fit de facto two regression
coefficients for my continuous covariate, depending on which Bernoulli
random variable I am evaluation.
I think of the "different covariate effects for different response levels" as the effect of that covariate being 'nominal' or un-ordered rather than 'ordinal'. This is essentially what Peterson and Harrell (1990) [Partial Proportional Odds Models for Ordinal Response Variables, Applied Statistics, Vol. 39, No. 2. pp. 205-217] denoted "partial proportional odds" for the proportional odds model. The clm (clmm) function in package ordinal fits cumulative link (mixed) models [in you terminology these are ordered multinomial mixed models with a link function of your choice] allowing for some predictors having these nominal effects rather than the usual ordinal effects.
Finally, at the end degrees of freedom must be adjusted, as every response measure should provide only one, independent if it had been set up using one or two input table rows.
This approach (or at least the one detailed by Winship and Mare / polmer) gives impressively accurate ML estimates considering that it is an approximate procedure, but the standard errors are often much too low - the "too many df" is one way to understand this behavior.
My question: Is above thinking correct throughout? Is this maybe already an established and accepted approach to simulate an ordered multinomial regression model (independent of glm, glmm etc?) It should work smoothly for glm-s. Is there any obstacle using above approach when fitting a mixed-effects model? E.g. would a REML fitting somehow blow up my thinking ? If it does, would it screw up things totally, or just provide a not-totally-correct but maybe acceptable approximation ?
Strictly speaking REML is only defined for linear mixed models, but extensions of REML or REML-like procedures to more general (generalized and other non-linear) models have been proposed. I am not sure what you mean by REML fitting in this situation...
In short, can I use above steps to get my needed ordered multinomial mixed-effects model, or do I overlook something ?
I would suggest using one of the existing packages that provide the functionality you are looking for. I am the author of ordinal and ordinal2, so I know what these packages are doing, and I would be happy to discuss implementation of ordinal mixed models and approximations in more detail, but perhaps this is more appropriate of list. Cheers, Rune
many thanks for any input and apologize the long text, best Thomas
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Rune Haubo Bojesen Christensen PhD Student, M.Sc. Eng. Phone: (+45) 45 25 33 63 Mobile: (+45) 30 26 45 54 DTU Informatics, Section for Statistics Technical University of Denmark, Build. 305, Room 122, DK-2800 Kgs. Lyngby, Denmark