mixed model with non-continuous numeric response

8 messages · Daniel Ezra Johnson, Jonathan Baron, Reinhold Kliegl +1 more

#
Dear all,

I have survey results where the response is 1, 2, 3, or 4. These can
be thought of as equally spaced points on a scale; I don't have a
problem with that. (They're actually more like "not at all", "some",
"mostly", "totally"; the subject is judging a stimulus.)

I want to model crossed random effects for Subject and Item. Am I way
off base in modeling these data with an lmer(family="gaussian") model? I
know it's not perfect, but is it really bad? If so, what could I do
instead? (The error certainly wouldn't be binomial, right?)
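
A minimal sketch of the model being asked about, on simulated data. The data frame `d`, the predictor `cond`, and all effect sizes are hypothetical stand-ins for the survey; only the crossed `(1 | Subject) + (1 | Item)` structure comes from the question. The fit is skipped if lme4 is not installed.

```r
# Simulated stand-in for the survey: 20 subjects each rate 10 items on a 1-4 scale.
# All names and effect sizes below are hypothetical.
set.seed(1)
d <- expand.grid(Subject = factor(1:20), Item = factor(1:10))
d$cond <- sample(c(-0.5, 0.5), nrow(d), replace = TRUE)  # hypothetical predictor

u <- rnorm(20, sd = 0.5)  # subject intercepts
v <- rnorm(10, sd = 0.3)  # item intercepts
latent <- 2.5 + 0.6 * d$cond +
  u[as.integer(d$Subject)] + v[as.integer(d$Item)] +
  rnorm(nrow(d), sd = 0.8)

# Round the latent score and clip it to the 1..4 response scale.
d$resp <- pmin(pmax(round(latent), 1), 4)

# Gaussian mixed model with crossed random intercepts for Subject and Item.
if (requireNamespace("lme4", quietly = TRUE)) {
  fit <- lme4::lmer(resp ~ cond + (1 | Subject) + (1 | Item), data = d)
  print(summary(fit))
}
```

This treats the four response codes as points on an interval scale, which is exactly the assumption under discussion.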

Thanks,
Daniel
#
(..., family="poisson") is the most commonly used option for count data.

Reinhold Kliegl

On Mon, Dec 22, 2008 at 12:54 PM, Daniel Ezra Johnson
<danielezrajohnson at gmail.com> wrote:
#
I don't think this is count data, is it???

On Mon, Dec 22, 2008 at 12:40 PM, Reinhold Kliegl
<reinhold.kliegl at gmail.com> wrote:
#
See Venables and Ripley (2002, p. 200) for an example that models
three levels of satisfaction (low, medium, high) with a surrogate
Poisson model. They also provide the technical justification. The
alternative is to fit it as a multinomial model; I am not sure how, if
at all, this can be done with glmer in its current implementation.

Reinhold Kliegl

On Mon, Dec 22, 2008 at 1:41 PM, Daniel Ezra Johnson
<danielezrajohnson at gmail.com> wrote:
#
On 12/22/08 15:04, Reinhold Kliegl wrote:
Johnson (the original poster) said that the responses can be thought
of as equally spaced points, i.e., linear with the underlying variable
of interest.  I think that this is often a reasonable assumption, so
another alternative is to do what he said.  Psychologists -- perhaps
because we have read Dawes, R. M., & Corrigan, B. (1974). Linear
models in decision making. Psychological Bulletin, 81, 97-106 -- are
often willing to assume that linear models are good fits even when
they are technically wrong.

(I also couldn't find VR's rationale for the surrogate Poisson model,
but I'm not questioning that possibility.)

The question is how serious the violation of the assumed error
distribution is when we have only 4 categories.  When I do this -
which I admit is usually when I'm using lm() and not lmer() - I look
at the error distributions (from the default plot()) and do an eyeball
test.  If the result is barely "significant" at the outset, I worry.
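
A rough sketch of that eyeball test, using base R only. The data are simulated and the names `x`, `y`, and `fit_lm` are hypothetical; `which = 1:2` picks the residuals-vs-fitted and normal Q-Q panels from the default plot() for an lm fit.

```r
# Simulated 4-category response driven by one continuous predictor (hypothetical).
set.seed(2)
x <- rnorm(200)
y <- pmin(pmax(round(2.5 + 0.7 * x + rnorm(200)), 1), 4)

fit_lm <- lm(y ~ x)

# Two of the default diagnostic panels, side by side.
op <- par(mfrow = c(1, 2))
plot(fit_lm, which = 1:2)  # 1: residuals vs fitted, 2: normal Q-Q
par(op)
```

With only four possible response values, the residuals-vs-fitted panel shows one diagonal band per value, so the Q-Q panel is usually the more informative of the two.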

Jon

#
The VR paragraphs I was referring to are on page 199. Anyway, if one
is willing to make the assumption of linear spacing, then responses 1,
2, 3, 4 can surely also be interpreted as count data; sort of the
number of latent pieces of evidence you need to move up one response
category; subtract 1 if you want "0" as part of the scale.

Then, indeed, the distribution of responses matters. If the
distribution looks roughly "normal" (e.g., if categories 2 and 3 are
more frequent than 1 and 4), it probably does not matter whether you
use the Gaussian or the Poisson family. If they are bi-modal, I would
definitely prefer the latter. (Of course, it does matter if you have a
substantive theory.)

Reinhold Kliegl
On Mon, Dec 22, 2008 at 4:06 PM, Jonathan Baron <baron at psych.upenn.edu> wrote:
#
Line 1 of paragraph 2 below should read: "Then, indeed, the
distribution of errors matters. ..."
Reinhold Kliegl

On Mon, Dec 22, 2008 at 5:25 PM, Reinhold Kliegl
<reinhold.kliegl at gmail.com> wrote:
#
Daniel,

I think one of the ordinal response models discussed in sections 2-4
of chapter 7 of Agresti's "Categorical Data Analysis" (second edition)
would be appropriate here.  Section 12.4 briefly discusses adding random
effects to such models.  It's not clear to me how to accomplish this
using the lme4 package (but of course "This is R.  There is no if. ...").
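
For reference, one common way to write a cumulative logit (proportional odds) model with crossed random intercepts for the four-category response is sketched below. The notation is an assumption made here for illustration and is not copied from Agresti; in particular, the sign in front of the linear predictor is a convention that differs between texts and software.

```latex
\operatorname{logit} P(Y_{si} \le k)
  = \alpha_k - \left( x_{si}^{\top}\beta + u_s + v_i \right),
  \qquad k = 1, 2, 3,
\\
u_s \sim N(0, \sigma_u^2), \qquad
v_i \sim N(0, \sigma_v^2), \qquad
\alpha_1 < \alpha_2 < \alpha_3 .
```

Here $s$ indexes subjects, $i$ indexes items, and the three ordered thresholds $\alpha_k$ separate the four response categories.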

A fallback strategy would be to run various mixed-effect binary response
models (using different cut points on the ordinal scale) separately, and see
how consistent the results are.
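
A minimal sketch of this fallback, assuming simulated data and hypothetical names (`d`, `cond`, `ge2`, `ge3`, `ge4`). Each cut point k turns the 1-4 response into the binary indicator "response >= k", and a separate binomial mixed model is fitted to each indicator. The fits are skipped if lme4 is not installed.

```r
# Simulated ratings with crossed Subject and Item effects (all values hypothetical).
set.seed(3)
d <- expand.grid(Subject = factor(1:20), Item = factor(1:10))
d$cond <- sample(c(-0.5, 0.5), nrow(d), replace = TRUE)
u <- rnorm(20, sd = 0.5)
v <- rnorm(10, sd = 0.3)
latent <- 2.5 + 0.8 * d$cond +
  u[as.integer(d$Subject)] + v[as.integer(d$Item)] + rnorm(nrow(d))
d$resp <- pmin(pmax(round(latent), 1), 4)

# One binary indicator per cut point: resp >= 2, resp >= 3, resp >= 4.
cuts <- 2:4
for (k in cuts) d[[paste0("ge", k)]] <- as.integer(d$resp >= k)

if (requireNamespace("lme4", quietly = TRUE)) {
  fits <- lapply(cuts, function(k) {
    f <- as.formula(paste0("ge", k, " ~ cond + (1 | Subject) + (1 | Item)"))
    lme4::glmer(f, data = d, family = binomial)
  })
  # Compare the estimated effect of cond across the three cut points.
  print(sapply(fits, function(m) lme4::fixef(m)["cond"]))
}
```

Broadly similar `cond` estimates across the three cut points would suggest the conclusion does not depend much on where the scale is split.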

You may find the "lrm" function in Frank Harrell's Design package useful
for some of the alternatives to the cumulative logit model (though it doesn't
handle random effects).

Regards,
Rob Kushler