mixed model with non-continuous numeric response
8 messages · Daniel Ezra Johnson, Jonathan Baron, Reinhold Kliegl +1 more

Dear all,

I have survey results where the response is 1, 2, 3, or 4. These can be thought of as equally-spaced points on a scale, I don't have a problem with that. (They're actually more like "not at all", "some", "mostly", "totally"; the subject is judging a stimulus.)

I want to model crossed random effects for Subject and Item. Am I way off base in modeling this data with a lmer(family="gaussian") model? I know it's not perfect, but is it really bad? If so, what could I do instead? (The error certainly wouldn't be binomial, right?)

Thanks,
Daniel
( ..., family="poisson") is the most used option for count data
Reinhold Kliegl
(In reply to Daniel Ezra Johnson <danielezrajohnson at gmail.com>, Mon, Dec 22, 2008 at 12:54 PM)
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
I don't think this is count data, is it???
(Daniel Ezra Johnson, in reply to Reinhold Kliegl <reinhold.kliegl at gmail.com>, Mon, Dec 22, 2008 at 12:40 PM)
See Venables and Ripley (2002, p.200) for an example modeling three levels of satisfaction (low, medium, high) as a surrogate Poisson model. They also provide the technical justification. The alternative is to fit it as a multinomial model -- not sure how, if at all, this can be done with glmer in its current implementation.
Reinhold Kliegl
(In reply to Daniel Ezra Johnson <danielezrajohnson at gmail.com>, Mon, Dec 22, 2008 at 1:41 PM)
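For readers without the book at hand, the surrogate Poisson idea can be sketched with the `housing` data that ships with the MASS package, which appears to be the satisfaction example referred to above. Note that the Poisson response in this setup is the cell count `Freq` of a contingency table, not the satisfaction level itself. The object names `fit0` and `fit1` are labels chosen for this sketch only.

```r
library(MASS)  # provides the housing data (satisfaction: Low / Medium / High)

# One row per cell of the Sat x Infl x Type x Cont table; Freq is the count.
str(housing)

# Baseline model. The Infl * Type * Cont term reproduces the total for each
# combination of explanatory factors; conditional on those totals, the
# Poisson likelihood for the counts corresponds to a multinomial one for Sat.
fit0 <- glm(Freq ~ Infl * Type * Cont + Sat,
            family = poisson, data = housing)

# Let the satisfaction distribution depend on each explanatory factor.
fit1 <- update(fit0, . ~ . + Sat:(Infl + Type + Cont))

# A large drop in deviance suggests that satisfaction is associated
# with the explanatory factors.
print(anova(fit0, fit1, test = "Chisq"))
```

With one row per rating, as in the survey described in the original post, the data would first have to be tabulated into such counts, and it is not obvious how Subject and Item random effects would carry over.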
On 12/22/08 15:04, Reinhold Kliegl wrote:
See Venables and Ripley (2002, p.200) for an example modeling three levels of satisfaction (low, medium, high) as a surrogate Poisson model. They also provide the technical justification. The alternative is to fit it as a multinomial model -- not sure how, if at all, this can be done with glmer in its current implementation.
Johnson (the original poster) said that the responses can be thought of as equally spaced points, i.e., linear with the underlying variable of interest. I think that this is often a reasonable assumption, so another alternative is to do what he said. Psychologists -- perhaps because we have read Dawes, R. M., & Corrigan, B. (1974). Linear models in decision making. Psychological Bulletin, 81, 97-106 -- are often willing to assume that linear models are good fits even when they are technically wrong. (I also couldn't find VR's rationale for the surrogate Poisson model, but I'm not questioning that possibility.)

The question is about how serious is the violation of the assumed error distribution when we have only 4 categories. When I do this - which I admit is usually when I'm using lm() and not lmer() - I look at the error distributions (from the default plot()) and do an eyeball test. If the result is barely "significant" at the outset, I worry.

Jon
Jonathan Baron, Professor of Psychology, University of Pennsylvania Home page: http://www.sas.upenn.edu/~baron Editor: Judgment and Decision Making (http://journal.sjdm.org)
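A minimal sketch of the linear approach with crossed random effects, followed by the eyeball test on the residuals. The data are simulated; the variable names (`subject`, `item`, `cond`, `rating`) and all effect sizes are invented for illustration. In current versions of lme4, lmer() takes no family argument and always assumes Gaussian errors, so the call is written without one.

```r
# Simulated stand-in for the survey: 30 subjects x 20 items, ratings 1-4.
set.seed(1)
d <- expand.grid(subject = factor(1:30), item = factor(1:20))
d$cond <- ifelse(as.integer(d$item) %% 2 == 0, 0.5, -0.5)  # item-level predictor
latent <- 0.7 * d$cond +
  rnorm(30, sd = 0.8)[as.integer(d$subject)] +  # subject intercepts
  rnorm(20, sd = 0.5)[as.integer(d$item)] +     # item intercepts
  rlogis(nrow(d))                               # trial-level noise
d$rating <- as.integer(cut(latent, c(-Inf, -1.5, 0, 1.5, Inf)))

if (requireNamespace("lme4", quietly = TRUE)) {
  library(lme4)

  # Crossed random intercepts for subjects and items, Gaussian errors.
  m_lin <- lmer(rating ~ cond + (1 | subject) + (1 | item), data = d)
  print(summary(m_lin))

  # Residuals against fitted values, then a normal QQ plot.
  # With only four response values the residuals fall into bands, so look
  # for strong skew or a funnel shape rather than for a perfect cloud.
  print(plot(m_lin))
  qqnorm(resid(m_lin))
  qqline(resid(m_lin))
}
```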
The VR paragraphs I was referring to are on page 199. Anyway, if one is willing to make the assumption of linear spacing, then responses 1, 2, 3, 4 can surely also be interpreted as count data; sort of the number of latent pieces of evidence you need to move up one response category; subtract 1 if you want "0" as part of the scale.

Then, indeed, the distribution or responses matters. If the distribution looks roughly "normal" (e.g., if categories 2 and 3 are more frequent than 1 and 4), it probably does not matter whether you use the Gaussian or the Poisson family. If they are bi-modal, I would definitely prefer the latter. (Of course, it does matter if you have a substantive theory.)

Reinhold Kliegl
(In reply to Jonathan Baron <baron at psych.upenn.edu>, Mon, Dec 22, 2008 at 4:06 PM)
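The comparison between the Gaussian and Poisson families can be tried along the following lines: tabulate the responses to see whether they pile up in the middle or at the ends, then fit both models. This is a sketch on simulated data; the variable names and effect sizes are invented, and `count` is simply the rating shifted to start at 0.

```r
# Simulated stand-in for the survey: 30 subjects x 20 items, ratings 1-4.
set.seed(1)
d <- expand.grid(subject = factor(1:30), item = factor(1:20))
d$cond <- ifelse(as.integer(d$item) %% 2 == 0, 0.5, -0.5)
latent <- 0.7 * d$cond +
  rnorm(30, sd = 0.8)[as.integer(d$subject)] +
  rnorm(20, sd = 0.5)[as.integer(d$item)] +
  rlogis(nrow(d))
d$rating <- as.integer(cut(latent, c(-Inf, -1.5, 0, 1.5, Inf)))

# Shape of the response distribution: peaked in the middle, or bimodal?
print(prop.table(table(d$rating)))

if (requireNamespace("lme4", quietly = TRUE)) {
  library(lme4)

  d$count <- d$rating - 1  # shift the scale so that it starts at 0

  m_pois <- glmer(count ~ cond + (1 | subject) + (1 | item),
                  data = d, family = poisson)
  m_lin <- lmer(rating ~ cond + (1 | subject) + (1 | item), data = d)

  # The two models use different links (log vs identity), so compare the
  # sign and the test statistic for cond rather than the raw estimates.
  print(coef(summary(m_pois)))
  print(coef(summary(m_lin)))
}
```

One caveat when reading the Poisson fit: the shifted scale is capped at 3, which a Poisson model does not know about.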
Line 1 of paragraph 2 in my previous message should read: "Then, indeed, the distribution of errors matters. ..."
Reinhold Kliegl
(In reply to Reinhold Kliegl <reinhold.kliegl at gmail.com>, Mon, Dec 22, 2008 at 5:25 PM)
Daniel,

I think one of the ordinal response models discussed in sections 2-4 of chapter 7 of Agresti's "Categorical Data Analysis" (second edition) would be appropriate here. Section 12.4 briefly discusses adding random effects to such models. It's not clear to me how to accomplish this using the lme4 package (but of course "This is R. There is no if. ...").

A fallback strategy would be to run various mixed-effect binary response models (using different cut points on the ordinal scale) separately, and see how consistent the results are.

You may find the "lrm" function in Frank Harrell's Design package useful for some of the alternatives to the cumulative logit model (though it doesn't handle random effects).

Regards,
Rob Kushler
(In reply to Daniel Ezra Johnson)
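The fallback strategy can be sketched as follows: dichotomize the rating at each cut point, fit a binomial mixed model to each split, and compare the estimates. The data are simulated and every name here (`cutpoint_fit`, `above`, `cond_est`) is hypothetical. The last step goes beyond what the thread mentions: it assumes the separate `ordinal` package is installed and uses its `clmm()` function to fit a cumulative logit model with crossed random effects directly.

```r
# Simulated stand-in for the survey: 30 subjects x 20 items, ratings 1-4.
set.seed(1)
d <- expand.grid(subject = factor(1:30), item = factor(1:20))
d$cond <- ifelse(as.integer(d$item) %% 2 == 0, 0.5, -0.5)
latent <- 0.7 * d$cond +
  rnorm(30, sd = 0.8)[as.integer(d$subject)] +
  rnorm(20, sd = 0.5)[as.integer(d$item)] +
  rlogis(nrow(d))
d$rating <- as.integer(cut(latent, c(-Inf, -1.5, 0, 1.5, Inf)))

if (requireNamespace("lme4", quietly = TRUE)) {
  library(lme4)

  # Fit one binary mixed model for "rating above cut point k".
  cutpoint_fit <- function(k) {
    d$above <- as.integer(d$rating > k)  # local copy of d is modified
    glmer(above ~ cond + (1 | subject) + (1 | item),
          data = d, family = binomial)
  }
  fits <- lapply(1:3, cutpoint_fit)

  # Effect of cond at each cut point; roughly equal values are what a
  # cumulative logit (proportional odds) model would assume.
  cond_est <- sapply(fits, function(f) fixef(f)[["cond"]])
  names(cond_est) <- c(">1", ">2", ">3")
  print(round(cond_est, 2))
}

# Assumption: the 'ordinal' package is available (not part of lme4).
if (requireNamespace("ordinal", quietly = TRUE)) {
  d$rating_ord <- factor(d$rating, levels = 1:4, ordered = TRUE)
  m_ord <- ordinal::clmm(rating_ord ~ cond + (1 | subject) + (1 | item),
                         data = d)
  print(summary(m_ord))
}
```

If the three cut-point estimates differ a lot, that is a hint that a single slope across categories may not describe the data well.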