Dear Thierry,

Thanks for your question. Here's the reason why I think the responses aren't multinomial (or ordinal).

The listeners were presented with spoken strings of the form CVC, where C = consonant and V = vowel. The rate at which the acoustics changed at the beginning of the syllable was varied orthogonally with the duration of the vowel. The rate of acoustic change conveyed the identity of the initial consonant, which was expected to sound like "b" when the rate of change was faster and like "w" when it was slower. The duration of the vowel conveyed how many syllables the string consisted of, which was expected to be "1" when the vowel was shorter and "2" when it was longer. The listeners were instructed to respond with "b" or "w" and "1" or "2" on every trial.

So, unlike a truly multinomial dependent variable, such as professions or majors, the responses here are not unordered. They also cannot be arranged into a single order sensibly: even if "b1" and "w2" responses are first and last in the order, there's no way of deciding *a priori* the order of "b2" and "w1" responses.

Again, thanks for your reply.

Best, John

John Kingston
Professor, Linguistics Department
University of Massachusetts
N434 Integrative Learning Center
650 N. Pleasant Street
Amherst, MA 01003
1-413-545-6833, fax -2792
jkingstn at umass.edu
https://blogs.umass.edu/jkingstn
how to specify the response (dependent) variable in a logistic regression model
7 messages · Greg Snow, Phillip Alday, John Kingston +3 more
John,

I agree that ordering your responses does not make sense, but multinomial models are for unordered categorical data. So you can just treat your 4 possible outcomes as unordered categories.

Another option is to convert to a Poisson regression, where the response variable is the count (the number of times each of the 4 combinations is selected) and your categories become explanatory/predictor variables. You can either use a single predictor with the 4 levels (and choose appropriate indicator variables) or use 2 predictors (b vs. w and 1 vs. 2) as well as their interaction. That would give a different interpretation of the model, but may be closer to what you are trying to accomplish.
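Greg's count-based reformulation could be sketched roughly as follows. This is a minimal base-R sketch; the variable names and tallies are invented purely for illustration, not taken from the original study.

```r
# Rough sketch of the Poisson-count reformulation, base R only.
# Tabulate how often each of the 4 response combinations was chosen.
d <- expand.grid(cons = c("b", "w"), syll = c("1", "2"))
d$count <- c(40, 12, 9, 35)  # hypothetical response tallies

# Option 1: a single 4-level predictor.
m1 <- glm(count ~ interaction(cons, syll), family = poisson, data = d)

# Option 2: two predictors plus their interaction (an equivalent fit,
# but a different parameterization/interpretation).
m2 <- glm(count ~ cons * syll, family = poisson, data = d)
```

In the second parameterization, the interaction term directly asks whether the "b"/"w" choice and the "1"/"2" choice are associated.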
On Thu, Jan 14, 2021 at 8:44 AM John Kingston <jkingstn at umass.edu> wrote:
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Gregory (Greg) L. Snow Ph.D. 538280 at gmail.com
John,

How comfortable are you with mixed-models software beyond lme4? This seems like a perfect case for a multivariate mixed model (which you can fit with e.g. brms or MCMCglmm). The basic idea is that you create a single mixed model that can be thought of as doing two GLMMs simultaneously. Here's the basic syntax for doing this in brms:

brm(mvbind(Resp1, Resp2) ~ preds + ..., data=your_data, family=binomial)

You can also specify this as two formulae (which really highlights the "two models simultaneously" intuition):

var1 = bf(Resp1 ~ preds + ....) + binomial()
var2 = bf(Resp2 ~ preds + ....) + binomial()
brm(var1 + var2, data=your_data)

The advantage of doing this as a multivariate model rather than as separate models is that you get simultaneous estimates across both models, including the correlation/covariance between those estimates. See e.g. the brms documentation (https://paul-buerkner.github.io/brms/articles/brms_multivariate.html) for more info. In particular, pay attention to the extra syntax for computing shared correlation in the random effects across sub-models. The cons of this approach are that [1] most reviewers in (psycho)linguistics will not be familiar with it (and there was recently a Twitter storm on this very problem) and [2] the computational costs are noticeably higher.

Another alternative is to do something like "linked mixed models" (cf. Hohenstein, Matuschek and Kliegl, PBR 2016). There are a few variants on this, but the basic idea is that you use one response to predict the other. Given the temporal ordering here, this might make sense, e.g.

mod1 = glmer(Resp1 ~ preds + ....)
mod2 = glmer(Resp2 ~ preds + YYY + ....)

where YYY is one of:
[a] Resp1
[b] fitted(mod1)
[c] fitted(mod1) + resid(mod1)

You can potentially omit mod1, in which case you have something like the Davidson and Martin (Acta Psychologica, 2016) approach to the joint analysis of reaction times and response accuracy.

The downside to this approach is that the variability in Resp1 can create problems in mod2, because standard GLMMs assume that the predictors are measured without error/variability. Variants [b] and especially [c] mitigate this a bit, though. (And if you want to get even more complicated, there are errors-in-variables models, which can handle this and are available in e.g. brms.)

I think the advantage of the linked-model approach relative to the multivariate approach is that it's somewhat more accessible for a typical (psycho)linguistics reviewer. Note that I am originally from linguistics and do know a bit about mixed models, so I'm a usual suspect as a reviewer on these things.

Best,
Phillip

PS: the multinomial models suggested by the others are also pretty good, but again, multinomial models usually take some getting used to and don't reflect the potential covariance of Resp1 and Resp2 in an obvious way.
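A fuller sketch of the brms multivariate setup, including the cross-model random-effect correlation mentioned above, might look roughly like this. All names (resp1, resp2, rate, dur, subject, your_data) are placeholders, and bernoulli() is brms's family for 0/1 outcomes; this is an untested illustration, not the poster's actual code.

```r
# Hypothetical sketch; requires the brms package and a data frame with
# the assumed columns.
library(brms)

# The shared "|s|" ID in the grouping terms tells brms to estimate the
# correlation between the two sub-models' by-subject effects.
f1 <- bf(resp1 ~ rate * dur + (1 | s | subject)) + bernoulli()
f2 <- bf(resp2 ~ rate * dur + (1 | s | subject)) + bernoulli()

fit <- brm(f1 + f2, data = your_data, cores = 4)
```

The estimated by-subject correlation is what makes this genuinely multivariate rather than two independent fits.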
On 14/1/21 5:05 pm, Greg Snow wrote:
Dear Phillip and Greg,

Thank you both very much. I don't yet have experience beyond lme4, but you've both given me useful directions to pursue. I'll come back with results once they're in hand.

Best, John

John Kingston
Professor, Linguistics Department
University of Massachusetts
N434 Integrative Learning Center
650 N. Pleasant Street
Amherst, MA 01003
1-413-545-6833, fax -2792
jkingstn at umass.edu
https://blogs.umass.edu/jkingstn
On Thu, Jan 14, 2021 at 11:41 AM Phillip Alday <me at phillipalday.com> wrote:
Another fellow linguist here, albeit a junior one. It sounds to me like:

1. You have two binary outcomes, the 1st always occurring before the 2nd.
2. For each subject, you have exactly one response per binary outcome.

This being the case, I struggle to see why a complex model such as a multinomial, multi-response, or GLMM is even necessary. If each subject makes the same number of responses, their idiosyncrasies can be expected to cancel each other out overall, no? Therefore, we don't need random effects. We can fit a standard logistic fixed-effects model whose interpretation is "a random individual makes a binary choice." And since the binary responses have a fixed order, the first one can simply be used as a covariate in the analysis of the second one. Thus, I struggle to see why we could not simply:

1. First fit a standard logistic regression for Response 1 with all covariates of interest.
2. Then fit a standard logistic regression for Response 2 with all covariates of interest PLUS the subject's observed Response 1 as an added covariate. Its addition will address the question of possible correlation between the two responses.

Experts, please do tell me why I'm wrong.

Best, Juho

On Fri, 15 Jan 2021 at 1:27, John Kingston (jkingstn at umass.edu) wrote:
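Juho's two-step proposal could be sketched as follows, in base R. The column names (resp1, resp2, rate, dur) and your_data are assumed for illustration; they do not appear in the original posts.

```r
# Step 1: ordinary logistic regression for the first binary response.
m1 <- glm(resp1 ~ rate * dur, family = binomial, data = your_data)

# Step 2: the same model for the second response, with the observed first
# response added as a covariate to absorb dependence between the two.
m2 <- glm(resp2 ~ rate * dur + resp1, family = binomial, data = your_data)
```

Note that this treats each row as an independent observation, which is exactly the assumption the next reply questions.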
Dear all,

IMHO, Phillip's suggestion matches the data-generating model best. Greg's suggestions are simpler but usable models too. Given that different subjects respond to the same questions, and we cannot rule out that one subject is more likely to respond "b" over "w" than another subject, we need to add the subject as a random effect.

Best regards,

ir. Thierry Onkelinx
Statisticus / Statistician
Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
thierry.onkelinx at inbo.be
Havenlaan 88 bus 73, 1000 Brussel
www.inbo.be

///////////////////////////////////////////////////////////////////////////////////////////
To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey
///////////////////////////////////////////////////////////////////////////////////////////

On Fri, 15 Jan 2021 at 10:09, Juho Kristian Ruohonen <juho.kristian.ruohonen at gmail.com> wrote:
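Thierry's point about repeated measures could be sketched like this with lme4; the column names and your_data are again placeholders, and a richer random-effects structure (e.g. by-subject slopes, or by-item effects) may well be warranted.

```r
# Hypothetical sketch: each subject contributes many trials, so a
# by-subject random intercept is the minimum needed. Requires lme4.
library(lme4)

m <- glmer(resp1 ~ rate * dur + (1 | subject),
           family = binomial, data = your_data)
```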
I think you can also do this in lme4 with a little more work; see:

https://rpubs.com/bbolker/3336
https://mac-theobio.github.io/QMEE/lectures/MultivariateMixed.notes.html
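The lme4 trick described in those links can be sketched roughly as follows. Everything here (column names, your_data) is assumed for illustration, and the details should be checked against the linked write-ups.

```r
# Stack the two binary responses into long format: one row per
# (trial, response-type) pair, flagged by the factor rtype. Requires lme4.
library(lme4)

long <- rbind(
  data.frame(your_data, rtype = "cons", resp = your_data$resp1),
  data.frame(your_data, rtype = "syll", resp = your_data$resp2)
)

# 0 + rtype: a separate intercept per response type.
# rtype:(rate * dur): separate slopes per response type.
# (0 + rtype | subject): by-subject effects for each response type,
# with their correlation estimated -- the "multivariate" part.
m <- glmer(resp ~ 0 + rtype + rtype:(rate * dur) + (0 + rtype | subject),
           family = binomial, data = long)
```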
On 1/14/21 6:27 PM, John Kingston wrote:
Dear Phillip and Greg, Thank you both very much. I don't have experience yet beyond lme4, but you've both given me useful directions to pursue. I'll come back with results once they're in hand. Best, John John Kingston Professor Linguistics Department University of Massachusetts N434 Integrative Learning Center 650 N. Pleasant Street Amherst, MA 01003 1-413-545-6833, fax -2792 jkingstn at umass.edu https://blogs.umass.edu/jkingstn <https://blogs.umass.edu/jkingstn/wp-admin/> On Thu, Jan 14, 2021 at 11:41 AM Phillip Alday <me at phillipalday.com> wrote:
John,

How comfortable are you with mixed models software beyond lme4? This seems like a perfect case for a multivariate mixed model (which you can do with e.g. brms or MCMCglmm). The basic idea is that you create a single mixed model that can be thought of as doing two GLMMs simultaneously. Here's the basic syntax for doing this in brms:

  brm(mvbind(Resp1, Resp2) ~ preds + ..., data = your_data, family = bernoulli())

You can also specify this as two formulae (which really highlights the "two models simultaneously" intuition):

  var1 = bf(Resp1 ~ preds + ....) + bernoulli()
  var2 = bf(Resp2 ~ preds + ....) + bernoulli()
  brm(var1 + var2, data = your_data)

The advantage of doing this as a multivariate model, as opposed to separate models, is that you get simultaneous estimates across both models, including the correlation/covariance between those estimates. See e.g. the brms documentation (https://paul-buerkner.github.io/brms/articles/brms_multivariate.html) for more info. In particular, pay attention to the extra syntax for computing shared correlation in the random effects across sub-models.

The cons of this approach are that [1] most reviewers in (psycho)linguistics will not be familiar with it (there was recently a Twitter storm on this very problem) and [2] the computational costs are noticeably higher.

Another alternative is to do something like "linked mixed models" (cf. Hohenstein, Matuschek and Kliegl, PBR 2016). There are a few variants on this, but the basic idea is that you use one response to predict the other. Given the temporal ordering here, this might make sense, e.g.

  mod1 = glmer(Resp1 ~ preds + ...., family = binomial)
  mod2 = glmer(Resp2 ~ preds + YYY + ...., family = binomial)

where YYY is one of:

  [a] Resp1
  [b] fitted(mod1)
  [c] fitted(mod1) + resid(mod1)

You can potentially omit mod1, in which case you have something like the Davidson and Martin (Acta Psychologica, 2016) approach to the joint analysis of reaction times and response accuracy.

The downside of this approach is that the variability in Resp1 can create problems in mod2, because standard GLMMs assume that the predictors are measured without error/variability. Variants [b] and especially [c] mitigate this a bit, though. (And if you want to get even more complicated, there are "errors-in-variables" models, which can handle this and are available in e.g. brms.)

I think the advantage of the linked model approach relative to the multivariate approach is that it's somewhat more accessible for a typical (psycho)linguistic reviewer. Note that I am nominally originally from linguistics and do know a bit about mixed models, so I'm a good usual suspect for a reviewer on these things.

Best,
Phillip

PS: the multinomial models suggested by the others are also pretty good, but again, multinomial models usually take some getting used to, and they don't reflect the potential covariance of Resp1 and Resp2 in an obvious way.

On 14/1/21 5:05 pm, Greg Snow wrote:
John,

I agree that ordering your responses does not make sense, but the multinomial models are for unordered categorical data, so you can just treat your 4 possible outcomes as unordered categories.

Another option is to convert to a Poisson regression where the response variable is the count (the number of times each of the 4 combinations is selected) and your categories become explanatory/predictor variables. You can either use a single predictor with the 4 levels (and choose appropriate indicator variables) or you can have 2 predictors (b vs. w and 1 vs. 2) as well as their interaction. That would give a different interpretation of the model, but may be more what you are trying to accomplish.
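A minimal sketch of this count/Poisson setup, with simulated responses; the variable names resp_c and resp_n are invented for illustration. Collapse the trial-level responses to one count per response combination, then fit a Poisson (log-linear) model with the response categories as predictors:

```r
## Simulated trial-level responses: one consonant response ("b"/"w")
## and one syllable-count response ("1"/"2") per trial.
set.seed(1)
trials <- data.frame(
  resp_c = sample(c("b", "w"), 400, replace = TRUE),
  resp_n = sample(c("1", "2"), 400, replace = TRUE)
)

## One row per (b/w) x (1/2) cell, with its count
counts <- as.data.frame(table(resp_c = trials$resp_c,
                              resp_n = trials$resp_n),
                        responseName = "n")

## Poisson regression: the two response dimensions and their
## interaction become the explanatory variables
fit <- glm(n ~ resp_c * resp_n, family = poisson, data = counts)
summary(fit)
```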