
how to specify the response (dependent) variable in a logistic regression model

John,

How comfortable are you with mixed-models software beyond lme4? This
seems like a perfect case for a multivariate mixed model (which you can
do with e.g. brms or MCMCglmm). The basic idea is that you create a
single mixed model that can be thought of as doing two GLMMs
simultaneously. Here's the basic syntax for doing this in brms:


brm(mvbind(Resp1, Resp2) ~ preds + ..., data = your_data, family = bernoulli())

You can also specify this as two formulae (which really highlights the
"two models simultaneously" intuition):


var1 = bf(Resp1 ~ preds + ....) + bernoulli()
var2 = bf(Resp2 ~ preds + ....) + bernoulli()

brm(var1 + var2, data=your_data)

The advantage of fitting this as one multivariate model, as opposed to
two separate models, is that you get simultaneous estimates across both
sub-models, including the correlation/covariance between those
estimates. See e.g. the brms documentation
(https://paul-buerkner.github.io/brms/articles/brms_multivariate.html)
for more info. In particular, pay attention to the extra syntax for
modelling shared correlation in the random effects across sub-models.
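As a sketch of that shared-correlation syntax: the ID inside |p| is an
arbitrary label that tells brms to estimate correlations between the
random effects of the two sub-models (subject is a hypothetical grouping
factor here, and your formulae would of course include your actual
predictors):

library(brms)

# Hypothetical sketch: the "p" in (1 |p| subject) links the by-subject
# intercepts of the two sub-models so their correlation is estimated.
var1 <- bf(Resp1 ~ preds + (1 |p| subject), family = bernoulli())
var2 <- bf(Resp2 ~ preds + (1 |p| subject), family = bernoulli())

fit <- brm(var1 + var2, data = your_data)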

The cons of this approach are that [1] most reviewers in
(psycho)linguistics will not be familiar with it (and there was recently
a Twitter storm on this very problem) and [2] the computational costs
are noticeably higher.

Another alternative is to do something like "linked mixed models" (cf.
Hohenstein, Matuschek and Kliegl, PBR 2016). There are a few variants on
this, but the basic idea is that you use one response to predict the
other. Given the temporal ordering here, this might make sense, e.g.

mod1 = glmer(Resp1 ~ preds + ...., data = your_data, family = binomial)
mod2 = glmer(Resp2 ~ preds + YYY + ...., data = your_data, family = binomial)

where YYY is one of:
[a] Resp1
[b] fitted(mod1)
[c] fitted(mod1) + resid(mod1)
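For instance, variant [b] might look something like this (a sketch,
assuming binary responses and hypothetical predictor/grouping names):

library(lme4)

# Stage 1: model the first response.
mod1 <- glmer(Resp1 ~ preds + (1 | subject),
              data = your_data, family = binomial)

# Stage 2: use the fitted values from mod1 (predicted probabilities,
# since fitted() is on the response scale) as an extra predictor for
# the second response -- variant [b].
your_data$Resp1_hat <- fitted(mod1)
mod2 <- glmer(Resp2 ~ preds + Resp1_hat + (1 | subject),
              data = your_data, family = binomial)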

You can potentially omit mod1, in which case you have something like the
Davidson and Martin (Acta Psychologica, 2016) approach to the joint
analysis of reaction times and response accuracy.

The downside to this approach is that the variability in Resp1 can
create problems in mod2, because standard GLMMs assume that the
predictors are measured without error/variability. Variants [b] and
especially [c] mitigate this a bit, though. (And if you want to get even
more complicated, there are "errors-in-variables" models, which can
handle this and are available in e.g. brms.) I think the advantage of
the linked-model approach relative to the multivariate approach is that
it's somewhat more accessible for a typical (psycho)linguistic reviewer.
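For completeness, an errors-in-variables version in brms might use its
me() measurement-error term (a sketch with hypothetical names;
sd_Resp1_hat would be your estimate of the uncertainty in the stage-1
fitted values):

library(brms)

# Hypothetical sketch: me(x, sdx) tells brms that predictor x is
# measured with known standard deviation sdx, so that uncertainty
# propagates into the estimates for Resp2.
fit <- brm(Resp2 ~ preds + me(Resp1_hat, sd_Resp1_hat) + (1 | subject),
           data = your_data, family = bernoulli())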

Note that I am originally from linguistics and do know a bit about
mixed models, so I'm a usual suspect as a reviewer on these things.

Best,
Phillip

PS: the multinomial models suggested by the others are also pretty good,
but multinomial models usually take some getting used to, and they don't
reflect the potential covariance of Resp1 and Resp2 in an obvious way.
On 14/1/21 5:05 pm, Greg Snow wrote: