Hi Ben,
This earlier thread is relevant here:
https://stat.ethz.ch/pipermail/r-sig-mixed-models/2015q4/024241.html
At least on my machine, I found a substantial difference in the
parameter estimates. The second form seemed more reliable than the
first, as you'll see from the thread.
Do you get the same result?
Best wishes,
Malcolm
Date: Sat, 2 Jul 2016 13:06:30 -0400
From: Ben Bolker <bbolker at gmail.com>
To: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] Specifying outcome variable in binomial glmm:
single responses vs cbind?
On 16-07-01 07:37 PM, a y wrote:
> What is the difference between fitting a binomial glmm (without item
> random effects) in the following two ways?
>
> 1.
> Data formatted in the following way:
>
> (data_long)
> ID correct condition itemID
> 1 1 A i1
> 1 0 A i2
> 1 1 A i3
> 1 1 A i4
> 2 0 B i1
> 2 1 B i2
> 2 1 B i3
> 2 0 B i4
>
> Fitting a model without item random effects:
>
> glmer(correct ~ condition + (1|ID), family = binomial,
>       data = data_long)
>
> 2.
> Data formatted this way (summing over the correct responses):
>
> (data_short)
> ID sum_correct condition itemID
> 1 3 A NA
> 2 2 B NA
>
> Fitting the following model, assuming there were only 4 items (and
> dozens of examples like this):
> glmer(cbind(sum_correct, 4 - sum_correct) ~ condition + (1|ID),
>       family = binomial, data = data_short)
>
> ---
> I figured these models should be identical, but in my experience they are
> very much not. What am I missing? When is the second (more)
>
> Thanks for any help,
> Andrew
>
I believe they should give different likelihoods but identical
parameter estimates, identical *differences* among likelihoods (i.e.
among competing models fitted to the same data), etc. That is, the
aggregated and disaggregated forms differ only by an additive constant
in the log-likelihood (the log binomial coefficients, which do not
depend on the parameters). I would be very interested to see a
counter-example to that statement! In general, the second form should
be quicker to fit, provide residuals that are easier to interpret, etc.
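
For anyone who wants to check the additive-constant claim numerically, here is a minimal sketch in R. It is not taken from either thread: the helper names (marg_loglik, gap_at), the parameter values, and the simulated data are all made up for illustration. The first part integrates out a random intercept for a single subject (the 1, 0, 1, 1 responses from data_long) and compares the two forms at a few arbitrary parameter values. The second part runs only if lme4 is installed and compares actual glmer fits on simulated data in which condition varies between subjects, as in the original question.

```r
# Sketch only: helper names and parameter values are hypothetical.

# Marginal log-likelihood for one subject, integrating out a random
# intercept b ~ N(0, sigma^2); cond_lik(b) is the likelihood given b.
marg_loglik <- function(cond_lik, sigma) {
  integrand <- function(b) cond_lik(b) * dnorm(b, sd = sigma)
  log(integrate(integrand, -Inf, Inf)$value)
}

y <- c(1, 0, 1, 1)  # subject 1 in data_long: 3 of 4 correct

# Log-likelihood gap (short form minus long form) at given parameter values.
gap_at <- function(eta, sigma) {
  # Long form: product of four Bernoulli terms.
  lik_long <- function(b) {
    vapply(b, function(bi) prod(dbinom(y, 1, plogis(eta + bi))), numeric(1))
  }
  # Short form: a single binomial term.
  lik_short <- function(b) {
    dbinom(sum(y), length(y), plogis(eta + b))
  }
  marg_loglik(lik_short, sigma) - marg_loglik(lik_long, sigma)
}

# The gap should not depend on the parameters; compare with log(choose(4, 3)).
gaps <- mapply(gap_at, eta = c(-1, 0.5, 2), sigma = c(0.5, 1, 2))
print(rbind(gap = gaps, constant = lchoose(length(y), sum(y))))

# Optional check with real fits on simulated data (needs lme4).
if (requireNamespace("lme4", quietly = TRUE)) {
  set.seed(1)
  n_id <- 40  # simulated subjects, 4 items each as in the original post
  subj <- data.frame(ID = factor(seq_len(n_id)),
                     condition = factor(rep(c("A", "B"), length.out = n_id)),
                     b = rnorm(n_id))
  data_long <- subj[rep(seq_len(n_id), each = 4), ]
  lin_pred <- 0.5 + 0.8 * (data_long$condition == "B") + data_long$b
  data_long$correct <- rbinom(nrow(data_long), 1, plogis(lin_pred))

  # Sum the correct responses within each subject.
  data_short <- aggregate(correct ~ ID + condition, data = data_long, FUN = sum)
  names(data_short)[names(data_short) == "correct"] <- "sum_correct"

  m_long  <- lme4::glmer(correct ~ condition + (1 | ID),
                         family = binomial, data = data_long)
  m_short <- lme4::glmer(cbind(sum_correct, 4 - sum_correct) ~ condition + (1 | ID),
                         family = binomial, data = data_short)

  # Fixed effects side by side, then the log-likelihood gap next to the constant.
  print(rbind(long = lme4::fixef(m_long), short = lme4::fixef(m_short)))
  print(c(loglik_gap = as.numeric(logLik(m_short)) - as.numeric(logLik(m_long)),
          constant = sum(lchoose(4, data_short$sum_correct))))
}
```

If the claim holds, the fixed effects from the two fits should agree up to optimizer tolerance, and the log-likelihood gap should be close to the sum of the log binomial coefficients. Note that this equivalence relies on every predictor and random effect being constant within each aggregated row; with item-level covariates or item random effects the data cannot be collapsed this way.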