What is the difference between fitting a binomial glmm (without random item effects) in the following two ways?

1. Data formatted in the following way (data_long):

    ID  correct  condition  itemID
    1   1        A          i1
    1   0        A          i2
    1   1        A          i3
    1   1        A          i4
    2   0        B          i1
    2   1        B          i2
    2   1        B          i3
    2   0        B          i4

Fitting a model without item random effects:

    glmer(correct ~ condition + (1|ID), family = binomial, data = data_long)

2. Data formatted this way, summing over the correct responses (data_short):

    ID  sum_correct  condition  itemID
    1   3            A          NA
    2   2            B          NA

Fitting the following model, assuming there were only 4 items (I've seen dozens of examples like this):

    glmer(cbind(sum_correct, 4 - sum_correct) ~ condition + (1|ID), family = binomial, data = data_short)

---

I figured these models should be identical, but in my experience they are very much not. What am I missing? When is the second (more) appropriate?

Thanks for any help,
Andrew
Specifying outcome variable in binomial glmm: single responses vs cbind?
3 messages · Ben Bolker, a y
On 16-07-01 07:37 PM, a y wrote:
> What is the difference between fitting a binomial glmm (without random item effects) in the following two ways?
>
> 1. Data formatted in the following way (data_long):
>
>     ID  correct  condition  itemID
>     1   1        A          i1
>     1   0        A          i2
>     1   1        A          i3
>     1   1        A          i4
>     2   0        B          i1
>     2   1        B          i2
>     2   1        B          i3
>     2   0        B          i4
>
> Fitting a model without item random effects:
>
>     glmer(correct ~ condition + (1|ID), family = binomial, data = data_long)
>
> 2. Data formatted this way, summing over the correct responses (data_short):
>
>     ID  sum_correct  condition  itemID
>     1   3            A          NA
>     2   2            B          NA
>
> Fitting the following model, assuming there were only 4 items (I've seen dozens of examples like this):
>
>     glmer(cbind(sum_correct, 4 - sum_correct) ~ condition + (1|ID), family = binomial, data = data_short)
>
> ---
>
> I figured these models should be identical, but in my experience they are very much not. What am I missing? When is the second (more) appropriate?
>
> Thanks for any help,
> Andrew
I believe they should give different likelihoods but identical parameter estimates, identical *differences* among likelihoods (i.e. among competing models fitted to the same data), etc. That is, the aggregated and disaggregated log-likelihoods differ only by an additive constant (the log binomial coefficients in the aggregated form, which do not depend on the parameters). I would be very interested to see a counter-example to that statement! In general, the second form should be quicker to fit, provide residuals that are easier to interpret, etc.
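One way to check this claim is to simulate data, fit both forms, and compare them. The following is only a rough sketch under stated assumptions: the sample size, effect sizes, random-intercept SD, and the names `n_id`, `n_items`, `subj`, `est_long`, `est_short`, `ll_diff`, and `ll_const` are all made up for illustration and are not from the original post. It assumes lme4 is installed and uses the default (Laplace) fit.

```r
library(lme4)

set.seed(101)
n_id    <- 60   # hypothetical number of subjects
n_items <- 4    # items per subject, as in the question

# Simulate one row per subject-by-item response (the "long" format)
subj <- data.frame(
  ID        = factor(seq_len(n_id)),
  condition = factor(rep(c("A", "B"), length.out = n_id)),
  b         = rnorm(n_id, sd = 1)   # assumed subject-level random intercepts
)
data_long <- subj[rep(seq_len(n_id), each = n_items), ]
eta <- -0.2 + 0.8 * (data_long$condition == "B") + data_long$b
data_long$correct <- rbinom(nrow(data_long), size = 1, prob = plogis(eta))

# Aggregate to one row per subject (the "short" format)
data_short <- aggregate(correct ~ ID + condition, data = data_long, FUN = sum)
names(data_short)[names(data_short) == "correct"] <- "sum_correct"

fit_long  <- glmer(correct ~ condition + (1 | ID),
                   family = binomial, data = data_long)
fit_short <- glmer(cbind(sum_correct, n_items - sum_correct) ~ condition + (1 | ID),
                   family = binomial, data = data_short)

# Fixed effects and the random-intercept SD, side by side;
# these are expected to agree up to optimizer tolerance
est_long  <- c(fixef(fit_long),  sd_ID = sqrt(VarCorr(fit_long)$ID[1, 1]))
est_short <- c(fixef(fit_short), sd_ID = sqrt(VarCorr(fit_short)$ID[1, 1]))
print(rbind(long = est_long, short = est_short))

# The log-likelihood gap is expected to match the sum of the
# log binomial coefficients, which involve no model parameters
ll_diff  <- as.numeric(logLik(fit_short)) - as.numeric(logLik(fit_long))
ll_const <- sum(lchoose(n_items, data_short$sum_correct))
print(c(observed_difference = ll_diff, expected_constant = ll_const))
```

If the two fits disagree by more than optimizer noise, it is worth checking that the aggregation was done within the grouping used in the model (here one row per ID) and that no observation-level predictor was lost when summing.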
I answered my own question, so feel free to disregard this topic.