How to handle tabular form data in lmer/glmer without (or with) expanding the data into binary outcome form?
On Wed, May 20, 2009 at 12:08 PM, Robert ESPESSER
<Robert.Espesser at lpl-aix.fr> wrote:
Dear all, I ran the model on the data "data" expanded in binary format, and I get the same results as in tabular format (with the correct number of observations). Does this mean that the statement "the code is wrong" applies only to the calculation of the number of observations?
Yes, that is what I meant. Sorry for alarming you. (Well, actually there is another sense in which the calculations on the collapsed data give a different result. The calculated values of the deviance, AIC and BIC are different for the extended and collapsed data set and they should be the same.)
Thank you for reassuring me!

# dbin is the data frame for the data in binary format
summary(dbin)
       id          rep
 3      :15   fail: 42
 4      :14   suc :101
 1      :13
 9      :11
 10     :10
 2      :10
 (Other):70
glmer(rep ~ 1 + (1 | id), family = "binomial", data = dbin) -> m1.dbin
summary(m1.dbin)
Generalized linear mixed model fit by the Laplace approximation
Formula: rep ~ 1 + (1 | id)
   Data: dbin
   AIC   BIC logLik deviance
 176.8 182.7 -86.38    172.8
Random effects:
 Groups Name        Variance Std.Dev.
 id     (Intercept) 0.16121  0.40151
Number of obs: 143, groups: id, 15

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   0.9058     0.2139   4.233  2.3e-05 ***
----------------------------------------
From: Douglas Bates <bates at stat.wisc.edu>
Sent: Wed May 20 14:24:16 CEST 2009
To: Dale Steele <dale.w.steele at gmail.com>
Subject: Re: [R-sig-ME] How to handle tabular form data in lmer/glmer without (or with) expanding the data into binary outcome form?

On Tue, May 19, 2009 at 6:10 PM, Dale Steele <dale.w.steele at gmail.com> wrote:
A recent thread <https://stat.ethz.ch/pipermail/r-help/2009-April/194750.html> suggested that glmer can handle data in tabular form. I'm working through (not homework) the simple random effects example from Agresti, An Introduction to Categorical Data Analysis, 2nd Ed. (pp. 303-304) and am having problems...
# Data from Table 10.2 n_i: number of free throws, p_i: observed proportion of successes
id <- seq(1:15)
n <- c(13, 10, 15, 14, 6, 10, 10, 4, 11, 10, 8, 9, 9, 8, 6)
p <- c(.769, .9, .667, .643, .667, .9, .6, 1, .545, .9, .5, .889, .778, .625, .167)
(success <- round(n * p))
fail <- n - success
data <- cbind(success, fail)
# Model to fit: logit(pi_i) = mu_i + alpha
library(lme4)
# Model
m1 <- glmer(data ~ 1 + (1 | id), family = "binomial", nAGQ = 50)
summary(m1)
nAGQ = 50 is overkill in this case but the model is sufficiently simple that it doesn't do much harm. Gauss-Hermite quadrature without centering and scaling (which is the "adaptive" part) could require a huge number of quadrature points because most of the evaluations of the unscaled density are wasted. In adaptive Gauss-Hermite quadrature, however, you get very close approximations for a small number of evaluations. I would rarely use more than nAGQ = 9. (Also, the number of quadrature points is always chosen to be odd so your 50 will be increased to 51. With an odd number of points you get one evaluation free - the one at a displacement of zero.)
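The effect of the centering and scaling can be seen outside of glmer entirely. A sketch in base R (the function name and the Golub-Welsch eigenvalue construction of the nodes are mine, not anything from lme4): integrating a Gaussian bump centered at 5, plain 9-point Gauss-Hermite quadrature at the origin misses almost all of the mass, while the same nine nodes, re-centered and re-scaled to the bump, recover the integral essentially exactly.

```r
# Sketch: why adaptive (re-centered, re-scaled) Gauss-Hermite quadrature
# needs so few nodes.  Nodes/weights for the weight function exp(-x^2) are
# computed via the Golub-Welsch eigenvalue method (base R only).
gauss_hermite <- function(k) {
  b <- sqrt(seq_len(k - 1) / 2)          # off-diagonal of the Jacobi matrix
  J <- diag(0, k)
  J[cbind(1:(k - 1), 2:k)] <- b
  J[cbind(2:k, 1:(k - 1))] <- b
  e <- eigen(J, symmetric = TRUE)
  list(nodes = e$values, weights = sqrt(pi) * e$vectors[1, ]^2)
}

f <- function(x) exp(-(x - 5)^2 / 2)     # unnormalized density centered at 5
truth <- sqrt(2 * pi)                    # its integral over the real line

gh <- gauss_hermite(9)

# Plain GH at the origin: almost every node lands where f is negligible.
plain <- sum(gh$weights * exp(gh$nodes^2) * f(gh$nodes))

# Adaptive GH: substitute x = mu + s*t, centering at the mode (mu = 5) and
# scaling by s = sqrt(2)*sigma; here the nine nodes become exact.
mu <- 5; s <- sqrt(2)
adaptive <- s * sum(gh$weights * exp(gh$nodes^2) * f(mu + s * gh$nodes))

c(plain = plain, adaptive = adaptive, truth = truth)
```

The plain sum is off by an order of magnitude; the adaptive one matches sqrt(2*pi) to machine precision, which is why a handful of adaptive points (and nAGQ = 9 rather than 50) suffices.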
The code runs, and produces results similar, but not exactly what Agresti gets (alpha_hat = 0.908 and sigma_hat = 0.422). Shouldn't the 'number of obs' be 143 rather than 15? Am I doing something wrong?
No, it's the code that is wrong. At last summer's UseR Conference Andrew Gelman said that he was looking at the code for the glm function and he was frustrated that the code was so long and involved to be able to handle all the special cases. This is a prime example of why the code gets complicated.

I know that it is convenient and natural to want to list the number of successes and failures instead of one row for each observation, but doing that requires a massive amount of really ugly code to get it right. (Look at the definitions of the various glm families some day. There is this very peculiar "n" object squirreled away in a common environment of the family functions specifically for this one case - it is never used otherwise and, in fact, wasn't even defined in other cases until recently.)

All the structures in lmer/glmer/nlmer are based on the not unreasonable assumption that the model frame is a model frame, meaning that there is one observation per row. Except that isn't the case here. Getting the count right would mean writing a large amount of special-case code to recognize this case and regenerate the original number of observations. It can be done, but it is messy and low on the priority list right now.
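The equivalence of the two response forms (apart from such bookkeeping quantities) can be checked without lme4 at all, using base R's glm and an intercept-only model on the same free-throw data. A sketch, with variable names of my choosing:

```r
# Sketch: the tabular (two-column matrix) and expanded (0/1) response forms
# of a binomial GLM give the same fixed-effect estimate in base R's glm().
n <- c(13, 10, 15, 14, 6, 10, 10, 4, 11, 10, 8, 9, 9, 8, 6)
p <- c(.769, .9, .667, .643, .667, .9, .6, 1, .545, .9, .5, .889, .778, .625, .167)
success <- round(n * p)
fail <- n - success

# Tabular form: the response is a two-column matrix of (successes, failures)
fit_tab <- glm(cbind(success, fail) ~ 1, family = binomial)

# Expanded form: one 0/1 outcome per free throw (143 rows in all)
y <- rep(rep(c(1, 0), length(n)), as.vector(rbind(success, fail)))
fit_bin <- glm(y ~ 1, family = binomial)

all.equal(coef(fit_tab), coef(fit_bin))  # TRUE: both equal log(101/42)
```

With no random effect the intercept has the closed form log(101/42), about 0.878, which is also why the glmer intercept of about 0.906 above sits only slightly higher.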
m1 <- glmer(data ~ 1 + (1 | id), family = "binomial", nAGQ = 50)
summary(m1)
Generalized linear mixed model fit by the adaptive Gaussian Hermite approximation
Formula: data ~ 1 + (1 | id)
  AIC   BIC logLik deviance
 32.8 34.21  -14.4     28.8
Random effects:
 Groups Name        Variance Std.Dev.
 id     (Intercept) 0.16258  0.40321
Number of obs: 15, groups: id, 15

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   0.9059     0.2142    4.23 2.34e-05 ***

I tried to run the same model by expanding the tabular data. Easy enough to expand the id variable:

# (subj <- rep(id, n))

However, I'm stuck on how to expand the outcome variable for each player and concatenate as a single vector...

# first player
oc1 <- rep(c(1, 0), sf[1,])

Appreciate any insight anyone may provide on how to use tabular data directly and how to expand tabular data. Thanks! --Dale
I enclose a modified version of your script to do that. ?The general idea is to produce all the 1 responses, then produce all the zero responses then rbind them.
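Something along those lines can be sketched as follows (this is a reconstruction of the idea, not the attached script; dbin and the "suc"/"fail" labels match the summary shown earlier in the thread):

```r
# Sketch of the expansion: build all the "suc" rows first, then all the
# "fail" rows, then rbind them into one row-per-observation data frame.
n <- c(13, 10, 15, 14, 6, 10, 10, 4, 11, 10, 8, 9, 9, 8, 6)
p <- c(.769, .9, .667, .643, .667, .9, .6, 1, .545, .9, .5, .889, .778, .625, .167)
success <- round(n * p)
fail <- n - success
id <- 1:15

dbin <- rbind(
  data.frame(id = rep(id, success), rep = "suc"),   # one row per success
  data.frame(id = rep(id, fail),    rep = "fail")   # one row per failure
)
dbin$id  <- factor(dbin$id)
dbin$rep <- factor(dbin$rep, levels = c("fail", "suc"))

nrow(dbin)        # 143 rows, matching 'Number of obs' above
table(dbin$rep)   # fail: 42, suc: 101
```

The resulting dbin can then be passed directly to glmer(rep ~ 1 + (1 | id), family = "binomial", data = dbin) as in Robert's message above.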
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models