How to handle tabular form data in lmer/glmer without (or with) expanding the data into binary outcome form?
On Wed, May 20, 2009 at 12:08 PM, Robert ESPESSER
<Robert.Espesser at lpl-aix.fr> wrote:
Dear all, I ran the model on the data "data" expanded in binary format, and I get the same results as in tabular format (with the correct number of observations). Does this mean that the statement "the code is wrong" applies only to the calculation of the number of observations?
Yes, that is what I meant. Sorry for alarming you. (Well, actually there is another sense in which the calculations on the collapsed data give a different result. The calculated values of the deviance, AIC and BIC are different for the extended and collapsed data set and they should be the same.)
Thank you for reassuring me!

# dbin is the data frame for the data in binary format
summary(dbin)
       id          rep
 3      :15   fail: 42
 4      :14   suc :101
 1      :13
 9      :11
 10     :10
 2      :10
 (Other):70
glmer(rep ~ 1 + (1 | id), family = "binomial", data = dbin) -> m1.dbin
summary(m1.dbin)
Generalized linear mixed model fit by the Laplace approximation
Formula: rep ~ 1 + (1 | id)
   Data: dbin
   AIC   BIC logLik deviance
 176.8 182.7 -86.38    172.8
Random effects:
 Groups Name        Variance Std.Dev.
 id     (Intercept) 0.16121  0.40151
Number of obs: 143, groups: id, 15

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   0.9058     0.2139   4.233  2.3e-05 ***
----------------------------------------
From: Douglas Bates <bates at stat.wisc.edu>
Sent: Wed May 20 14:24:16 CEST 2009
To: Dale Steele <dale.w.steele at gmail.com>
Subject: Re: [R-sig-ME] How to handle tabular form data in lmer/glmer without (or with) expanding the data into binary outcome form?

On Tue, May 19, 2009 at 6:10 PM, Dale Steele <dale.w.steele at gmail.com> wrote:
A recent thread <https://stat.ethz.ch/pipermail/r-help/2009-April/194750.html> suggested that glmer can handle data in tabular form. I'm working through (not homework) the simple random effects example from Agresti, An Introduction to Categorical Data Analysis, 2nd Ed. (pp. 303-304) and am having problems...
# Data from Table 10.2 n_i: number of free throws, p_i: observed proportion of successes
id <- seq(1:15)
n <- c(13, 10, 15, 14, 6, 10, 10, 4, 11, 10, 8, 9, 9, 8, 6)
p <- c(.769, .9, .667, .643, .667, .9, .6, 1, .545, .9, .5, .889, .778, .625, .167)
(success <- round(n * p))
fail <- n - success
data <- cbind(success, fail)
# Model to fit: logit(pi_i) = mu_i + alpha
library(lme4)
# Model
m1 <- glmer(data ~ 1 + (1 | id), family = "binomial", nAGQ = 50)
summary(m1)
nAGQ = 50 is overkill in this case but the model is sufficiently simple that it doesn't do much harm. Gauss-Hermite quadrature without centering and scaling (which is the "adaptive" part) could require a huge number of quadrature points because most of the evaluations of the unscaled density are wasted. In adaptive Gauss-Hermite quadrature, however, you get very close approximations for a small number of evaluations. I would rarely use more than nAGQ = 9. (Also, the number of quadrature points is always chosen to be odd so your 50 will be increased to 51. With an odd number of points you get one evaluation free - the one at a displacement of zero.)
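The effect of the centering and scaling can be seen outside of glmer entirely. A sketch in base R (the function name and the Golub-Welsch eigenvalue construction of the nodes are mine, not anything from lme4): integrating a Gaussian bump centered at 5, plain 9-point Gauss-Hermite quadrature at the origin misses almost all of the mass, while the same nine nodes, re-centered and re-scaled to the bump, recover the integral essentially exactly.

```r
# Sketch: why adaptive (re-centered, re-scaled) Gauss-Hermite quadrature
# needs so few nodes.  Nodes/weights for the weight function exp(-x^2) are
# computed via the Golub-Welsch eigenvalue method (base R only).
gauss_hermite <- function(k) {
  b <- sqrt(seq_len(k - 1) / 2)          # off-diagonal of the Jacobi matrix
  J <- diag(0, k)
  J[cbind(1:(k - 1), 2:k)] <- b
  J[cbind(2:k, 1:(k - 1))] <- b
  e <- eigen(J, symmetric = TRUE)
  list(nodes = e$values, weights = sqrt(pi) * e$vectors[1, ]^2)
}

f <- function(x) exp(-(x - 5)^2 / 2)     # unnormalized density centered at 5
truth <- sqrt(2 * pi)                    # its integral over the real line

gh <- gauss_hermite(9)

# Plain GH at the origin: almost every node lands where f is negligible.
plain <- sum(gh$weights * exp(gh$nodes^2) * f(gh$nodes))

# Adaptive GH: substitute x = mu + s*t, centering at the mode (mu = 5) and
# scaling by s = sqrt(2)*sigma; here the nine nodes become exact.
mu <- 5; s <- sqrt(2)
adaptive <- s * sum(gh$weights * exp(gh$nodes^2) * f(mu + s * gh$nodes))

c(plain = plain, adaptive = adaptive, truth = truth)
```

The plain sum is off by an order of magnitude; the adaptive one matches sqrt(2*pi) to machine precision, which is why a handful of adaptive points (and nAGQ = 9 rather than 50) suffices.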
The code runs, and produces results similar, but not exactly what Agresti gets (alpha_hat = 0.908 and sigma_hat = 0.422). Shouldn't the 'number of obs' be 143 rather than 15? Am I doing something wrong?
No, it's the code that is wrong. At last summer's UseR Conference Andrew Gelman said that he was looking at the code for the glm function and he was frustrated that the code was so long and involved to be able to handle all the special cases. This is a prime example of why the code gets complicated.

I know that it is convenient and natural to want to list the number of successes and failures instead of one row for each observation, but doing that requires a massive amount of really ugly code to get it right. (Look at the definitions of the various glm families some day. There is this very peculiar "n" object squirreled away in a common environment of the family functions specifically for this one case - it is never used otherwise and, in fact, wasn't even defined in other cases until recently.)

All the structures in lmer/glmer/nlmer are based on the not unreasonable assumption that the model frame is a model frame, meaning that there is one observation per row. Except that isn't the case here. Getting the count right would mean writing a large amount of special-case code to recognize this case and regenerate the original number of observations. It can be done, but it is messy and low on the priority list right now.
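The equivalence of the two response forms (apart from such bookkeeping quantities) can be checked without lme4 at all, using base R's glm and an intercept-only model on the same free-throw data. A sketch, with variable names of my choosing:

```r
# Sketch: the tabular (two-column matrix) and expanded (0/1) response forms
# of a binomial GLM give the same fixed-effect estimate in base R's glm().
n <- c(13, 10, 15, 14, 6, 10, 10, 4, 11, 10, 8, 9, 9, 8, 6)
p <- c(.769, .9, .667, .643, .667, .9, .6, 1, .545, .9, .5, .889, .778, .625, .167)
success <- round(n * p)
fail <- n - success

# Tabular form: the response is a two-column matrix of (successes, failures)
fit_tab <- glm(cbind(success, fail) ~ 1, family = binomial)

# Expanded form: one 0/1 outcome per free throw (143 rows in all)
y <- rep(rep(c(1, 0), length(n)), as.vector(rbind(success, fail)))
fit_bin <- glm(y ~ 1, family = binomial)

all.equal(coef(fit_tab), coef(fit_bin))  # TRUE: both equal log(101/42)
```

With no random effect the intercept has the closed form log(101/42), about 0.878, which is also why the glmer intercept of about 0.906 above sits only slightly higher.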
m1 <- glmer(data ~ 1 + (1 | id), family = "binomial", nAGQ = 50)
summary(m1)
Generalized linear mixed model fit by the adaptive Gaussian Hermite approximation
Formula: data ~ 1 + (1 | id)
  AIC   BIC logLik deviance
 32.8 34.21  -14.4     28.8
Random effects:
 Groups Name        Variance Std.Dev.
 id     (Intercept) 0.16258  0.40321
Number of obs: 15, groups: id, 15

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   0.9059     0.2142    4.23 2.34e-05 ***

I tried to run the same model by expanding the tabular data. Easy enough to expand the id variable:

# (subj <- rep(id, n))

However, I'm stuck on how to expand the outcome variable for each player and concatenate as a single vector...

# first player
oc1 <- rep(c(1, 0), sf[1,])

Appreciate any insight anyone may provide on how to use tabular data directly and how to expand tabular data. Thanks! --Dale
I enclose a modified version of your script to do that. ?The general idea is to produce all the 1 responses, then produce all the zero responses then rbind them.
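Something along those lines can be sketched as follows (this is a reconstruction of the idea, not the attached script; dbin and the "suc"/"fail" labels match the summary shown earlier in the thread):

```r
# Sketch of the expansion: build all the "suc" rows first, then all the
# "fail" rows, then rbind them into one row-per-observation data frame.
n <- c(13, 10, 15, 14, 6, 10, 10, 4, 11, 10, 8, 9, 9, 8, 6)
p <- c(.769, .9, .667, .643, .667, .9, .6, 1, .545, .9, .5, .889, .778, .625, .167)
success <- round(n * p)
fail <- n - success
id <- 1:15

dbin <- rbind(
  data.frame(id = rep(id, success), rep = "suc"),   # one row per success
  data.frame(id = rep(id, fail),    rep = "fail")   # one row per failure
)
dbin$id  <- factor(dbin$id)
dbin$rep <- factor(dbin$rep, levels = c("fail", "suc"))

nrow(dbin)        # 143 rows, matching 'Number of obs' above
table(dbin$rep)   # fail: 42, suc: 101
```

The resulting dbin can then be passed directly to glmer(rep ~ 1 + (1 | id), family = "binomial", data = dbin) as in Robert's message above.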
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models