
Checking modeling assumptions in a binomial GLMM

3 messages · Ravi Varadhan, Ben Bolker

#
On 14-07-17 10:05 AM, Ravi Varadhan wrote:

> [...] r-sig-mixed last week, but I did not hear from anyone.  Perhaps, the
> moderator never approved my post.  Hence, the post to r-help.

[cc'ing to r-sig-mixed-models now]
> [...] clinical trial (it is actually the schizophrenia trial discussed in
> Hedeker and Gibbons' book on longitudinal analysis).  My impression
> was that diagnostics are quite difficult to do, but I was interested
> in seeing if someone had demonstrated this.

> [...] handle nAGQ > 1 when there is more than one random effect.  I know
> this is a curse-of-dimensionality problem, but I do not see why it
> cannot handle nAGQ up to 9 for 2-3 dimensions.  Is the Laplace
> approximation sufficiently accurate for multiple random effects?  Is
> MCMCglmm the way to go for a binary GLMM with multiple random effects?


To a large extent AGQ is not implemented for multiple random effects
(or, in lme4 >= 1.0.0, for vector-valued random effects) because we
simply haven't had the time and energy to implement it.  Doug Bates has
long felt/stated that AGQ would be infeasibly slow for multiple random
effects.  To be honest, I don't know if he's basing that on better knowledge
than I (or anyone!) have about the internals of lme4 (e.g. trying to
construct the data structures necessary to do AGQ would lead to a
catastrophic loss of sparsity) or whether it's just that his focus
is usually on gigantic data sets where multi-dimensional AGQ truly
would be infeasible.

  Certainly MCMCglmm, or going outside the R framework (to SAS
PROC GLIMMIX, or Stata's GLLAMM
<http://www.stata-press.com/books/mlmus3_ch10.pdf>), would be my first
resort when worrying about whether AGQ is necessary.
Unfortunately, I know of very little discussion about how to determine
in general whether AGQ is necessary (or what number of quadrature
points is sufficient), without actually doing it -- most of the examples
I've seen (e.g. <http://www.stata-press.com/books/mlmus3_ch10.pdf>
or Breslow 2003) just check by brute force (see
http://rpubs.com/bbolker/glmmchapter for another example).  It would
be nice to figure out a score test, or at least graphical diagnostics,
that could suggest (without actually doing the entire integral) how
much the underlying densities departed from those assumed by the
Laplace approximation.  (The zeta() function in
http://lme4.r-forge.r-project.org/JSS/glmer.Rnw might be a good
starting point ...)
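
A concrete brute-force check along these lines might look like the
following (a minimal sketch using the cbpp example that ships with
lme4; note that nAGQ > 1 currently only works with a single scalar
random effect):

library(lme4)

## fit the same binomial GLMM with increasing numbers of quadrature
## points; nAGQ = 1 corresponds to the Laplace approximation
form <- cbind(incidence, size - incidence) ~ period + (1 | herd)
fits <- lapply(c(1, 5, 9, 25),
               function(q) glmer(form, data = cbpp, family = binomial,
                                 nAGQ = q))

## if the fixed-effect estimates and log-likelihoods have stabilized
## by nAGQ = 9 or so, the Laplace fit is probably adequate
sapply(fits, fixef)
sapply(fits, logLik)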

  cheers
    Ben Bolker
#
Thank you very much, Ben.  

I have one more question: you have a function for computing overdispersion, overdisp.glmer(), in the "RVAideMemoire" package.  This is useful, I suppose.  Why is it not part of lme4 -- or, equivalently, why doesn't glmer() provide this information?

Thanks,
Ravi

#
On 14-07-17 05:19 PM, Ravi Varadhan wrote:
RVAideMemoire is not our package: it's by Maxime Hervé.

We probably didn't add the overdispersion calculation to lme4
because (1) we didn't get around to it; (2) for GLMMs it's an
even-more-approximate estimate of overdispersion than it is
for GLMs; and (3) it's easy enough for users to implement
themselves.  (Another version is listed at
http://glmm.wikidot.com/faq#overdispersion_est,
and the aods3::gof() function also does these calculations,
although looking at it, there may be some issues with
using the results of lme4's deviance() method for these
purposes -- it returns something different from the sum of
squares of the deviance residuals ...)
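
For reference, a minimal sketch in the spirit of the Pearson-based
version at the FAQ link above (the function posted there may differ
in detail):

## approximate overdispersion check for a fitted merMod object:
## Pearson chi-square statistic, its ratio to the residual df,
## and a (rough) p-value
overdisp_fun <- function(model) {
  rdf <- df.residual(model)
  rp <- residuals(model, type = "pearson")
  Pearson.chisq <- sum(rp^2)
  c(chisq = Pearson.chisq,
    ratio = Pearson.chisq/rdf,
    rdf = rdf,
    p = pchisq(Pearson.chisq, df = rdf, lower.tail = FALSE))
}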

  The summary statement of glmer models probably *should* include this
information.  Feel free to post an issue at
https://github.com/lme4/lme4/issues ...

This somewhat simpler expression replicates the results of
RVAideMemoire's function, although not quite as prettily:

library(lme4)
example(glmer)  ## creates the fitted binomial model 'gm1' (cbpp data)

## sum of squared deviance residuals, residual df, and their ratio;
## a ratio substantially greater than 1 suggests overdispersion
c(dev <- sum(residuals(gm1)^2),
  dfr <- df.residual(gm1),
  ratio <- dev/dfr)

RVAideMemoire::overdisp.glmer(gm1)  ## should give matching results