
Checking modeling assumptions in a binomial GLMM

3 messages · Ravi Varadhan, Ben Bolker

#
On 14-07-17 10:05 AM, Ravi Varadhan wrote:

> [...] r-sig-mixed last week, but I did not hear from anyone.  Perhaps, the
> moderator never approved my post.  Hence, the post to r-help.

[cc'ing to r-sig-mixed-models now]
> [...] clinical trial (it is actually the schizophrenia trial discussed in
> Hedeker and Gibbons' book on longitudinal analysis).  My impression
> was that diagnostics are quite difficult to do, but I was interested
> in seeing if someone had demonstrated this.

> [...] handle nAGQ > 1 when there is more than one random effect.  I know
> this is a curse-of-dimensionality problem, but I do not see why it
> cannot handle nAGQ up to 9 for 2-3 dimensions.  Is the Laplace
> approximation sufficiently accurate for multiple random effects?  Is
> MCMCglmm the way to go for a binary GLMM with multiple random effects?


To a large extent AGQ is not implemented for multiple random effects
(or, in lme4 >= 1.0.0, for vector-valued random effects) because we
simply haven't had the time and energy to implement it.  Doug Bates has
long felt/stated that AGQ would be infeasibly slow for multiple random
effects.  To be honest, I don't know if he's basing that on better knowledge
than I (or anyone!) have about the internals of lme4 (e.g. trying to
construct the data structures necessary to do AGQ would lead to a
catastrophic loss of sparsity) or whether it's just that his focus
is usually on gigantic data sets where multi-dimensional AGQ truly
would be infeasible.

  Certainly MCMCglmm, or going outside the R framework (to SAS
PROC GLIMMIX, or Stata's GLLAMM
<http://www.stata-press.com/books/mlmus3_ch10.pdf>), would be my first
resort when worrying about whether AGQ is necessary.
Unfortunately, I know of very little discussion about how to determine
in general whether AGQ is necessary (or what number of quadrature
points is sufficient), without actually doing it -- most of the examples
I've seen (e.g. <http://www.stata-press.com/books/mlmus3_ch10.pdf>
or Breslow 2003) just check by brute force (see
http://rpubs.com/bbolker/glmmchapter for another example).  It would
be nice to figure out a score test, or at least graphical diagnostics,
that could suggest (without actually doing the entire integral) how
much the underlying densities departed from those assumed by the
Laplace approximation.  (The zeta() function in
http://lme4.r-forge.r-project.org/JSS/glmer.Rnw might be a good
starting point ...)
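
A concrete brute-force check along these lines might look like the
following (a minimal sketch using the cbpp example that ships with
lme4; note that nAGQ > 1 currently only works with a single scalar
random effect):

library(lme4)

## fit the same binomial GLMM with increasing numbers of quadrature
## points; nAGQ = 1 corresponds to the Laplace approximation
form <- cbind(incidence, size - incidence) ~ period + (1 | herd)
fits <- lapply(c(1, 5, 9, 25),
               function(q) glmer(form, data = cbpp, family = binomial,
                                 nAGQ = q))

## if the fixed-effect estimates and log-likelihoods have stabilized
## by nAGQ = 9 or so, the Laplace fit is probably adequate
sapply(fits, fixef)
sapply(fits, logLik)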

  cheers
    Ben Bolker
#
Thank you very much, Ben.  

I have one more question: you have a function for computing overdispersion, overdisp.glmer(), in the "RVAideMemoire" package.  This is useful, I suppose.  Why is it not part of lme4 -- or, equivalently, why doesn't glmer() provide this information?

Thanks,
Ravi

#
On 14-07-17 05:19 PM, Ravi Varadhan wrote:
RVAideMemoire is not our package: it's by Maxime Hervé.

We probably didn't add the overdispersion calculation to lme4
because (1) we didn't get around to it; (2) for GLMMs it's an
even-more-approximate estimate of overdispersion than it is
for GLMs; and (3) it's easy enough for users to implement
themselves.  (Another version is listed at
http://glmm.wikidot.com/faq#overdispersion_est,
and the aods3::gof() function also does these calculations,
although looking at it, there may be some issues with
using the results of lme4's deviance() method for these
purposes -- it returns something different from the sum of
squares of the deviance residuals ...)
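
For reference, a minimal sketch in the spirit of the Pearson-based
version at the FAQ link above (the function posted there may differ
in detail):

## approximate overdispersion check for a fitted merMod object:
## Pearson chi-square statistic, its ratio to the residual df,
## and a (rough) p-value
overdisp_fun <- function(model) {
  rdf <- df.residual(model)
  rp <- residuals(model, type = "pearson")
  Pearson.chisq <- sum(rp^2)
  c(chisq = Pearson.chisq,
    ratio = Pearson.chisq/rdf,
    rdf = rdf,
    p = pchisq(Pearson.chisq, df = rdf, lower.tail = FALSE))
}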

  The summary statement of glmer models probably *should* include this
information.  Feel free to post an issue at
https://github.com/lme4/lme4/issues ...

This somewhat simpler expression replicates the results of
RVAideMemoire's function, although not quite as prettily:

library(lme4)
example(glmer)  ## creates the fitted binomial model 'gm1' (cbpp data)

## sum of squared deviance residuals, residual df, and their ratio;
## a ratio substantially greater than 1 suggests overdispersion
c(dev <- sum(residuals(gm1)^2),
  dfr <- df.residual(gm1),
  ratio <- dev/dfr)

RVAideMemoire::overdisp.glmer(gm1)  ## should give matching results