
Selecting random effects in lme4: ML vs. QAICc

4 messages · Richard Feldman, Ben Bolker

#
Hello all,

I am trying to select between two models that differ in their random 
effects. Running a likelihood ratio test gives me a different result 
than using information criteria.

My models:

model.1 <- glmer(Y ~ V1 + V2 + V1:V2 + (V2|SITE), data = Data,
                 family = quasipoisson, REML = TRUE)

model.2 <- glmer(Y ~ V1 + V2 + V1:V2 + (1|SITE), data = Data,
                 family = quasipoisson, REML = TRUE)

I use quasipoisson because the data are highly overdispersed:

lme4:::sigma(model.1)
#5.886659
lme4:::sigma(model.2)
#101.6434

The results of the likelihood ratio test:
(I know that technically one should not use (RE)ML for 
quasi-distributions. However, the results are nearly identical whether I 
use quasipoisson or poisson as the family.)

anova(model.2, model.1)

#         Df    AIC    BIC   logLik  Chisq Chi Df Pr(>Chisq)
#model.2  9 4648.3 4665.1 -2315.13
#model.1 14  346.8  373.0  -159.39 4311.5      5  < 2.2e-16 ***

Now, I run the same models with a poisson distribution and then adjust 
the AIC by the overdispersion and the number of parameters to obtain 
QAICc. With all these penalties taken into account, model.2 has the 
lowest QAICc. My gut instinct is to go with model.1, however, because 
the overdispersion of model.2 is so high that perhaps it shouldn't even 
be a candidate model. On the other hand, perhaps adjusting the AIC 
really does put the two models on a more level playing field.
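In case it helps, this is the adjustment I'm computing -- a minimal, self-contained sketch of the QAICc formula (per Burnham & Anderson); the inputs below are placeholders, not values from my actual fits:

```r
# Sketch of QAICc: scale -2*logLik by the overdispersion estimate c_hat,
# count c_hat itself as one extra parameter in K, then apply the usual
# AICc small-sample correction.
qaicc <- function(loglik, K, n, c_hat) {
  qaic <- -2 * loglik / c_hat + 2 * K
  qaic + 2 * K * (K + 1) / (n - K - 1)
}

# Illustrative call (n = 200 is made up, not my sample size):
qaicc(loglik = -159.39, K = 14 + 1, n = 200, c_hat = 5.9)
```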

Does anyone in the community have any guidance on this matter?

Much appreciated and thanks in advance!

Richard
#
There are a number of issues here; I'll comment in-line ...
Richard Feldman wrote:
In general you shouldn't be applying both methods in the same
analysis, except for pedagogical/self-teaching purposes. The two tests
answer different questions, and trying both leads to the temptation to
cherry-pick.
Specifying REML=TRUE has no effect in glmer models.  (I don't know
offhand whether you got a warning; arguably you should have.)  Note that
?glmer lists no REML argument for glmer -- there is no warning because
REML gets swallowed by the optional "..." argument.
If you are going to fit a quasi-likelihood model then generally you
should find sigma for the most complex model (model 1 in your case) and
then apply the same estimated value of sigma for all models.
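As a sketch of what I mean (using the logLik values from your anova() table, and sigma^2 from model.1 as the shared c-hat -- treat the numbers as illustrative):

```r
# QAIC with a single c-hat estimated once, from the most complex model,
# and reused for every candidate model.
qaic <- function(loglik, K, c_hat) -2 * loglik / c_hat + 2 * K

c_hat <- 5.886659^2   # sigma^2 from model.1

qaic(loglik = -159.39,  K = 14 + 1, c_hat = c_hat)  # model.1 (+1 df for c-hat)
qaic(loglik = -2315.13, K = 9 + 1,  c_hat = c_hat)  # model.2
```

With a single shared c-hat the comparison can come out differently than when each model supplies its own dispersion estimate, which is part of the point.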
Clarification: generally one should not use likelihood ratio tests (ML
vs REML is a separate issue) at all on quasi-likelihood fits, although
Venables and Ripley (MASS, p. 210) do suggest using an F test on the
scaled difference in deviance (this is implemented for GLMs with the
'test="F"' option to anova.glm()).  (In contrast, they say in the same
section that one *cannot* use AIC in this case, because one doesn't have
a real likelihood -- presumably they don't agree with the line of
reasoning leading to QAIC.)

I'm also assuming that your sample size is large enough that the LRT
applies (which is often a problem with GLMMs if the number of
random-effects levels is small).
Besides the fact that you shouldn't do this, it's unlikely that
anova() is correcting for overdispersion.
How did you calculate overdispersion?  If you're going to do this (and
there are still some questions about whether this works) you should use
sigma^2, not sigma, as your estimate of overdispersion.  If you are
(heaven forbid) following the supplementary material of the Bolker et al
2009 TREE article, please look in the worked examples section of
glmm.wikidot.com for an updated and corrected version.
#
Thank you very much for your informative response.

I guess I'm confused about how to interpret overdispersion. Using the 
Zuur et al. Owls data that you also present in your worked example, I 
ran the following models, keeping the fixed and random effects just as 
you present it:

glmer, family = gaussian
glmer, family = poisson
glmer, family = quasipoisson
glmmPQL, family = quasipoisson
glmmADMB, family = poisson ##zeroInflation=FALSE
glmmADMB, family = negative binomial ##zeroInflation=FALSE
glmmADMB, family = poisson ##zeroInflation=TRUE
glmmADMB, family = negative binomial ##zeroInflation=TRUE


For all but the quasipoisson glmer, I calculated dispersion following 
your example:

rdev <- sum(residuals(model)^2)
mdf  <- length(fixef(model))
rdf  <- nrow(Data) - mdf
rdev / rdf

For the quasipoisson glmer, I extracted dispersion as:

lme4:::sigma(model)^2
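As a sanity check on the formula itself (rather than on any particular fit), here is a self-contained version of that dispersion calculation -- my understanding is that the corrected worked example uses Pearson residuals; the residual values below are made up:

```r
# Dispersion estimate: sum of squared (Pearson) residuals divided by
# the residual degrees of freedom.
overdisp <- function(resid_vec, n_obs, n_param) {
  rdf <- n_obs - n_param
  sum(resid_vec^2) / rdf
}

# Illustrative inputs, not from my data:
overdisp(resid_vec = c(1.2, -0.8, 2.1, -1.9, 0.4), n_obs = 5, n_param = 2)
```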

The results are as follows:

#                          model dispersion
#1                glmer.gaussian  35.534989
#2                    glmer.pois   5.630751
#3 glmer.quasi -- sigma(model)^2  29.830221
#4                 glmmPQL.quasi   1.076906
#5                 glmmADMB.pois   7.585654
#6               glmmADMB.nbinom   1.085072
#7            glmmADMB.pois.zero   8.255389
#8          glmmADMB.nbinom.zero   1.516587

Am I right to interpret this as saying 1) the sigma(model)^2 method is 
inaccurate and 2) the glmmPQL.quasi and glmmADMB.nbinom models are 
sufficiently correcting for the overdispersion? I thought I had read you 
advising using the quasipoisson model to calculate the overdispersion 
parameter (assuming it did so correctly) needed to adjust AIC for QAIC, 
but it would seem the dispersion parameter should come from the Poisson 
regression. More likely I just misread you. Also, is it a concern that 
the dispersion is higher in the zero-inflated models? Does this mean 
zero-inflation is not an issue, at least when wanting to calculate AIC 
values?

Again, thank you for all the help!

Richard
1 day later
#
Richard Feldman wrote:
(1) Yes. I would agree that we don't quite know what's going on with
quasi- in glmer, and that using other methods is better if possible:
various people have reported odd results, Doug Bates has gone on record
as saying he wouldn't really know how to interpret a quasi-likelihood
GLMM anyway (I think that's a fair summary of his position), and it's
not clear whether the problem is with bugs in a little-tested corner of
the software or fundamental problems with the definition of the model.
That said, quasi- is also the easiest way forward ...

(2) glmmPQL.quasi uses penalized quasi-likelihood, so at least it's
consistent in the way it handles the random effects and the individual
variance structures.  PQL is known to be a bit dicey for data with small
numbers (e.g. means < 5) [Breslow 2003], not that that has stopped lots
of people from using it because for a long time it was the only game in
town.

(3) The dispersion of approximately 1 in the negative binomial models
does look reasonable. See Venables and Ripley's section on
overdispersion for some cautions on this approach ...

(4) I don't know why the dispersion is coming out slightly higher for
the zero-inflated negative binomial -- seems odd.

(5) Since you've gone to all this trouble to fit the overdispersed and
zero-inflated likelihood models, your best bet is to try likelihood
ratio tests between nested models (e.g. #8 vs #6 to test for
zero-inflation, #8 vs #7 or #6 vs #5 to test for overdispersion).  The
quasi- and overdispersion calculations are usually done as a way of
avoiding having to fit the more complex models at all.
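A hand-rolled version of such a test might look like the sketch below; the logLik values are illustrative, not from this thread. One caution: when the parameter being tested sits on a boundary (a variance component, or a zero-inflation probability of 0), the naive chi-squared p-value is conservative.

```r
# Likelihood ratio test between two nested fits: twice the difference in
# log-likelihoods, compared against a chi-squared distribution with
# df equal to the difference in number of parameters.
lrt_pvalue <- function(loglik_simple, loglik_complex, df_diff) {
  stat <- 2 * (loglik_complex - loglik_simple)
  pchisq(stat, df = df_diff, lower.tail = FALSE)
}

# Illustrative call (made-up logLik values):
lrt_pvalue(loglik_simple = -210.0, loglik_complex = -205.5, df_diff = 1)
```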