Skip to content

R: RE: Overdispersion and model selection: glmmadmb vs. glmer

6 messages · Paul Johnson, Luca Corlatti

#
Dear Paul, thanks for your answer. Surely I can offer a link to the data, here it is: 
https://www.dropbox.com/s/3i22ovc588kahsk/dataset.csv


obs=observation level
id=animal identity
age=age class of the ind.
mat_be=individual mating behavior (territorial vs non-territorial)
month, year, elevation=when & at which elevation (a.s.l.) the sampling occurred
testosterone, cortisol=testosterone and cortisol levels registered for each individual
hr=home range of each individual (on a monthly basis)
larvae=number of parasite larvae counted (response variable, on a monthly basis)


The key point, here, is that not being statistician (I'm a behavioral ecologist) I don't quite get why 2 methods which should ideally yield similar results, in fact do not. I realize the question may seem to vague, and likely fairly trivial, but I guess other researchers, in my same situation, would simply ask "ok, I have overdispersed data, shall I go for glmer with (1|obs) or glmmadmb?".  


Without getting too complicated (e.g. model selection and averaging), even if you run a very simple model, just to check if the number of counted larvae differ between age classes, the estimates differ between glmmadmb and glmer:


mod1<-glmmadmb(larvae~age+(1|year:month)+(1|id), family="nbinom", zeroInflation=FALSE, data=dataset)
mod2<-glmer(larvae~age+(1|year:month)+(1|id)+(1|obs), family=poisson, data=dataset)



If you try with the full additive models:


mod1<-glmmadmb(larvae~age+mat_be+elevation+log(cortisol)+log(testosterone)+log(hr)+(1|year:month)+(1|id), family="nbinom", zeroInflation=FALSE, data=dataset)
mod2<-glmer(larvae~age+mat_be+elevation+log(cortisol)+log(testosterone)+log(hr)+(1|year:month)+(1|id)+(1|obs), family=poisson, data=dataset)



the summary, just as well, gives different results (I used log-transformation as it seems to linearize the relationship with the response variable).


Hope this can help!
Luca
Hi, I'm a different Paul Johnson :) (Had a shock reviewing this thread thinking I'd posted a message that I did not recognize at all).


The question is too vague now.  Aside from agreeing with another Paul Johnson that there are differences between the negative binomial and random-effects interpretations, I can't see where this is leading. The negative binomial model magnifies variance for larger values of the expected outcome, but it does not, by design, include "clustering" of variances. On the other hand, the glmer model you describe is a model of a "clustered" random effect that generates variety among outcomes for a given set of other predictors.  Both can cause "overdispersion" in the sense you notice, but they really are substantively different explanations of it. 
 
How about you show us the estimated models to let us see what's so different, in your view?  

Apart from the residual diagnostics you are concerned about, what about the substantively important parameters and predicted values?  I guess that's a little interesting because, when I last checked, glmer did not have a predict version in the CRAN package (was in testing Rforge version in 2012). 
 
Are the predicted values of outcomes similar in the 2 models? How does it look when you create a scatterplot of the predicted values of the GLMMADMB model versus the glmer model?  


If you post the code that leads to the different diagnostic scatterplots and offer a link to the data, this might be interesting to make into one of the working examples for the mixed models wiki.  I could assign some grad students on this to work it out fully.
On Wed, Aug 28, 2013 at 4:46 AM, Luca Corlatti <luca.corlatti at boku.ac.at> wrote:
Thanks, Paul, it's somewhat consoling to read somebody else had the same thoughts!
 Thanks a lot for your advice. What I miss here, though, is why the results of the model selection differ so much between glmmadmb and glmer, hence what is preferable between the two approaches...
 Cheers,
 Luca
 
 
 
 
 >>> Paul Johnson <paul.johnson at glasgow.ac.uk> 26/08/13 0.29 >>>
 Hi Luca,
 
 I've also seen this difference in residuals from a glmmadmb nbinom2 fit and a lognormal-Poisson fit with glmer. Generally the residuals X fitted plot from glmmadmb looks good (homoscedastic, no trend) while that from the lognormal-Poisson looks wrong (strong curve climbing sharply from negative to positive before plateauing). After some head-scratching I've decided that this isn't a problem.
 
 The reason for the discrepancy is that the fitted values from the lognormal-Poisson fit include the overdispersion random effects, i.e. everything expect the Poisson variation, while the NB fitted values don't, because overdispersion is intrinsic to the NB, not an added random effect. Because the obs-level random effect is there to increase the spread of the Poisson distribution, it's almost inevitable with strong overdispersion that the residuals with low fitted values will be negative while those with high fitted values will be positive.
 
 The way to get a fair comparison between the two is to remove the obs-level random effect from the fitted values of the lognormal-Poisson, leaving just the fixed effects and any other random effects. I find this generally gives a residual pattern much more similar to NB in glmmadmb:
 
   Fitted <- exp(log(fitted(mod)) - ranef(mod)$obs[[1]])
   Resid <- (dat$response - Fitted) / sqrt(Fitted + (Fitted^2) * c(exp(VarCorr(mod)$obs) - 1))
   plot(Fitted, Resid)
 
 # obs is the name of the factor used for the obs-level random effect,
 # mod is the glmer fit of the lognormal-Poisson model, dat$response are the responses, and
 # the bit inside the sqrt() is the variance function for the lognormal-Poisson
 
 Generally I don't find that the model estimates and SEs differ much though.
 
 Best wishes,
 Paul
 
 
 Paul Johnson
 
 Institute of BAH&CM
 Graham Kerr Building
 University of Glasgow
 Glasgow G12 8QQ
 
 http://www.gla.ac.uk/researchinstitutes/bahcm/staff/pauljohnson/
 http://www.stats.gla.ac.uk/~paulj/index.html
#
Hi Luca,

(To avoid confusion, I'm the Glasgow Paul Johnson, not the Kansas one - incidentally there's also a statistician of the same name in Oxford, a former colleague, whom we can only hope stays out of this discussion).

Below is some code for comparing residuals between Poisson-lognormal glmer and NB glmmadmb fits. I find this useful as a quick way of spotting bad fits (aiming, roughly, for homoscedasticity and absence of a trend). However I agree with the other PJ that plotting the predictions and prediction intervals from both models over the data is a better way to compare the two fits.

Best wishes,
Paul


  plot.glmer<-
    function(mer.fit,type="pearson",overdispersion.term=NULL)
    {
      require(AICcmodavg)
      if(is.null(overdispersion.term))
      {
        Fitted<-fitted(mer.fit)
        Residuals=resid(mer.fit,type)
      } else
        {
          response<-model.frame(mer.fit)[[1]]
          od.ranef<-lme4::ranef(mer.fit)[[overdispersion.term]][[1]]
          if(length(response)!=length(od.ranef) || fam.link.mer(mer.fit)$family!="poisson" || fam.link.mer(mer.fit)$link!="log")
            stop("Model is not lognormal-Poisson. Cannot use overdispersion term.")
          Fitted<-exp(log(fitted(mer.fit))-od.ranef)
          Residuals<-(response - Fitted)/sqrt(Fitted+(Fitted^2)*c(exp(lme4::VarCorr(mer.fit)[[overdispersion.term]])-1))
        }
      plot.data<-data.frame(Fitted=Fitted,Residuals=Residuals)
      plot.data$loess.line<-predict(loess(Residuals~Fitted,data=plot.data))
      plot.data<-plot.data[order(plot.data$Fitted),]
      plot(plot.data[,c("Fitted","Residuals")])
      abline(h=0)
      points(plot.data[,c("Fitted","loess.line")],type="l",col="red")
      hist(plot.data$Residuals,xlab="Residuals",main="")
    }

  par(mfrow=c(3,2))
  plot.glmer(pois.glmm)  # overdispersion effects in the fitted values, so expect a trend
  plot.glmer(pois.glmm,overdispersion.term="ons")   # 'detrended' by moving OD effects to the residuals
  plot.glmer(nb.glmm)  # compare with residuals from NB GLMM
On 2 Sep 2013, at 08:32, Luca Corlatti <luca.corlatti at boku.ac.at> wrote:

            
#
Sorry that should be 'obs' not 'ons' for your glmer fit:
plot.glmer(pois.glmm,overdispersion.term="obs")   # 'detrended' by moving OD effects to the residuals
#
PS This paper discusses in detail how to assess Poisson-lognormal model fit by plotting residuals at each heirarchical level: 
Elston, D.A., Moss, R., Boulinier, T., Arrowsmith, C. & Lambin, X. (2001). Analysis of aggregation, a worked example: numbers of ticks on red grouse chicks. Parasitology, 122, 563?9.
http://abdn.ac.uk/lambin-group/Papers/Paper%2041%202001%20Elston%20Tick%20aggregation%20Parasitology.pdf
On 2 Sep 2013, at 11:34, Paul Johnson <Paul.Johnson at glasgow.ac.uk> wrote:

            
#
Thanks, Paul, for the precious info and for the nice piece of code!I
tried to run it using the dataset I posted a few hours ago, and with the
following models:
mod.nb<-glmmadmb(larvae~age+mat_be+elevation+log(cortisol)+log(testosterone)+log(hr)+(1|year:month)+(1|id),
family="nbinom", zeroInflation=FALSE, data=dataset)
mod.pois<-glmer(larvae~age+mat_be+elevation+log(cortisol)+log(testosterone)+log(hr)+(1|year:month)+(1|id)+(1|obs),
family=poisson, data=dataset)
and the results are attached to this email as .png file. 
Residuals for log-normal poisson surely look much better than before!
Though, am I right assuming that the glmmadmb plot look a bit more
convincing than the glmer one?
Best wishes, 
Luca
PS This paper discusses in detail how to assess Poisson-lognormal model
fit by plotting residuals at each heirarchical level: 
Elston, D.A., Moss, R., Boulinier, T., Arrowsmith, C. & Lambin, X.
(2001). Analysis of aggregation, a worked example: numbers of ticks on
red grouse chicks. Parasitology, 122, 563?9.
http://abdn.ac.uk/lambin-group/Papers/Paper%2041%202001%20Elston%20Tick%20aggregation%20Parasitology.pdf



On 2 Sep 2013, at 11:34, Paul Johnson <Paul.Johnson at glasgow.ac.uk>
wrote:
- incidentally there's also a statistician of the same name in Oxford, a
former colleague, whom we can only hope stays out of this discussion).
glmer and NB glmmadmb fits. I find this useful as a quick way of
spotting bad fits (aiming, roughly, for homoscedasticity and absence of
a trend). However I agree with the other PJ that plotting the
predictions and prediction intervals from both models over the data is a
better way to compare the two fits.
fam.link.mer(mer.fit)$family!="poisson" ||
fam.link.mer(mer.fit)$link!="log")
overdispersion term.")
Fitted)/sqrt(Fitted+(Fitted^2)*c(exp(lme4::VarCorr(mer.fit)[[overdispersion.term]])-1))
plot.data$loess.line<-predict(loess(Residuals~Fitted,data=plot.data))
so expect a trend
moving OD effects to the residuals
wrote:
data, here it is:
sampling occurred
for each individual
ecologist) I don't quite get why 2 methods which should ideally yield
similar results, in fact do not. I realize the question may seem to
vague, and likely fairly trivial, but I guess other researchers, in my
same situation, would simply ask "ok, I have overdispersed data, shall I
go for glmer with (1|obs) or glmmadmb?".
even if you run a very simple model, just to check if the number of
counted larvae differ between age classes, the estimates differ between
glmmadmb and glmer:
zeroInflation=FALSE, data=dataset)
data=dataset)
mod1<-glmmadmb(larvae~age+mat_be+elevation+log(cortisol)+log(testosterone)+log(hr)+(1|year:month)+(1|id),
family="nbinom", zeroInflation=FALSE, data=dataset)
mod2<-glmer(larvae~age+mat_be+elevation+log(cortisol)+log(testosterone)+log(hr)+(1|year:month)+(1|id)+(1|obs),
family=poisson, data=dataset)
log-transformation as it seems to linearize the relationship with the
response variable).
thread thinking I'd posted a message that I did not recognize at all).
Johnson that there are differences between the negative binomial and
random-effects interpretations, I can't see where this is leading. The
negative binomial model magnifies variance for larger values of the
expected outcome, but it does not, by design, include "clustering" of
variances. On the other hand, the glmer model you describe is a model of
a "clustered" random effect that generates variety among outcomes for a
given set of other predictors.  Both can cause "overdispersion" in the
sense you notice, but they really are substantively different
explanations of it.
different, in your view?
about the substantively important parameters and predicted values?  I
guess that's a little interesting because, when I last checked, glmer
did not have a predict version in the CRAN package (was in testing
Rforge version in 2012).
does it look when you create a scatterplot of the predicted values of
the GLMMADMB model versus the glmer model?
scatterplots and offer a link to the data, this might be interesting to
make into one of the working examples for the mixed models wiki.  I
could assign some grad students on this to work it out fully.
<luca.corlatti at boku.ac.at> wrote:
same thoughts!
results of the model selection differ so much between glmmadmb and
glmer, hence what is preferable between the two approaches...
fit and a lognormal-Poisson fit with glmer. Generally the residuals X
fitted plot from glmmadmb looks good (homoscedastic, no trend) while
that from the lognormal-Poisson looks wrong (strong curve climbing
sharply from negat>> The reason for the discrepancy is that the fitted values from the
lognormal-Poisson fit include the overdispersion random effects, i.e.
everything expect the Poisson variation, while the NB fitted values
don't, because overdispersion is intrinsic to the NB, not an added
random effect. Because the obs-level random effect is there to increase
the spread of the Poisson distribution, it's almost inevitable with
strong overdispersion that the residuals with low fitted values will be
negative while those with high fitted values will be positive.
obs-level random effect from the fitted values of the lognormal-Poisson,
leaving just the fixed effects and any other random effects. I find this
generally gives a residual pattern much more similar to NB in glmmadmb:
c(exp(VarCorr(mod)$obs) - 1))
are the responses, and
lognormal-Poisson
though.
[r-sig-mixed-models-bounces at r-project.org] On Behalf Of Luca Corlatti
[luca.corlatti at boku.ac.at]
glmer
of several etho-ecological factors in shaping the emission of parasites
in my study species. My data are counts showing overdispersion. I
therefore fitted my models using the function glmmadmb with
family=nbinom. Visual inspection of residuals (normality,
heteroschedasticity, independence) suggested the global model fitted the
data adequately, and I'm pretty happy with the results of my analysis.
For the sake of curiosity, however, I tried to re-run the model
selection using the function glmer, with family=poisson, adding the
observation-level as a random factor (1|obs) to account for
overdispersion, as recently suggested. In this case, however, visual
inspection of residuals for the global model were not very satisfactory.
After running the model selection, the results were quite different from
those obtained with glmmadmb (not dramatically different, but still...).
Given I have no deep knowledge into th!
for overdispersion? (i.e. is the check of residuals for the global model
the key issue here?)
selections?
-------------- next part --------------
A non-text attachment was scrubbed...
Name: glmer vs glmmadmb.png
Type: image/png
Size: 83108 bytes
Desc: Portable Network Graphics Format
URL: <https://stat.ethz.ch/pipermail/r-sig-mixed-models/attachments/20130902/22399ccf/attachment-0001.png>