Skip to content

GLMM, overdispersion, and method for comparing competetive models

6 messages · Ben Bolker, Chad Newbolt, Philipp Singer

#
All,

I would first like to say that I'm a relative novice with R so please take that into consideration with your responses.  Basically, give me the totally dumbed down version of answers when you can.

I have a biological data set with count data that I'm currently analyzing.  Namely, I'm interested in looking at the effects of animal age, bodysize, and antler size on annual male reproductive success (i.e. number of fawns produced).  I would also like to see how the relationships are influenced by changes in population demographics.  I have been using a GLMM to evaluate the following global model:

repro = glmer(Fawn~Age+I(Age^2)+BodySize+SSCM+AvgAge+Age*AvgAge+I(Age^2)*AvgAge+BodySize*AvgAge+SSCM*AvgAge+(1|Sire),data=datum,family=poisson)

where:

Age, BodySize, SSCM are measured characteristics
Fawn = number of fawns produced in a given year
AvgAge = Population demographic factor
(1|Sire) = Random effect for each sampled male ID

I first used the following to evaluate potential overdispersion of my data from the global model:

overdisp_fun <- function(model) {
## number of variance parameters in
##   an n-by-n variance-covariance matrix
vpars <- function(m) {
nrow(m)*(nrow(m)+1)/2
}
model.df <- sum(sapply(VarCorr(model),vpars))+length(fixef(model))
rdf <- nrow(model.frame(model))-model.df
rp <- residuals(model,type="pearson")
Pearson.chisq <- sum(rp^2)
prat <- Pearson.chisq/rdf
pval <- pchisq(Pearson.chisq, df=rdf, lower.tail=FALSE)
c(chisq=Pearson.chisq,ratio=prat,rdf=rdf,p=pval)
}

With the following result

 repro = glmer(Fawn~Age+I(Age^2)+BodySize+SSCM+AvgAge+Age*AvgAge+I(Age^2)*AvgAge+BodySize*AvgAge+SSCM*AvgAge+(1|Sire),data=datum,family=poisson)
overdisp_fun(repro)
 chisq                             ratio                          rdf                              p
1.698574e+02      1.681756e+00       1.010000e+02          2.169243e-05

Since the ratio of Pearson-statistic to rdf is 1.68 I assume that I need to take this overdispersion into account

My first inclination was to use quasipoisson distribution to account for overdispersion; however, I see that in no longer available in lme4.  I used glmmPQL in the MASS package with quasipoisson but do not receive AICc information.  I had planned on using AICc to evaluate competitive models.  My specific question is: 1) is there a way to generate the necessary information (AICc or something like) to compare competitive models from overdispersed data in a current R environment? I have read https://cran.r-project.org/web/packages/bbmle/vignettes/quasi.pdf but I'm having a difficult time understanding exactly how to implement from a technical perspective.  I'm on the path of trying to use a negative binomial (I'm not locked into this method so please provide insight if appropriate) with package glmmADMB: however, I have been unable to get this package to load successfully.  I've followed the instructions to the best of my understanding and abilities but cannot figure out where I'm going wrong.  Any advice is much appreciated as I'm totally stumped right now on many fronts.  I'm running windows 7 on 64-bit machine.  Here is what I have attempted with output:

install.packages("glmmADMB",
+     repos=c("http://glmmadmb.r-forge.r-project.org/repos",
+             getOption("repos")),
+     type="source")
Installing package into ?C:/Users/newboch/Documents/R/win-library/3.3?
(as ?lib? is unspecified)
trying URL 'http://glmmadmb.r-forge.r-project.org/repos/src/contrib/glmmADMB_0.8.3.3.tar.gz'
Content type 'application/x-gzip' length 9391177 bytes (9.0 MB)
downloaded 9.0 MB
* installing *source* package 'glmmADMB' ...
** R
** data
*** moving datasets to lazyload DB
** inst
** preparing package for lazy loading
Error in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]) :
  there is no package called 'stringi'
ERROR: lazy loading failed for package 'glmmADMB'
* removing 'C:/Users/newboch/Documents/R/win-library/3.3/glmmADMB'
The downloaded source packages are in
        ?C:\Users\newboch\AppData\Local\Temp\RtmpK23VOM\downloaded_packages?
Warning messages:
1: running command '"C:/PROGRA~1/R/R-33~1.1/bin/x64/R" CMD INSTALL -l "C:\Users\newboch\Documents\R\win-library\3.3" C:\Users\newboch\AppData\Local\Temp\RtmpK23VOM/downloaded_packages/glmmADMB_0.8.3.3.tar.gz' had status 1
2: In install.packages("glmmADMB", repos = c("http://glmmadmb.r-forge.r-project.org/repos",  :
  installation of package ?glmmADMB? had non-zero exit status
Error in loadNamespace(name) : there is no package called ?glmmADMB?
Error in loadNamespace(name) : there is no package called ?glmmADMB?
Installing package into ?C:/Users/newboch/Documents/R/win-library/3.3?
(as ?lib? is unspecified)
Warning message:
package ?glmmADMB? is not available (for R version 3.3.1)
Thanks,

Chad
#
A couple of quick comments:

  (1) try

install.packages(c("R2admb","stringr","plyr","coda"))

  before doing the glmmADMB installation.

  (2) more advice about overdispersion is available at
http://tinyurl.com/glmmFAQ#Overdispersion

 ...
On 16-06-24 04:02 PM, Chad Newbolt wrote:
overdisp_fun(repro)
Content type 'application/x-gzip' length 9391177 bytes (9.0 MB)
#
Thanks.  I'll  take this into consideration and get back to everyone.  Please disregard the other posting that was sent out today containing the exact same content.  I accidentally posted twice during the new member enrolling process.  Sorry and thanks again for the help.

Chad
1 day later
#
I successfully loaded the glmmADMB package using the code below so thanks for that bit of help.  

Since I have evidence for overdispersion, I'm using negative binomial distribution as opposed to Poisson.

My two questions are:

1) When I fit using the following global zero inflation model I receive the following error:

fit1=glmmadmb(Fawn~Age+I(Age^2)+BodySize+SSCM+AvgAge+Age*AvgAge+I(Age^2)*AvgAge+BodySize*AvgAge+SSCM*AvgAge+(1|Sire),data=datum,family="nbinom",zeroInflation = TRUE)
Parameters were estimated, but standard errors were not: the most likely problem is that the curvature at MLE was zero or negative
Error in glmmadmb(Fawn ~ Age + I(Age^2) + BodySize + SSCM + AvgAge + Age *  : 
  The function maximizer failed (couldn't find parameter file) Troubleshooting steps include (1) run with 'save.dir' set and inspect output files; (2) change run parameters: see '?admbControl';(3) re-run with debug=TRUE for more information on failure mode
In addition: Warning message:
running command 'C:\windows\system32\cmd.exe /c glmmadmb -maxfn 500 -maxph 5 -noinit -shess' had status 1

However, when I change to zeroInflation = FALSE, I receive no warnings and everything seems to go as should.

Does this simply mean that my data is not zero inflated, hence the zero inflated model will not run, or is this something I should be concerned about and investigate the cause further?  When I debug   I see the following warning....Warning -- Hessian does not appear to be positive definite Hessian does not appear to be positive definite. 


2) When fitting more simple versions(predictors removed) I receive the same error as above when using the family=nbinom;  however these errors disappear when using family=nbinom1.  Is this indicative of an underlying problem or am I OK to use the ouput from the later family where variance = ??.

Thanks,

Chad
2 days later
#
Just reaching out one last time to see if anyone has any input regarding the question below.  I'm a bit stuck.  Thanks again for the previous help and I'll explore other outlets for future questions related to this topic if no help is available.

Thanks,

Chad
#
You can try using https://github.com/glmmTMB/glmmTMB with a negative
binomial. Otherwise you can also use a random observation level effect as
explained in the faq that ben linked.

Generally you already checked for overdispersion indicating that you should
consider it. You can also compare models with anova or AIC/BIC.
On Jun 30, 2016 17:22, "Chad Newbolt" <newboch at auburn.edu> wrote: