Hi all,
I'm trying to model proportion data using a binomial GLMM. I first
started with lme4 and saw that my data were overdispersed.
LME4Inter <- glmer(PropInter ~ Manip + Observer? +(1|Date),
family=binomial, weights = NbAg, data = senior,
control=glmerControl(optimizer="bobyqa",optCtrl = list(maxfun=100000)))
Using the dispersion_glmer (provided by Ben Bolker, see below for code),
I got a dispersion parameter of 1.99 (as expected the outcome is the
same with glmmADMB).
I then tried to add an observation level random effect, and the
dispersion parameter dropped to 0.24. With such a small dispersion
parameter, my model becomes overly conservative and I'll probably not be
able to detect any (true) effect (if I understand correctly). Would you
stick to this model anyways?
I started investigating beta-binomial models to see if they could better
model overdispersion (and not be too conservative) and wanted to use
glmmTMB to do that.
I first tried to refit the binomial GLMM with glmmTMB and calculate the
dispersion parameter using the dispfun function (also provided by Ben
Bolker, see below for code), and got a very different response: the
dispersion parameter was 0.514. If this is correct, I wouldn't have to
worry about overdispersion.
TMBInter <- glmmTMB(PropInter ~ Manip + Observer? +(1|Date),
family=binomial, weights = NbAg, data = senior)
The estimates and SEs were identical between the 3 packages, so I am
fitting correctly the exact same models. However, the residuals between
glmer/glmmADMB and glmmTMB models were quite different. Is that expected?
I tried to run Poisson GLMMs with the 2 packages and got identical
dispersion parameters, so the dispfun function works well for glmmTMB too.
Which dispersion parameter value should I trust for the binomial GLMM?
Is there something specific about binomial GLMMs in glmmTMB?
Thanks in advance for your help!
Thomas Merkling
dispersion_glmer <- function(model) {
? ## number of variance parameters in
? ##?? an n-by-n variance-covariance matrix
? vpars <- function(m) {
??? nrow(m)*(nrow(m)+1)/2
? }
? model.df <- sum(sapply(VarCorr(model),vpars))+length(fixef(model))
? rdf <- nrow(model.frame(model))-model.df
? rp <- residuals(model,type="pearson")
? Pearson.chisq <- sum(rp^2)
? prat <- Pearson.chisq/rdf
? pval <- pchisq(Pearson.chisq, df=rdf, lower.tail=FALSE)
? c(chisq=Pearson.chisq,ratio=prat,rdf=rdf,p=pval)
}
dispfun <- function(m) {
? r <- residuals(m,type="pearson")
? n <- df.residual(m)
? dsq <- sum(r^2)
? c(dsq=dsq,n=n,disp=dsq/n)
}
different overdispersion parameter for binomial GLMM in lme4, glmmADMB and glmmTMB
5 messages · Thomas Merkling, Muldoon, Ariel, Ben Bolker
Hi, Thomas-
After doing a little poking around, my impression is that the glmmTMB model may not be returning the correct values for the residuals for a binomial model.
I fit the model from the "Examples" in both glmmTMB (development version) and lme4, using the two different coding approaches that are often used for fitting binomial models (one with a matrix response and one with a proportion plus number of trials as weights).
The two lme4 models return identical results, so there wasn't much reason to pursue them both. The second model (tmbm2) is like the model you are fitting.
library(glmmTMB)
library(lme4)
tmbm1 = glmmTMB(cbind(incidence, size - incidence) ~ period + (1 | herd),
data = cbpp, family = binomial)
tmbm2 = glmmTMB(incidence/size ~ period + (1 | herd),
data = cbpp, weights = size, family = binomial)
tmbm3 = glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
data = cbpp, family = binomial)
tmbm4 = glmer(incidence/size ~ period + (1 | herd),
data = cbpp, family = binomial, weights = size)
Drilling down for reasons for residuals to be different, I found that the second model returns different fitted values than the others.
fitted(tmbm1)
fitted(tmbm2)
fitted(tmbm4)
I can't tell you why this is and I can't be certain the values from the other models are correct, but if I manually calculate the fitted values for the second model the results match the values from the other three models. (You'll see slight differences between the lme4 and glmmTMB models due to slight differences in estimates of the conditional modes of the random effects.)
# Manual calculation of fitted values
plogis( (getME(tmbm2, "X") %*% fixef(tmbm2)$cond) +
rep(unlist(ranef(tmbm2)$cond),
times = c(4L, 3L, 4L, 4L, 4L, 4L, 4L, 1L, 4L, 4L, 4L, 4L, 4L, 4L, 4L)) )
A good approach then might be to use the matrix-style response variable in glmmTMB. However, the residuals.glmmTMB function then uses the matrix response when calculating residuals instead of a proportion response.
# Calculate response residuals for matrix response
residuals(tmbm1)
model.response(tmbm1$frame) - fitted(tmbm1)
# Here is what these should be, which matches lme4
model.response(tmbm1$frame)[,1]/cbpp$size - fitted(tmbm1)
getME(tmbm4, "y") - fitted(tmbm4)
The last difference I found is in the calculation of the pearson residuals. I believe glmmTMB uses mu*(1-mu) to calculate the variance that is used in the divisor (I base this on family(tmbm1)$variance). However, I think lme4 uses the variance of a proportion, mu*(1-mu)/m (where m is the binomial sample size/number of trials).
residuals(tmbm4, type = "pearson")
( getME(tmbm4, "y") - fitted(tmbm4) )/sqrt( fitted(tmbm3)*(1-fitted(tmbm3))/cbpp$size )
I'm not sure any of this helps you decide how to move forward, but I definitely wouldn't dismiss the overdispersion you are seeing in your lme4 model yet.
Ariel
-----Original Message-----
From: R-sig-mixed-models [mailto:r-sig-mixed-models-bounces at r-project.org] On Behalf Of Thomas Merkling
Sent: Thursday, March 15, 2018 6:51 PM
To: r-sig-mixed-models at r-project.org
Subject: [R-sig-ME] different overdispersion parameter for binomial GLMM in lme4, glmmADMB and glmmTMB
Hi all,
I'm trying to model proportion data using a binomial GLMM. I first started with lme4 and saw that my data were overdispersed.
LME4Inter <- glmer(PropInter ~ Manip + Observer? +(1|Date), family=binomial, weights = NbAg, data = senior, control=glmerControl(optimizer="bobyqa",optCtrl = list(maxfun=100000)))
Using the dispersion_glmer (provided by Ben Bolker, see below for code), I got a dispersion parameter of 1.99 (as expected the outcome is the same with glmmADMB).
I then tried to add an observation level random effect, and the dispersion parameter dropped to 0.24. With such a small dispersion parameter, my model becomes overly conservative and I'll probably not be able to detect any (true) effect (if I understand correctly). Would you stick to this model anyways?
I started investigating beta-binomial models to see if they could better model overdispersion (and not be too conservative) and wanted to use glmmTMB to do that.
I first tried to refit the binomial GLMM with glmmTMB and calculate the dispersion parameter using the dispfun function (also provided by Ben Bolker, see below for code), and got a very different response: the dispersion parameter was 0.514. If this is correct, I wouldn't have to worry about overdispersion.
TMBInter <- glmmTMB(PropInter ~ Manip + Observer? +(1|Date), family=binomial, weights = NbAg, data = senior)
The estimates and SEs were identical between the 3 packages, so I am fitting correctly the exact same models. However, the residuals between glmer/glmmADMB and glmmTMB models were quite different. Is that expected?
I tried to run Poisson GLMMs with the 2 packages and got identical dispersion parameters, so the dispfun function works well for glmmTMB too.
Which dispersion parameter value should I trust for the binomial GLMM?
Is there something specific about binomial GLMMs in glmmTMB?
Thanks in advance for your help!
Thomas Merkling
dispersion_glmer <- function(model) {
? ## number of variance parameters in
? ##?? an n-by-n variance-covariance matrix
? vpars <- function(m) {
??? nrow(m)*(nrow(m)+1)/2
? }
? model.df <- sum(sapply(VarCorr(model),vpars))+length(fixef(model))
? rdf <- nrow(model.frame(model))-model.df
? rp <- residuals(model,type="pearson")
? Pearson.chisq <- sum(rp^2)
? prat <- Pearson.chisq/rdf
? pval <- pchisq(Pearson.chisq, df=rdf, lower.tail=FALSE)
? c(chisq=Pearson.chisq,ratio=prat,rdf=rdf,p=pval)
}
dispfun <- function(m) {
? r <- residuals(m,type="pearson")
? n <- df.residual(m)
? dsq <- sum(r^2)
? c(dsq=dsq,n=n,disp=dsq/n)
}
_______________________________________________
R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Ariel, can you post this as an issue at https://github.com/glmmTMB/glmmTMB/issues ? This stuff is really hard to get right ... Thomas: I think an important point is that it really only makes much sense to focus on the dispersion in a model that does *NOT* already have a dispersion parameter estimated. If you want to choose among models with different forms of dispersion, I would probably suggest AIC (which reduces to comparing log-likelihood for models with the same number of parameters -- if the two models aren't nested, as they probably aren't, you can't do a statistical test, but you can still see which one fits the data better). On Fri, Mar 16, 2018 at 12:31 PM, Muldoon, Ariel
<Ariel.Muldoon at oregonstate.edu> wrote:
Hi, Thomas-
After doing a little poking around, my impression is that the glmmTMB model may not be returning the correct values for the residuals for a binomial model.
I fit the model from the "Examples" in both glmmTMB (development version) and lme4, using the two different coding approaches that are often used for fitting binomial models (one with a matrix response and one with a proportion plus number of trials as weights).
The two lme4 models return identical results, so there wasn't much reason to pursue them both. The second model (tmbm2) is like the model you are fitting.
library(glmmTMB)
library(lme4)
tmbm1 = glmmTMB(cbind(incidence, size - incidence) ~ period + (1 | herd),
data = cbpp, family = binomial)
tmbm2 = glmmTMB(incidence/size ~ period + (1 | herd),
data = cbpp, weights = size, family = binomial)
tmbm3 = glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
data = cbpp, family = binomial)
tmbm4 = glmer(incidence/size ~ period + (1 | herd),
data = cbpp, family = binomial, weights = size)
Drilling down for reasons for residuals to be different, I found that the second model returns different fitted values than the others.
fitted(tmbm1)
fitted(tmbm2)
fitted(tmbm4)
I can't tell you why this is and I can't be certain the values from the other models are correct, but if I manually calculate the fitted values for the second model the results match the values from the other three models. (You'll see slight differences between the lme4 and glmmTMB models due to slight differences in estimates of the conditional modes of the random effects.)
# Manual calculation of fitted values
plogis( (getME(tmbm2, "X") %*% fixef(tmbm2)$cond) +
rep(unlist(ranef(tmbm2)$cond),
times = c(4L, 3L, 4L, 4L, 4L, 4L, 4L, 1L, 4L, 4L, 4L, 4L, 4L, 4L, 4L)) )
A good approach then might be to use the matrix-style response variable in glmmTMB. However, the residuals.glmmTMB function then uses the matrix response when calculating residuals instead of a proportion response.
# Calculate response residuals for matrix response
residuals(tmbm1)
model.response(tmbm1$frame) - fitted(tmbm1)
# Here is what these should be, which matches lme4
model.response(tmbm1$frame)[,1]/cbpp$size - fitted(tmbm1)
getME(tmbm4, "y") - fitted(tmbm4)
The last difference I found is in the calculation of the pearson residuals. I believe glmmTMB uses mu*(1-mu) to calculate the variance that is used in the divisor (I base this on family(tmbm1)$variance). However, I think lme4 uses the variance of a proportion, mu*(1-mu)/m (where m is the binomial sample size/number of trials).
residuals(tmbm4, type = "pearson")
( getME(tmbm4, "y") - fitted(tmbm4) )/sqrt( fitted(tmbm3)*(1-fitted(tmbm3))/cbpp$size )
I'm not sure any of this helps you decide how to move forward, but I definitely wouldn't dismiss the overdispersion you are seeing in your lme4 model yet.
Ariel
-----Original Message-----
From: R-sig-mixed-models [mailto:r-sig-mixed-models-bounces at r-project.org] On Behalf Of Thomas Merkling
Sent: Thursday, March 15, 2018 6:51 PM
To: r-sig-mixed-models at r-project.org
Subject: [R-sig-ME] different overdispersion parameter for binomial GLMM in lme4, glmmADMB and glmmTMB
Hi all,
I'm trying to model proportion data using a binomial GLMM. I first started with lme4 and saw that my data were overdispersed.
LME4Inter <- glmer(PropInter ~ Manip + Observer +(1|Date), family=binomial, weights = NbAg, data = senior, control=glmerControl(optimizer="bobyqa",optCtrl = list(maxfun=100000)))
Using the dispersion_glmer (provided by Ben Bolker, see below for code), I got a dispersion parameter of 1.99 (as expected the outcome is the same with glmmADMB).
I then tried to add an observation level random effect, and the dispersion parameter dropped to 0.24. With such a small dispersion parameter, my model becomes overly conservative and I'll probably not be able to detect any (true) effect (if I understand correctly). Would you stick to this model anyways?
I started investigating beta-binomial models to see if they could better model overdispersion (and not be too conservative) and wanted to use glmmTMB to do that.
I first tried to refit the binomial GLMM with glmmTMB and calculate the dispersion parameter using the dispfun function (also provided by Ben Bolker, see below for code), and got a very different response: the dispersion parameter was 0.514. If this is correct, I wouldn't have to worry about overdispersion.
TMBInter <- glmmTMB(PropInter ~ Manip + Observer +(1|Date), family=binomial, weights = NbAg, data = senior)
The estimates and SEs were identical between the 3 packages, so I am fitting correctly the exact same models. However, the residuals between glmer/glmmADMB and glmmTMB models were quite different. Is that expected?
I tried to run Poisson GLMMs with the 2 packages and got identical dispersion parameters, so the dispfun function works well for glmmTMB too.
Which dispersion parameter value should I trust for the binomial GLMM?
Is there something specific about binomial GLMMs in glmmTMB?
Thanks in advance for your help!
Thomas Merkling
dispersion_glmer <- function(model) {
## number of variance parameters in
## an n-by-n variance-covariance matrix
vpars <- function(m) {
nrow(m)*(nrow(m)+1)/2
}
model.df <- sum(sapply(VarCorr(model),vpars))+length(fixef(model))
rdf <- nrow(model.frame(model))-model.df
rp <- residuals(model,type="pearson")
Pearson.chisq <- sum(rp^2)
prat <- Pearson.chisq/rdf
pval <- pchisq(Pearson.chisq, df=rdf, lower.tail=FALSE)
c(chisq=Pearson.chisq,ratio=prat,rdf=rdf,p=pval)
}
dispfun <- function(m) {
r <- residuals(m,type="pearson")
n <- df.residual(m)
dsq <- sum(r^2)
c(dsq=dsq,n=n,disp=dsq/n)
}
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models _______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Thanks Ariel and Ben for your answers!
Following Ariel's troubleshooting, I managed to modify the function I
was using to match the results from lme4:
dispfun <- function(m, trials) {
# where m is the glmmTMB model and trials a vector of number of trials
for each observation
? r <- (model.response(m$frame)[,1]/trials -
fitted(m))/sqrt(fitted(m)*(1-fitted(m))/trials)
? n <- df.residual(m)
? dsq <- sum(r^2)
? c(dsq=dsq,n=n,disp=dsq/n)
}
Ben, sorry but I am bit confused by your answer. If I understand
correctly, the approach you would recommend is to calculate the
dispersion parameter on the binomial model and if there is
overdispersion compare models with different ways to deal with it (e.g
observation-level random effects and beta-binomial) to the binomial one
to find out which ones fits the data better. Is that correct? And so
there would be no point in calculating the dispersion parameter for the
OLRE and beta-binomial model and see how much it goes down?
Cheers,
Thomas
On 16/03/2018 14:19, Ben Bolker wrote:
Ariel, can you post this as an issue at https://github.com/glmmTMB/glmmTMB/issues ? This stuff is really hard to get right ... Thomas: I think an important point is that it really only makes much sense to focus on the dispersion in a model that does *NOT* already have a dispersion parameter estimated. If you want to choose among models with different forms of dispersion, I would probably suggest AIC (which reduces to comparing log-likelihood for models with the same number of parameters -- if the two models aren't nested, as they probably aren't, you can't do a statistical test, but you can still see which one fits the data better). On Fri, Mar 16, 2018 at 12:31 PM, Muldoon, Ariel <Ariel.Muldoon at oregonstate.edu> wrote:
Hi, Thomas-
After doing a little poking around, my impression is that the glmmTMB model may not be returning the correct values for the residuals for a binomial model.
I fit the model from the "Examples" in both glmmTMB (development version) and lme4, using the two different coding approaches that are often used for fitting binomial models (one with a matrix response and one with a proportion plus number of trials as weights).
The two lme4 models return identical results, so there wasn't much reason to pursue them both. The second model (tmbm2) is like the model you are fitting.
library(glmmTMB)
library(lme4)
tmbm1 = glmmTMB(cbind(incidence, size - incidence) ~ period + (1 | herd),
data = cbpp, family = binomial)
tmbm2 = glmmTMB(incidence/size ~ period + (1 | herd),
data = cbpp, weights = size, family = binomial)
tmbm3 = glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
data = cbpp, family = binomial)
tmbm4 = glmer(incidence/size ~ period + (1 | herd),
data = cbpp, family = binomial, weights = size)
Drilling down for reasons for residuals to be different, I found that the second model returns different fitted values than the others.
fitted(tmbm1)
fitted(tmbm2)
fitted(tmbm4)
I can't tell you why this is and I can't be certain the values from the other models are correct, but if I manually calculate the fitted values for the second model the results match the values from the other three models. (You'll see slight differences between the lme4 and glmmTMB models due to slight differences in estimates of the conditional modes of the random effects.)
# Manual calculation of fitted values
plogis( (getME(tmbm2, "X") %*% fixef(tmbm2)$cond) +
rep(unlist(ranef(tmbm2)$cond),
times = c(4L, 3L, 4L, 4L, 4L, 4L, 4L, 1L, 4L, 4L, 4L, 4L, 4L, 4L, 4L)) )
A good approach then might be to use the matrix-style response variable in glmmTMB. However, the residuals.glmmTMB function then uses the matrix response when calculating residuals instead of a proportion response.
# Calculate response residuals for matrix response
residuals(tmbm1)
model.response(tmbm1$frame) - fitted(tmbm1)
# Here is what these should be, which matches lme4
model.response(tmbm1$frame)[,1]/cbpp$size - fitted(tmbm1)
getME(tmbm4, "y") - fitted(tmbm4)
The last difference I found is in the calculation of the pearson residuals. I believe glmmTMB uses mu*(1-mu) to calculate the variance that is used in the divisor (I base this on family(tmbm1)$variance). However, I think lme4 uses the variance of a proportion, mu*(1-mu)/m (where m is the binomial sample size/number of trials).
residuals(tmbm4, type = "pearson")
( getME(tmbm4, "y") - fitted(tmbm4) )/sqrt( fitted(tmbm3)*(1-fitted(tmbm3))/cbpp$size )
I'm not sure any of this helps you decide how to move forward, but I definitely wouldn't dismiss the overdispersion you are seeing in your lme4 model yet.
Ariel
-----Original Message-----
From: R-sig-mixed-models [mailto:r-sig-mixed-models-bounces at r-project.org] On Behalf Of Thomas Merkling
Sent: Thursday, March 15, 2018 6:51 PM
To: r-sig-mixed-models at r-project.org
Subject: [R-sig-ME] different overdispersion parameter for binomial GLMM in lme4, glmmADMB and glmmTMB
Hi all,
I'm trying to model proportion data using a binomial GLMM. I first started with lme4 and saw that my data were overdispersed.
LME4Inter <- glmer(PropInter ~ Manip + Observer +(1|Date), family=binomial, weights = NbAg, data = senior, control=glmerControl(optimizer="bobyqa",optCtrl = list(maxfun=100000)))
Using the dispersion_glmer (provided by Ben Bolker, see below for code), I got a dispersion parameter of 1.99 (as expected the outcome is the same with glmmADMB).
I then tried to add an observation level random effect, and the dispersion parameter dropped to 0.24. With such a small dispersion parameter, my model becomes overly conservative and I'll probably not be able to detect any (true) effect (if I understand correctly). Would you stick to this model anyways?
I started investigating beta-binomial models to see if they could better model overdispersion (and not be too conservative) and wanted to use glmmTMB to do that.
I first tried to refit the binomial GLMM with glmmTMB and calculate the dispersion parameter using the dispfun function (also provided by Ben Bolker, see below for code), and got a very different response: the dispersion parameter was 0.514. If this is correct, I wouldn't have to worry about overdispersion.
TMBInter <- glmmTMB(PropInter ~ Manip + Observer +(1|Date), family=binomial, weights = NbAg, data = senior)
The estimates and SEs were identical between the 3 packages, so I am fitting correctly the exact same models. However, the residuals between glmer/glmmADMB and glmmTMB models were quite different. Is that expected?
I tried to run Poisson GLMMs with the 2 packages and got identical dispersion parameters, so the dispfun function works well for glmmTMB too.
Which dispersion parameter value should I trust for the binomial GLMM?
Is there something specific about binomial GLMMs in glmmTMB?
Thanks in advance for your help!
Thomas Merkling
dispersion_glmer <- function(model) {
## number of variance parameters in
## an n-by-n variance-covariance matrix
vpars <- function(m) {
nrow(m)*(nrow(m)+1)/2
}
model.df <- sum(sapply(VarCorr(model),vpars))+length(fixef(model))
rdf <- nrow(model.frame(model))-model.df
rp <- residuals(model,type="pearson")
Pearson.chisq <- sum(rp^2)
prat <- Pearson.chisq/rdf
pval <- pchisq(Pearson.chisq, df=rdf, lower.tail=FALSE)
c(chisq=Pearson.chisq,ratio=prat,rdf=rdf,p=pval)
}
dispfun <- function(m) {
r <- residuals(m,type="pearson")
n <- df.residual(m)
dsq <- sum(r^2)
c(dsq=dsq,n=n,disp=dsq/n)
}
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models _______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
On Fri, Mar 16, 2018 at 4:39 PM, Thomas Merkling
<thomasmerkling00 at gmail.com> wrote:
Thanks Ariel and Ben for your answers!
Following Ariel's troubleshooting, I managed to modify the function I was
using to match the results from lme4:
dispfun <- function(m, trials) {
# where m is the glmmTMB model and trials a vector of number of trials for
each observation
r <- (model.response(m$frame)[,1]/trials -
fitted(m))/sqrt(fitted(m)*(1-fitted(m))/trials)
n <- df.residual(m)
dsq <- sum(r^2)
c(dsq=dsq,n=n,disp=dsq/n)
}
Ben, sorry but I am bit confused by your answer. If I understand correctly,
the approach you would recommend is to calculate the dispersion parameter on
the binomial model and if there is overdispersion compare models with
different ways to deal with it (e.g observation-level random effects and
beta-binomial) to the binomial one to find out which ones fits the data
better. Is that correct? And so there would be no point in calculating the
dispersion parameter for the OLRE and beta-binomial model and see how much
it goes down?
Yes, that's what I'm recommending. ("If there is overdispersion" is
open to interpretation --
i.e. how much overdispersion makes it concerning/worth bothering to
move forward with
a model that takes overdispersion into account ...)
As always, happy to hear other opinions.
On 16/03/2018 14:19, Ben Bolker wrote:
Ariel, can you post this as an issue at https://github.com/glmmTMB/glmmTMB/issues ? This stuff is really hard to get right ... Thomas: I think an important point is that it really only makes much sense to focus on the dispersion in a model that does *NOT* already have a dispersion parameter estimated. If you want to choose among models with different forms of dispersion, I would probably suggest AIC (which reduces to comparing log-likelihood for models with the same number of parameters -- if the two models aren't nested, as they probably aren't, you can't do a statistical test, but you can still see which one fits the data better). On Fri, Mar 16, 2018 at 12:31 PM, Muldoon, Ariel <Ariel.Muldoon at oregonstate.edu> wrote:
Hi, Thomas-
After doing a little poking around, my impression is that the glmmTMB
model may not be returning the correct values for the residuals for a
binomial model.
I fit the model from the "Examples" in both glmmTMB (development version)
and lme4, using the two different coding approaches that are often used for
fitting binomial models (one with a matrix response and one with a
proportion plus number of trials as weights).
The two lme4 models return identical results, so there wasn't much reason
to pursue them both. The second model (tmbm2) is like the model you are
fitting.
library(glmmTMB)
library(lme4)
tmbm1 = glmmTMB(cbind(incidence, size - incidence) ~ period + (1 | herd),
data = cbpp, family = binomial)
tmbm2 = glmmTMB(incidence/size ~ period + (1 | herd),
data = cbpp, weights = size, family = binomial)
tmbm3 = glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
data = cbpp, family = binomial)
tmbm4 = glmer(incidence/size ~ period + (1 | herd),
data = cbpp, family = binomial, weights = size)
Drilling down for reasons for residuals to be different, I found that the
second model returns different fitted values than the others.
fitted(tmbm1)
fitted(tmbm2)
fitted(tmbm4)
I can't tell you why this is and I can't be certain the values from the
other models are correct, but if I manually calculate the fitted values for
the second model the results match the values from the other three models.
(You'll see slight differences between the lme4 and glmmTMB models due to
slight differences in estimates of the conditional modes of the random
effects.)
# Manual calculation of fitted values
plogis( (getME(tmbm2, "X") %*% fixef(tmbm2)$cond) +
rep(unlist(ranef(tmbm2)$cond),
times = c(4L, 3L, 4L, 4L, 4L, 4L, 4L, 1L, 4L, 4L, 4L,
4L, 4L, 4L, 4L)) )
A good approach then might be to use the matrix-style response variable
in glmmTMB. However, the residuals.glmmTMB function then uses the matrix
response when calculating residuals instead of a proportion response.
# Calculate response residuals for matrix response
residuals(tmbm1)
model.response(tmbm1$frame) - fitted(tmbm1)
# Here is what these should be, which matches lme4
model.response(tmbm1$frame)[,1]/cbpp$size - fitted(tmbm1)
getME(tmbm4, "y") - fitted(tmbm4)
The last difference I found is in the calculation of the pearson
residuals. I believe glmmTMB uses mu*(1-mu) to calculate the variance that
is used in the divisor (I base this on family(tmbm1)$variance). However, I
think lme4 uses the variance of a proportion, mu*(1-mu)/m (where m is the
binomial sample size/number of trials).
residuals(tmbm4, type = "pearson")
( getME(tmbm4, "y") - fitted(tmbm4) )/sqrt(
fitted(tmbm3)*(1-fitted(tmbm3))/cbpp$size )
I'm not sure any of this helps you decide how to move forward, but I
definitely wouldn't dismiss the overdispersion you are seeing in your lme4
model yet.
Ariel
-----Original Message-----
From: R-sig-mixed-models
[mailto:r-sig-mixed-models-bounces at r-project.org] On Behalf Of Thomas
Merkling
Sent: Thursday, March 15, 2018 6:51 PM
To: r-sig-mixed-models at r-project.org
Subject: [R-sig-ME] different overdispersion parameter for binomial GLMM
in lme4, glmmADMB and glmmTMB
Hi all,
I'm trying to model proportion data using a binomial GLMM. I first
started with lme4 and saw that my data were overdispersed.
LME4Inter <- glmer(PropInter ~ Manip + Observer +(1|Date),
family=binomial, weights = NbAg, data = senior,
control=glmerControl(optimizer="bobyqa",optCtrl = list(maxfun=100000)))
Using the dispersion_glmer (provided by Ben Bolker, see below for code),
I got a dispersion parameter of 1.99 (as expected the outcome is the same
with glmmADMB).
I then tried to add an observation level random effect, and the
dispersion parameter dropped to 0.24. With such a small dispersion
parameter, my model becomes overly conservative and I'll probably not be
able to detect any (true) effect (if I understand correctly). Would you
stick to this model anyways?
I started investigating beta-binomial models to see if they could better
model overdispersion (and not be too conservative) and wanted to use glmmTMB
to do that.
I first tried to refit the binomial GLMM with glmmTMB and calculate the
dispersion parameter using the dispfun function (also provided by Ben
Bolker, see below for code), and got a very different response: the
dispersion parameter was 0.514. If this is correct, I wouldn't have to worry
about overdispersion.
TMBInter <- glmmTMB(PropInter ~ Manip + Observer +(1|Date),
family=binomial, weights = NbAg, data = senior)
The estimates and SEs were identical between the 3 packages, so I am
fitting correctly the exact same models. However, the residuals between
glmer/glmmADMB and glmmTMB models were quite different. Is that expected?
I tried to run Poisson GLMMs with the 2 packages and got identical
dispersion parameters, so the dispfun function works well for glmmTMB too.
Which dispersion parameter value should I trust for the binomial GLMM?
Is there something specific about binomial GLMMs in glmmTMB?
Thanks in advance for your help!
Thomas Merkling
dispersion_glmer <- function(model) {
## number of variance parameters in
## an n-by-n variance-covariance matrix
vpars <- function(m) {
nrow(m)*(nrow(m)+1)/2
}
model.df <- sum(sapply(VarCorr(model),vpars))+length(fixef(model))
rdf <- nrow(model.frame(model))-model.df
rp <- residuals(model,type="pearson")
Pearson.chisq <- sum(rp^2)
prat <- Pearson.chisq/rdf
pval <- pchisq(Pearson.chisq, df=rdf, lower.tail=FALSE)
c(chisq=Pearson.chisq,ratio=prat,rdf=rdf,p=pval)
}
dispfun <- function(m) {
r <- residuals(m,type="pearson")
n <- df.residual(m)
dsq <- sum(r^2)
c(dsq=dsq,n=n,disp=dsq/n)
}
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models _______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models