Improving residual distribution in glmmTMB
I don't have any perfect solution, and I don't necessarily recommend
the brute-force approach tried below, but a truncated negative binomial
(2/quadratic parameterization) seems best. The genpois and nbinom1 are
actually wonkiest, in terms of appearance of residuals.
The COM-Poisson below is *very* slow, you might want to skip it (it
took 20 minutes to fit on 16 cores, and the results are not that great
anyway).
I thought about implementing a Poisson-inverse-Gamma to allow for
heavier tails, but haven't gotten around to it yet (it's not trivial ...)
There seems to be something wonky about the reported sigma/Pearson
residuals from truncated nbinom2, but the results are OK ...
### Reproducible data and code:
d <- read.csv("https://raw.githubusercontent.com/fpqq/w/main/b.csv")
library(glmmTMB)
library(purrr)
library(broom.mixed)
library(dplyr)
library(DHARMa)
model <- glmmTMB(downs ~ I(ELs/20) +
## Document +
(1|District) +
(1|Document),
family=genpois(),
# ziformula = ~ Document,
# dispformula = ~ Document,
data = d)
## very slow ...
system.time(model2 <- update(model, family = "compois",
control = glmmTMBControl(parallel = 16)))
## user system elapsed
## 17194.076 11.103 1134.367
## truncated genpois doesn't work for some reason ...
fam <- c("genpois", "poisson", "nbinom1", "nbinom2")
fam <- c(fam, paste0("truncated_", fam[-1]))
mod_list <- lapply(fam,
function(f) {
cat(f, "\n")
update(model, family = f)
})
names(mod_list) <- fam
mod_list[["compois"]] <- model2
aictab <- (map_dfr(mod_list, glance, .id = "family")
|> select(-c(nobs,logLik, BIC))
|> mutate(across(c(AIC, deviance), ~ . - min(., na.rm = TRUE)))
|> arrange(AIC)
)
## family sigma AIC df.residual deviance
## <chr> <dbl> <dbl> <int> <dbl>
## 1 truncated_nbinom2 3.66e-8 0 569 NA
## 2 nbinom2 1.39e+0 665. 569 0
## 3 compois 5.29e+9 682. 569 NA
## 4 genpois 1.33e+1 768. 569 NA
## 5 nbinom1 8.11e+0 981. 569 191.
## 6 truncated_poisson 1 e+0 1996. 570 NA
## 7 poisson 1 e+0 2553. 570 2077.
## 8 truncated_nbinom1 7.84e+0 NA 569 NA
tnb <- mod_list[["truncated_nbinom2"]]
## Pearson resid calc messed up ???
plot(resid(tnb,type="pearson") ~ fitted(tnb))
plot(d$downs, predict(tnb, type = "response"), log = "xy")
abline(a=0, b=1, col = 2)
## reorder
mod_list <- mod_list[aictab$family]
op <- par(mfrow = c(3, 3), mfrow = c(3,2,1,1), las = 1)
lapply(mod_list,
function(m) plot(resid(m,type="pearson") ~ fitted(m),
xlab = "", ylab = "", main = family(m)$family)
)
## DHARMa
ss <- simulateResiduals(tnb)
plot(ss)
plotResiduals(ss, form = d$ELs/20)
On 2023-05-13 11:55 p.m., Timothy MacKenzie wrote:
Greetings,
I'm modeling the number of downloads (downs) of certain forms
(Document) by a number of school districts (District) of various
student sizes (ELs).
I've tried genpois(), nbinom1(), and nbinom2() families but the fitted
vs. residual doesn't look promising.
In glmmTMB, are there any additional ways to improve the residual
distribution for count-type data?
Thank you,
Tim M
### Reproducible data and code:
d = read.csv("https://raw.githubusercontent.com/fpqq/w/main/b.csv")
model = glmmTMB(downs ~ I(ELs/20) + Document +
(1|District) +
(1|Document),
family=genpois(),
# ziformula = ~ Document,
# dispformula = ~ Document,
data = d)
plot(resid(model,type="pearson") ~ fitted(model))
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Dr. Benjamin Bolker Professor, Mathematics & Statistics and Biology, McMaster University Director, School of Computational Science and Engineering (Acting) Graduate chair, Mathematics & Statistics > E-mail is sent at my convenience; I don't expect replies outside of working hours.