If, for a given fitted GLMM "mod", I don't want to use an available tool to check for (over- or under-) dispersion, with which variance should I compare the total variance explained by mod? Thanks in advance for your replies. Kind regards,
GLMMs: Checking out over/under-dispersion
4 messages · C. AMAL D. GLELE, Ben Bolker, John Maindonald
The standard advice is to compare either the residual deviance or the sum of squares of the Pearson residuals with the residual degrees of freedom (i.e., the number of observations minus the number of parameters). This essentially takes the advice for GLMs (see, e.g., McCullagh and Nelder, or probably any textbook on GLMs) and applies it to GLMMs.
On Thu, Aug 9, 2018 at 9:24 PM C. AMAL D. GLELE <altessedac2 at gmail.com> wrote:
[quoted text trimmed]
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
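Ben Bolker's rule of thumb can be made concrete with a short calculation. The following is a minimal sketch, not a reference implementation. It assumes a Poisson response and that the fitted means have already been extracted from the model (in R, for example, with fitted(mod)). The function name poisson_dispersion is hypothetical. How many "parameters" to count for a GLMM (fixed effects only, or fixed effects plus variance components) is itself a judgment call. A ratio near 1 is consistent with the assumed variance. A ratio well above 1 suggests over-dispersion, and a ratio well below 1 suggests under-dispersion.

```python
import math
from typing import Sequence


def poisson_dispersion(y: Sequence[float], mu: Sequence[float], n_params: int) -> dict:
    """Compare Pearson chi-square and deviance with the residual df (Poisson case).

    y        -- observed counts
    mu       -- fitted means from the model (assumed already extracted, all > 0)
    n_params -- number of estimated parameters; for a GLMM this is an assumption
                (e.g. fixed effects plus variance components)
    """
    if len(y) != len(mu):
        raise ValueError("y and mu must have the same length")
    if any(m <= 0 for m in mu):
        raise ValueError("fitted means must be positive")
    df_resid = len(y) - n_params
    if df_resid <= 0:
        raise ValueError("need more observations than parameters")

    # Sum of squared Pearson residuals: (y - mu)^2 / Var(y), with Var(y) = mu for Poisson
    pearson_chisq = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))

    # Poisson deviance: 2 * sum(y*log(y/mu) - (y - mu)); the y*log term is 0 when y == 0
    deviance = 2.0 * sum(
        (yi * math.log(yi / mi) if yi > 0 else 0.0) - (yi - mi)
        for yi, mi in zip(y, mu)
    )

    return {
        "df_resid": df_resid,
        "pearson_chisq": pearson_chisq,
        "pearson_ratio": pearson_chisq / df_resid,  # ~1 if the variance assumption holds
        "deviance": deviance,
        "deviance_ratio": deviance / df_resid,
    }
```

For example, poisson_dispersion([0, 2, 5, 1, 3], [1, 2, 3, 1, 3], n_params=2) gives a Pearson ratio of 7/9 (about 0.78) and a deviance ratio of about 1.04 on 3 residual degrees of freedom. With so few observations, neither ratio says much either way.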
Note also the comments of McCullagh and Nelder (2nd edn, 1999, p. 126), who speak somewhat disparagingly about using beta-binomial models as a way to model dispersion, as defined for glm(): "Though this is an attractive option from a theoretical standpoint, in practice it seems unwise to rely on a specific form of over-dispersion, particularly where the assumed form has been chosen for mathematical convenience rather than scientific plausibility."

At least for the data with which I have been working, I beg to disagree! The great virtue of glmmTMB::glmmTMB() is that it allows its version of the dispersion parameter (not dispersion as for glm()) to be modeled as a function of explanatory variables. My experience of using glmmTMB() with several insect mortality datasets from a much larger collection was that, consistently, the over-dispersion factor was large at midrange mortalities, reducing to close to 1 (i.e., binomial-like) at high mortalities. Unfortunately, there are only 2 datasets to which I have access that I am currently free to make public. [I suspect that one should somehow be modeling the relevant parameter as a function of estimated mortality rather than indirectly as a function of explanatory variables. I've wondered whether there is some different way to handle the parameterization that would build this in.]

I've not tried modeling the GLM-style over-dispersion as a function of explanatory variables; some of the software that is around may allow this. My guess is that, as reported in Morgan and Ridout (2008) for very different data, the beta-binomial would be favoured over a quasi-binomial, with a mixture of the two doing better still. [A new mixture model for capture heterogeneity. Applied Statistics (JRSS Series C). https://doi.org/10.1111/j.1467-9876.2008.00620.x]

See https://maths-people.anu.edu.au/%7Ejohnm/r-book/4edn/ch7-BetaBinomial.pdf for details of what I have done with a dataset that I have permission to expose to public view. The beta-binomial implies that the variance can never be reduced below a lower bound that depends on the dispersion parameter, which for this purpose I find it convenient to take as the intra-class correlation. If one wants to use results for designing further trials, that is a big difference from the story that comes from a quasi-binomial model. I think it more likely that the benefits of increasing sample size attenuate as the sample size increases, with no variance lower bound.

For the recent data on which I had been working, the relevant glmmTMB abilities became available too recently (around January 2018) to be applied across all the datasets to which I had access. With what I believe I now know, I'd have had the confidence to pursue the use of other packages that can be used, with a bit more effort, to achieve a similar result. Hindsight is a great thing.

Were I in mid-career, I'd likely be pursuing these ideas with some vigour. I'd be happy to co-operate with anyone who wants to take them further, and might be able to negotiate access to a wider range of datasets than I can currently expose to public view. It surprises me that this seems to be an area that has been very little explored, certainly as it relates to plant quarantine research; what has been done to date, including work that I did in the 1980s and 1990s, now strikes me as naive.

John Maindonald
email: john.maindonald at anu.edu.au
On 10/08/2018, at 14:10, Ben Bolker <bbolker at gmail.com> wrote:
[quoted text trimmed]
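John Maindonald's contrast between the beta-binomial and the quasi-binomial can be checked with the standard variance formulas. For a beta-binomial with mean proportion pi and intra-class correlation rho, Var(Y/n) = pi(1 - pi)[1 + (n - 1)rho]/n = pi(1 - pi)[rho + (1 - rho)/n]. As the binomial size n (e.g., insects per replicate) grows, this tends to the floor pi(1 - pi)rho. Under a quasi-binomial model, Var(Y/n) = phi * pi(1 - pi)/n, which keeps shrinking towards 0. The sketch below uses purely illustrative values: pi = 0.5, rho = 0.1, and phi = 1.9, with phi chosen so that both models agree at n = 10. None of these numbers come from the datasets mentioned in the thread, and the function names are hypothetical.

```python
def betabinomial_prop_var(pi: float, rho: float, n: int) -> float:
    """Var(Y/n) under a beta-binomial with mean pi and intra-class correlation rho."""
    return pi * (1 - pi) * (rho + (1 - rho) / n)


def quasibinomial_prop_var(pi: float, phi: float, n: int) -> float:
    """Var(Y/n) under a quasi-binomial model with dispersion factor phi."""
    return phi * pi * (1 - pi) / n


if __name__ == "__main__":
    pi, rho = 0.5, 0.1           # illustrative values only
    phi = 1 + (10 - 1) * rho     # makes the two models agree at n = 10
    floor = pi * (1 - pi) * rho  # beta-binomial lower bound on Var(Y/n)
    for n in (10, 100, 1000, 10000):
        bb = betabinomial_prop_var(pi, rho, n)
        qb = quasibinomial_prop_var(pi, phi, n)
        print(f"n={n:>6}  beta-binomial={bb:.6f}  quasi-binomial={qb:.6f}  floor={floor:.6f}")
```

At n = 1000, for instance, the beta-binomial variance is about 0.0252, already close to its floor of 0.025, whereas the quasi-binomial variance is 0.000475. This is the difference that matters when the fitted model is used to plan the size of further trials.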
Dear all,

Many thanks for your very useful advice.

Best wishes,
Amal

2018-08-10 5:59 GMT+02:00 John Maindonald <john.maindonald at anu.edu.au>:
[quoted text trimmed]