GLMMs: checking for over- or under-dispersion - R-SIG-mixed-models

C. AMAL D. GLELE

Thu, Aug 9, 2018 6:23 PM

If, for a fitted GLMM "mod", I don't want to use an existing tool to
check for over- or under-dispersion, against which variance should I compare
the total variance explained by mod?
In advance, thanks for your replies.
Kind regards,

Ben Bolker

Thu, Aug 9, 2018 7:10 PM

The standard advice is to compare either the residual deviance or the
sum of squares of the Pearson residuals to the residual degrees of
freedom (i.e. (number of observations) - (number of parameters)). This
is essentially taking the advice for GLMs (see e.g. McCullagh and
Nelder, or probably any textbook on GLMs) and applying it to GLMMs.
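
A minimal sketch of the check described above, written in plain Python so that the arithmetic is explicit. The function name `pearson_dispersion`, the argument names, and the example data are all hypothetical and are not part of any package. It assumes you already have the observed responses, the fitted means, and the variance function of the family (for a Poisson model, V(mu) = mu). How to count parameters in a GLMM (fixed effects only, or variance parameters as well) is left to the caller. In R, the same quantity should be obtainable with something like sum(residuals(mod, type = "pearson")^2) / df.residual(mod) for fits that support Pearson residuals.

```python
from math import sqrt


def pearson_dispersion(y, mu, variance, n_params):
    """Return (Pearson chi-square, residual df, ratio of the two).

    y        -- observed responses
    mu       -- fitted means from the model (assumed to be given)
    variance -- variance function of the family, e.g. lambda m: m for Poisson
    n_params -- number of estimated parameters (the caller decides what counts)
    """
    if len(y) != len(mu):
        raise ValueError("y and mu must have the same length")
    resid_df = len(y) - n_params
    if resid_df <= 0:
        raise ValueError("need more observations than parameters")
    # Pearson residual: (observed - fitted) / sqrt(model variance at fitted mean)
    pearson = [(yi - mi) / sqrt(variance(mi)) for yi, mi in zip(y, mu)]
    chi2 = sum(r * r for r in pearson)
    return chi2, resid_df, chi2 / resid_df


if __name__ == "__main__":
    # Hypothetical counts and fitted means, for illustration only
    y = [0, 3, 1, 7, 2, 9, 4, 12]
    mu = [1.5, 2.0, 2.5, 3.5, 4.0, 5.0, 6.0, 7.5]
    chi2, df, ratio = pearson_dispersion(y, mu, variance=lambda m: m, n_params=2)
    print(f"Pearson chi-square = {chi2:.2f} on {df} df, ratio = {ratio:.2f}")
```

A ratio well above 1 suggests over-dispersion relative to the assumed family, and a ratio well below 1 suggests under-dispersion. A ratio near 1 is consistent with the assumed variance function.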

_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

John Maindonald

Thu, Aug 9, 2018 8:59 PM

Note also the comments of McCullagh and Nelder (2edn, 1989, p.126),
speaking somewhat disparagingly about the use of beta-binomial
models as a way to model dispersion, as defined for glm():
"Though this is an attractive option from a theoretical standpoint, in
practice it seems unwise to rely on a specific form of over-dispersion,
particularly where the assumed form has been chosen for mathematical
convenience rather than scientific plausibility."
At least for data with which I have been working, I beg to disagree!

The great virtue of glmmTMB::glmmTMB() is that it allows modeling of
its version of the dispersion parameter (not dispersion as for glm()) as a
function of explanatory variables.  My experience of using glmmTMB()
with several insect mortality datasets from a much larger collection
was that, consistently, the over-dispersion factor was large at midrange
mortalities, reducing to close to 1 (i.e., binomial-like) at high mortalities.
There are only 2 datasets that I have access to that I am currently free
to make public, unfortunately.

[I suspect that one should somehow be modeling the relevant parameter
as a function of estimated mortality rather than indirectly as a function
of explanatory variables.  I've wondered whether there is some different
way to handle the parameterization that would build this in.]

I've not tried modeling the GLM-style over-dispersion as a function of
explanatory variables; some of the software that is around may
allow this.  My guess is that, as reported in Morgan and
Ridout (2008) for very different data, the beta-binomial would be
favoured over a quasi-binomial, with a mixture of the two doing better still.
[Morgan and Ridout (2008), A new mixture model for capture heterogeneity.
Journal of the Royal Statistical Society, Series C (Applied Statistics).
https://doi.org/10.1111/j.1467-9876.2008.00620.x]

See https://maths-people.anu.edu.au/~johnm/r-book/4edn/ch7-BetaBinomial.pdf
for details of what I have done with a dataset that I have permission to
expose to public view.

The beta-binomial implies that the variance can never be reduced below
a lower bound that depends on the dispersion parameter, which I find
convenient to take for this purpose as the intra-class correlation.  That is
a big difference, if one wants to use results for designing further trials,
from the story that comes from a quasi-binomial model. I think it more
likely that the benefits of increasing sample size attenuate as the sample
size increases, with no variance lower bound.  For the recent data on
which I had been working, the relevant glmmTMB abilities became
available too recently (~Jan, 2018) to be applied across all the datasets
to which I had access.  With what I believe I now know, I'd have had
the confidence to pursue the use of other packages that can be used,
with a bit more effort, to achieve a similar result.  Hindsight is a great
thing.
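
The lower bound mentioned above can be written out as follows. This is a sketch under assumptions not stated in the post: Y is the number of deaths among n insects in one replicate, p is the mortality probability, rho is the intra-class correlation used as the beta-binomial dispersion parameter, and phi is the quasi-binomial dispersion factor.

```latex
\begin{align*}
\text{beta-binomial:}\quad
\operatorname{Var}\!\left(\frac{Y}{n}\right)
  &= \frac{p(1-p)}{n}\,\bigl[1 + (n-1)\rho\bigr]
   = p(1-p)\left[\rho + \frac{1-\rho}{n}\right]
   \;\xrightarrow[\,n\to\infty\,]{}\; \rho\,p(1-p), \\[4pt]
\text{quasi-binomial:}\quad
\operatorname{Var}\!\left(\frac{Y}{n}\right)
  &= \phi\,\frac{p(1-p)}{n}
   \;\xrightarrow[\,n\to\infty\,]{}\; 0 .
\end{align*}
```

Under these assumptions the beta-binomial variance of an observed proportion cannot fall below rho p(1-p) however large n becomes, whereas the quasi-binomial variance keeps shrinking in proportion to 1/n. The two models therefore give different answers about the benefit of larger replicates when designing further trials.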

Were I in mid-career, I'd likely be pursuing these ideas with some vigour.
I'd be happy to co-operate with anyone who wants to take them further,
and might be able to negotiate access to a wider range of datasets than
I can currently expose to public view.  It surprises me that this seems an
area that has been very little explored, certainly as it relates to plant
quarantine research. What has been done to date, including work that
I did in the 1980s and 1990s, now strikes me as naive.


John Maindonald             email: john.maindonald at anu.edu.au

C. AMAL D. GLELE

Fri, Aug 10, 2018 6:12 AM

Dear all,
many thanks for your very useful advice.
Best wishes,
Amal
