I have a question about how glmmTMB handles proportion data for the purposes of a binomial GLMM.

I combined my success and failure count data into a matrix using cbind(), and used that as my response in my binomial GLMM using glmmTMB. However, despite there being a few instances of zero counts in both columns and therefore an undefined proportion, the model doesn't seem to drop these rows from my data set. I don't get any errors or warnings when running the model, but I worry my results might be biased because of this.

My question is: Is glmmTMB doing something like adding a tiny amount to each value of my response in order to avoid dealing with undefined proportion data?

Thank you for your help,
Robert
Question about proportion data in binomial glmm
5 messages · Thierry Onkelinx, Mollie Brooks, Ben Bolker +1 more
Dear Robert,

IMHO you should remove the cbind(0, 0) rows before fitting the model. There is no reason to keep them in the dataset.

Best regards,

ir. Thierry Onkelinx
Statistician, Government of Flanders, Research Institute for Nature and Forest (INBO), Team Biometrics & Quality Assurance

(In reply to rtfiner's message of Fri 24 Mar 2023, 02:39.)
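Dropping the empty rows as suggested above could look something like the following. This is only a sketch in base R: the data frame `dat` and its columns `succ`, `fail`, and `site` are hypothetical stand-ins for the real data set.

```r
# Hypothetical data: `succ` and `fail` are the two count columns
# that would be passed to cbind() in the model formula.
dat <- data.frame(
  succ = c(3, 0, 5, 0),
  fail = c(7, 0, 2, 4),
  site = c("a", "a", "b", "b")
)

# A row is "empty" only when BOTH counts are zero (zero trials).
# A row with 0 successes but some failures is a valid observation.
empty <- dat$succ + dat$fail == 0

# Keep only the rows with at least one trial.
dat_nonempty <- dat[!empty, ]
print(dat_nonempty)
```

Filtering on the total number of trials, rather than on the success count alone, keeps legitimate all-failure rows in the data.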
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
They have zero contribution to the log-likelihood, so they shouldn't affect the model.
dbinom(0, 0, 0.1, log=TRUE)
[1] 0
I can't say if they would affect any model evaluation functionality, but they shouldn't.
Best, Mollie
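To make the zero-contribution point concrete, here is a small base-R sketch. It uses R's own dbinom(), not the TMB implementation that glmmTMB calls internally, and the counts and the probability 0.4 are made-up values for illustration.

```r
# With zero trials, the only possible outcome is zero successes, so the
# probability is 1 and the log-probability is 0, whatever `prob` is.
p <- c(0.01, 0.1, 0.5, 0.9)
ll_empty <- dbinom(0, size = 0, prob = p, log = TRUE)
print(ll_empty)

# Hypothetical counts; rows 2 and 4 have zero trials.
succ <- c(3, 0, 5, 0)
fail <- c(7, 0, 2, 0)
n <- succ + fail
prob <- 0.4  # arbitrary success probability for the illustration

# Summed binomial log-likelihood with and without the empty rows.
ll_all <- sum(dbinom(succ, size = n, prob = prob, log = TRUE))
ll_kept <- sum(dbinom(succ[n > 0], size = n[n > 0], prob = prob, log = TRUE))
print(all.equal(ll_all, ll_kept))
```

Because the empty rows add exactly zero to the sum, the two totals agree for any value of `prob`, which is why those rows cannot pull the estimates in either direction.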
(In reply to Thierry Onkelinx's message of 24 Mar 2023, 09.12.)
3 days later
The only further issue here is that the number of observations for the model will still be computed as including these null values. This should only matter if you're doing something like computing finite-size-corrected AICs (and to paraphrase _Numerical Recipes_, if this level of difference matters to you then you're on shaky ground anyway ...).

The source code for the dbinom implementation in TMB:

https://kaskr.github.io/adcomp/distributions__R_8hpp_source.html

illustrates that values with N=0, k = 0 will have no effect on the log-likelihood (while TMB mirrors R's behaviour most of the time, it's not 100% safe to assume that edge cases will work exactly the same in R and TMB).
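As a rough illustration of how much the observation count matters for a finite-size-corrected AIC, the sketch below applies the usual AICc correction, AIC + 2k(k+1)/(n - k - 1), to two values of n. The helper `aicc()` and all the numbers (AIC of 120, 4 parameters, 50 rows of which 3 are empty) are hypothetical and are not taken from the model in this thread.

```r
# Hypothetical helper: small-sample corrected AIC.
# aic = uncorrected AIC, k = number of estimated parameters,
# n = number of observations used in the correction.
aicc <- function(aic, k, n) {
  stopifnot(n > k + 1)
  aic + 2 * k * (k + 1) / (n - k - 1)
}

aic_value <- 120   # made-up AIC
k <- 4             # made-up parameter count
n_all <- 50        # row count including the zero-trial rows
n_nonempty <- 47   # row count after dropping 3 zero-trial rows

aicc_all <- aicc(aic_value, k, n_all)
aicc_nonempty <- aicc(aic_value, k, n_nonempty)

# Difference caused purely by counting the empty rows as observations.
print(aicc_nonempty - aicc_all)
```

With only a handful of empty rows the correction term barely moves, which is in line with the remark that a difference of this size should rarely change a conclusion.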
(In reply to Mollie Brooks's message of 2023-03-24, 6:36 a.m.)
2 days later
Thank you all for your input. I tried fitting the model with NaNs removed, and output and evaluation were very similar, so perhaps I am okay? -Robert
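The same kind of with-and-without comparison can be reproduced on simulated data. The sketch below deliberately swaps in base R's glm(), a fixed-effects-only analogue, so that it runs without extra packages; it is not the glmmTMB model from this thread, and the simulated data and variable names are assumptions.

```r
set.seed(1)

# Simulated data: 30 rows, the first 3 with zero trials.
n_trials <- c(0, 0, 0, sample(5:20, 27, replace = TRUE))
x <- rnorm(30)
succ <- rbinom(30, size = n_trials, prob = plogis(-0.5 + 0.8 * x))
dat <- data.frame(x = x, succ = succ, fail = n_trials - succ)

# Fit once with all rows, once with the zero-trial rows removed.
fit_all <- glm(cbind(succ, fail) ~ x, family = binomial, data = dat)
fit_sub <- glm(cbind(succ, fail) ~ x, family = binomial,
               data = subset(dat, succ + fail > 0))

# Coefficients and log-likelihood should agree between the two fits.
print(all.equal(coef(fit_all), coef(fit_sub)))
print(all.equal(as.numeric(logLik(fit_all)), as.numeric(logLik(fit_sub))))
```

In glm() the zero-trial rows end up with zero weight, so they drop out of the fit; whether every downstream summary in another package treats them the same way is worth checking case by case.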
(In reply to Ben Bolker's message of Mon, Mar 27, 2023, 9:18 AM.)
-Robert Finer