
glm model with all zeros for one of the factor levels

4 messages · Paul Johnson, Ben Bolker, Juan Pablo Edwards Molina

#
Dear List members:

I performed two independent experiments (CRD) to test whether a whitefly
carrying a virus shows a preference among potatoes, tomatoes or peppers
(target hosts, TH), depending on whether the virus was obtained from
potato or tomato (source hosts, SH). I released 100 whiteflies
(previously infected with the virus from one or the other SH) inside
cages containing 10 plants of each TH (30 in total). This is what the
data look like:

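exp  SH   TH   cage  tot  posit

1    tom  tom  1     10   4
1    tom  bat  1     10   3
1    tom  pep  1     10   0
1    bat  tom  2     10   1
1    bat  bat  2     10   2
1    bat  pep  2     10   0

2    tom  tom  3     10   6
2    tom  bat  3     10   4
2    tom  pep  3     10   0
2    bat  tom  4     10   4
2    bat  bat  4     10   0
2    bat  pep  4     10   0

(SH/TH: tom = tomato, bat = potato, pep = pepper; tot = plants of that TH
in the cage; posit = plants positive for the virus)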
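Pooling the table over experiments, cages and source hosts makes the zero pattern explicit. A minimal sketch in plain Python rather than R, with the rows transcribed by hand from the table above; the tuple layout and the helper name `infected_by_target_host` are illustrative and not from the original post:

```python
from collections import defaultdict

# Rows transcribed from the table: (exp, SH, TH, cage, tot, posit)
# tom = tomato, bat = potato, pep = pepper
ROWS = [
    (1, "tom", "tom", 1, 10, 4),
    (1, "tom", "bat", 1, 10, 3),
    (1, "tom", "pep", 1, 10, 0),
    (1, "bat", "tom", 2, 10, 1),
    (1, "bat", "bat", 2, 10, 2),
    (1, "bat", "pep", 2, 10, 0),
    (2, "tom", "tom", 3, 10, 6),
    (2, "tom", "bat", 3, 10, 4),
    (2, "tom", "pep", 3, 10, 0),
    (2, "bat", "tom", 4, 10, 4),
    (2, "bat", "bat", 4, 10, 0),
    (2, "bat", "pep", 4, 10, 0),
]


def infected_by_target_host(rows):
    """Pool (infected, total) plants per target host over exp, cage and SH."""
    pos = defaultdict(int)
    tot = defaultdict(int)
    for _exp, _sh, th, _cage, n, y in rows:
        pos[th] += y
        tot[th] += n
    return {th: (pos[th], tot[th]) for th in tot}


if __name__ == "__main__":
    for th, (y, n) in infected_by_target_host(ROWS).items():
        print(f"{th}: {y}/{n} infected ({y / n:.2f})")
```

With these data, tomato comes out at 15 of 40 infected, potato at 9 of 40 and pepper at 0 of 40, and every individual pepper cell is also 0 of 10.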

The issue is that no pepper plant was infected at all, even though
pepper was infected in another experiment with no choice of TH: i.e.
when I released infectious whiteflies inside cages containing only the
same pepper genotype, the plants showed the typical virus disease
symptoms.

So, how should I model these data?
A zero-inflated negative binomial with the total number of plants as an offset? A hurdle model?
Should I remove the pepper level from the model?

Any help would be much appreciated.

Juan Edwards
#
(This question is about GLMs rather than mixed models in R.)

I recommend reading up on separation in logistic regression, which occurs when the proportion in any of the categories formed by the fixed effects is exactly 1 or 0, so that a maximum likelihood estimate of the log odds doesn't exist. The logistf package, which fits logistic regression by Firth's penalized likelihood, is the simplest way of dealing with this in R.
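To see why the estimate doesn't exist, take the pepper level on its own (0 infected out of 40 plants): the binomial log-likelihood keeps increasing as the log odds decrease, so there is no finite maximum. Firth's method adds half the log of the Fisher information to the log-likelihood; for a single binomial group with a logit link that penalized likelihood is maximized in closed form at p = (y + 0.5)/(n + 1). The sketch below is plain Python for that one-group special case only, not the logistf API; a real fit would estimate SH and TH effects jointly, and the helper names are hypothetical:

```python
import math


def softplus(x):
    """Numerically stable log(1 + e^x)."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def binom_loglik(theta, y, n):
    """Binomial log-likelihood (up to a constant) at log odds theta."""
    # log p = -log(1 + e^-theta), log(1 - p) = -log(1 + e^theta)
    return -y * softplus(-theta) - (n - y) * softplus(theta)


def firth_one_group(y, n):
    """Firth-penalized log odds for a single binomial group (hypothetical helper).

    Maximizes loglik + 0.5 * log(n * p * (1 - p)); setting the derivative
    y - n*p + 0.5*(1 - 2p) to zero gives p = (y + 0.5) / (n + 1),
    which is finite even when y == 0 or y == n.
    """
    p = (y + 0.5) / (n + 1)
    return math.log(p / (1 - p))


if __name__ == "__main__":
    y, n = 0, 40  # pepper: 0 of 40 plants infected
    for theta in (-2.0, -5.0, -10.0, -20.0):
        print(f"theta={theta:6.1f}  loglik={binom_loglik(theta, y, n):.3e}")
    print("Firth log odds:", round(firth_one_group(y, n), 3))
```

For 0 of 40 the penalized estimate is log(0.5/40.5) = log(1/81), a finite log odds, whereas the unpenalized log-likelihood only approaches its supremum of 0 as the log odds go to minus infinity.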

Good luck,
Paul


On 5 Dec 2017, at 16:00, Juan Pablo Edwards Molina <edwardsmolina at gmail.com> wrote the original message above.

________________________________

R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
#
Agreed. I added a section to the GLMM FAQ giving guidance on how to do
this in the GLMM case:
http://bbolker.github.io/mixedmodels-misc/ecostats_chap.html#digression-complete-separation
On 17-12-06 09:36 AM, Paul Johnson wrote:
#
Thanks Ben and Paul. By the way, that is exactly what happened
after fitting the glm:
"...huge Wald confidence intervals..." (from the GLMM FAQ).
I will take a look at that section of the GLMM FAQ.
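The huge Wald intervals come from the same zero cell. The Wald standard error of a log odds is 1/sqrt(n p (1 - p)), which grows without bound as the fitted p goes to 0, and with y = 0 each Newton step simplifies to theta -= 1/(1 - p), so the estimate drifts down by about 1 per iteration instead of converging. A minimal sketch, assuming a single intercept-only group with 0 of 40 infected and using plain Newton-Raphson (equivalent to IRLS, the algorithm glm uses, for the logit link); the iteration count of 15 is arbitrary and is not glm's convergence rule:

```python
import math


def newton_logit_one_group(y, n, theta0=0.0, iters=15):
    """Newton-Raphson (= IRLS for the logit link) on one binomial group.

    Returns a list of (theta, wald_se) after each iteration.
    """
    theta = theta0
    history = []
    for _ in range(iters):
        p = 1.0 / (1.0 + math.exp(-theta))
        info = n * p * (1.0 - p)        # Fisher information for the log odds
        theta += (y - n * p) / info     # score / information
        p = 1.0 / (1.0 + math.exp(-theta))
        se = 1.0 / math.sqrt(n * p * (1.0 - p))  # Wald SE at the new estimate
        history.append((theta, se))
    return history


if __name__ == "__main__":
    for k, (theta, se) in enumerate(newton_logit_one_group(0, 40), start=1):
        lo, hi = theta - 1.96 * se, theta + 1.96 * se
        print(f"iter {k:2d}: log odds {theta:8.2f}  SE {se:10.1f}  "
              f"95% Wald CI ({lo:.1f}, {hi:.1f})")
```

The log odds keep falling and the SE roughly grows by a factor of e^(1/2) per iteration, so wherever the fitting routine stops, the reported estimate and its Wald interval reflect the stopping rule rather than the data.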
best,
Juan Edwards
On Wed, Dec 6, 2017 at 11:53 AM, Ben Bolker <bbolker at gmail.com> wrote: