zero-inflation and multimodal count distribution
[Please keep r-sig-mixed-models at r-project.org in the Cc: ...] On Wed, May 3, 2017 at 1:43 PM, simone santoro
<simonesantoro77 at gmail.com> wrote:
Thank you for observing that, it makes much sense!
I have being exploring better the data and have realized that the response
variable varies greatly among years in a way that might explain why the
multimodal marginal distribution. I have also tried different combinations
of random intercepts/slopes and have found that the best fit is achieved by
defining nest identity and year as (independent) random intercepts
{(1|NF)+(1|YEAR)}. Now, I would refine my modelling approach and have tried
to model these data in a different way.
As told in the previous mail, actually, I am interested in testing whether
the ratio of sons relative to daughters contribution in terms of number of
descendants (nDesc) depends on the non-additive effect of parental quality
(parQuality). Then, I have tried a zero-inflated binomial and betabinomial
family. In these cases I have considered as response variable the number of
descendants generated by each individual (son or daughter), i.e. nDesc,
relative to the sum of descendants (sumDesc) generated by the whole nest
(sons + daughters). My response variable is therefore: nDesc/sumDesc
I would think that a betabinomial distribution might be better because I
know that the response variable has lot of zeros and lot of 1. By the way:
do exist zero-one-inflated beta (mixed) models as that discussed here?
Thus I try the betabinomial first:
tmbBetaBinomial<- glmmTMB(nDesc/sumDesc ~ parQuality*sex+(1|NF)+(1|YEAR),?,
zi~1,family= list(family="betabinomial",link="logit"), weights= sumDesc)
and I get this warning message:
Warning message:
In nlminb(start = par, objective = fn, gradient = gr) :
NA/NaN function evaluation
This is probably a harmless error message. It just means that the optimizer wandered into illegal territory somewhere along the way.
Then I try the binomial: tmbBinomial<- glmmTMB(nDesc/sumDesc ~ parQuality*sex+(1|NF)+(1|YEAR),?, zi~1,family= binomial, weights= sumDesc) I get no warning messages. This is the summary of the betabinomial glmm:
summary(tmbBetaBinomial)
Family: betabinomial ( logit )
Formula: nDesc/sumDesc ~ parQuality * sex + (1 | NF) + (1 | YEAR)
Zero inflation: ~1
Data: ind
Weights: sumDesc
AIC BIC logLik deviance df.resid
5939.0 5983.4 -2961.5 5923.0 1886
Random effects:
Conditional model:
Groups Name Variance Std.Dev.
NF (Intercept) 4.130e-09 6.426e-05
YEAR (Intercept) 1.328e+00 1.152e+00
Number of obs: 1894, groups: NF, 719; YEAR, 24
Overdispersion parameter for betabinomial family (): 0.38
Conditional model:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.099863 0.355641 0.281 0.779
parQuality 0.001557 0.004606 0.338 0.735
sexmal 0.002171 0.350213 0.006 0.995
parQuality:sexmal 0.000784 0.006104 0.128 0.898
Zero-inflation model:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -27.8 64568.8 0 1
There is a strange z value for the zero-inflation model, what could it mean?
It means that the model tried to reduce the zero-inflation component to zero. It fits the zero-inflation probability on a logit scale, so it can't make it *exactly* zero, but it can make it very very close. Note that the NF variance is also effectively zero (std dev 10^5 times smaller than the YEAR variance and about 10 times smaller than the smallest-magnitude fixed effect). The AIC/log-likelihood for this model are much better than for the binomial GLMM, so (alas) I would advise you to take this model instead. In any case I think you should try plotting (or otherwise inspecting) the predictions from both models, to understand what they're saying.
This is the summary of the binomial glmm:
Family: binomial ( logit )
Formula: nDesc/sumDesc ~ parQuality * sex + (1 | NF) + (1 | YEAR)
Zero inflation: ~1
Data: ind
Weights: sumDesc
AIC BIC logLik deviance df.resid
6841.9 6880.8 -3414.0 6827.9 1887
Random effects:
Conditional model:
Groups Name Variance Std.Dev.
NF (Intercept) 3.646e+01 6.0381644
YEAR (Intercept) 1.090e-08 0.0001044
Number of obs: 1894, groups: NF, 719; YEAR, 24
Conditional model:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 7.386051 0.877055 8.421 < 2e-16 ***
parQuality -0.007452 0.013641 -0.546 0.58483
sexmal -0.649383 0.235775 -2.754 0.00588 **
parQuality:sexmal 0.009153 0.004007 2.284 0.02235 *
---
Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
Zero-inflation model:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.03268 0.04625 -0.706 0.48
This would make biological sense to me because I would expect that sons
("sexmal") are advantaged relative to their sisters when the parental
quality is high. However, may I trust this result? is there something that
you see I am missing?
Best,
Simone
2017-05-02 18:31 GMT+02:00 Ben Bolker <bbolker at gmail.com>:
On 17-05-02 12:20 PM, simone santoro wrote:
Hi all, I am trying to test a hypothesis regarding the different contribution of sons and daughters to parents? fitness. I have a number of (bird) nests of which I have measured a feature of parents related to their quality (continuous variable) that I hypothesize affects the future lifetime fecundity of their sons and daughters. Specifically, my hypothesis is that at high values of parents? quality sons will be more fecund than sisters through their entire life and vice versa, at low values of parents? quality, daughters will be more than brothers. Note that sons and daughters of a nest, of which I have recorded their lifetime fecundity, are born all the same year. Thus, year of birth (of sons and daughters) is a random intercept I want to control for as it is the nest identity. The data set may be arranged in two ways, one that considers a row for each nest and another that considers a row for each offspring (son or daughter). In case 1 (row = nest), I have these variables: FN, family name; YEAR, birth year of sons and daughter; nDescBySons, lifetime total number of progeny generated by sons (pooled); nDescByDaughs, lifetime total number of progeny generated by daughters (pooled); nSons, number of sons; nDaughs, number of daughters; parQuality, parents? quality. In case 2 (row = son or daughter), I have these variables: FN, family name; YEAR, birth year of sons and daughter; nDesc, lifetime total number of progeny generated by the individual; sex, son or daughter; nestSize, total number of sons and daughters at nest; parQuality, parents? quality. In a way, I think that the second arrangement of data is easier to be analyzed for testing my hypothesis (comment/suggestion on this?). In this way I have direct information on the individual-level lifetime fecundity of sons and daughters and have not necessarily to take care of how many sons and daughters were at the nest. However, I have lot of zeros (many sons and daughters disappear ? die or emigrate - and have no recorded descendants at all) and data have a kind of bimodal distribution after the zero mode (see below image): https://drive.google.com/open?id=0BwsTfIcebsrOZnljSW9uQXF2UU0 Thus, I would use a zero-inflated GLMM as, for instance, by using glmmTMB package in R. Something like this: glmmTMB(nDesc ~ parQuality*sex+(1|NF)+(1|YEAR),?, zi~1) But, what about that ?ugly? multimodal distribution? I thought I may try different distributions (e.g. poisson, compois, any other?) and compare the model fit by looking at the AIC. Any advice on this would be extremely appreciated. Simone
My main thought is that your plots show the *marginal* distribution of the data. Differences among families/years or odd shapes of the parental quality distribution could drive this pattern without any need to assume the *conditional* distribution is multimodal. Fit a sensible model (like the one you suggest) and then check diagnostics in various ways (if you have enough data, you could consider interactions between sex and parental quality and the random effects -- e.g. does parental quality matter more in some birth years than others?)
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models