zero-inflation and multimodal count distribution

Tue, May 2, 2017 9:31 AM

On 17-05-02 12:20 PM, simone santoro wrote:

Hi all,

I am trying to test a hypothesis regarding the different contributions of
sons and daughters to parents' fitness. I have a number of (bird) nests for
which I have measured a feature of the parents related to their quality
(continuous variable) that I hypothesize affects the future lifetime
fecundity of their sons and daughters.

Specifically, my hypothesis is that at high values of parents' quality sons
will be more fecund than their sisters over their entire lives and, vice
versa, at low values of parents' quality daughters will be more fecund than
their brothers.


Note that the sons and daughters of a nest, whose lifetime fecundity I have
recorded, are all born in the same year. Thus, year of birth (of sons and
daughters) is a random intercept I want to control for, as is the nest
identity. The data set may be arranged in two ways, one that has a row for
each nest and another that has a row for each offspring (son or daughter).


In case 1 (row = nest), I have these variables: FN, family name; YEAR,
birth year of sons and daughters; nDescBySons, lifetime total number of
progeny generated by sons (pooled); nDescByDaughs, lifetime total number
of progeny generated by daughters (pooled); nSons, number of sons; nDaughs,
number of daughters; parQuality, parents' quality.

In case 2 (row = son or daughter), I have these variables: FN, family name;
YEAR, birth year of sons and daughters; nDesc, lifetime total number of
progeny generated by the individual; sex, son or daughter; nestSize, total
number of sons and daughters at nest; parQuality, parents' quality.


I think the second arrangement of the data is easier to analyze for
testing my hypothesis (comments/suggestions on this?). This way I have
direct information on the individual-level lifetime fecundity of sons and
daughters and do not necessarily have to account for how many sons and
daughters were at the nest.
However, I have a lot of zeros (many sons and daughters disappear - die or
emigrate - and have no recorded descendants at all) and the data have a kind
of bimodal distribution after the zero mode (see the image below):

https://drive.google.com/open?id=0BwsTfIcebsrOZnljSW9uQXF2UU0


Thus, I would use a zero-inflated GLMM, for instance with the glmmTMB
package in R. Something like this:

glmmTMB(nDesc ~ parQuality*sex + (1|FN) + (1|YEAR), ..., ziformula = ~1)

But what about that "ugly" multimodal distribution? I thought I might try
different distributions (e.g. poisson, compois, any other?) and compare the
model fits by looking at the AIC.

Any advice on this would be extremely appreciated.


Simone

My main thought is that your plots show the *marginal* distribution
of the data. Differences among families/years or odd shapes of the
parental quality distribution could drive this pattern without any need
to assume the *conditional* distribution is multimodal. Fit a sensible
model (like the one you suggest) and then check diagnostics in various
ways. If you have enough data, you could also consider interactions
between sex and parental quality and the random effects -- e.g. does
parental quality matter more in some birth years than others?
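
To illustrate the marginal vs. conditional point, here is a minimal
base-R sketch with simulated data. Everything in it is an assumption made
up for illustration (three birth years, the year-specific means, the 0.4
probability of a structural zero); it is not the poster's data. Each
individual's count is zero-inflated Poisson given its year, which is
unimodal apart from the zero spike, yet the pooled histogram shows a zero
mode followed by several humps.

```r
set.seed(1)

# Hypothetical design: 3 birth years with very different mean fecundity
n_per_year <- 300
year_means <- c(2, 12, 25)   # assumed conditional Poisson means per year
year <- factor(rep(seq_along(year_means), each = n_per_year))
mu <- year_means[as.integer(year)]

# Structural zeros (individuals that die or emigrate); probability assumed
is_zero <- rbinom(length(mu), size = 1, prob = 0.4) == 1

# Zero-inflated Poisson counts: conditional on year, one Poisson per year
nDesc <- ifelse(is_zero, 0L, rpois(length(mu), lambda = mu))

# Marginal view: pooling years gives a zero spike plus several humps
hist(nDesc, breaks = seq(-0.5, max(nDesc) + 0.5, by = 1),
     main = "Marginal distribution (all years pooled)", xlab = "nDesc")

# Conditional view: within each year the non-zero part is a single Poisson
cond_means <- tapply(nDesc[!is_zero], year[!is_zero], mean)
print(round(cond_means, 1))
```

In real data the grouping would come from the fitted random effects and
covariates rather than from known year means, so the check is done on the
fitted model's residuals or simulations, not on the raw histogram.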
