zero-inflation and multimodal count distribution
On 17-05-02 12:20 PM, simone santoro wrote:
Hi all, I am trying to test a hypothesis regarding the different contributions of sons and daughters to their parents' fitness. I have a number of (bird) nests for which I have measured a feature of the parents related to their quality (a continuous variable) that I hypothesize affects the future lifetime fecundity of their sons and daughters. Specifically, my hypothesis is that at high values of parents' quality sons will be more fecund than their sisters over their entire life and, vice versa, at low values of parents' quality daughters will be more fecund than their brothers. Note that the sons and daughters of a nest, whose lifetime fecundity I have recorded, are all born in the same year. Thus, year of birth (of sons and daughters) is a random intercept I want to control for, as is nest identity.

The data set may be arranged in two ways, one with a row for each nest and another with a row for each offspring (son or daughter).

In case 1 (row = nest), I have these variables: FN, family name; YEAR, birth year of sons and daughters; nDescBySons, lifetime total number of progeny generated by sons (pooled); nDescByDaughs, lifetime total number of progeny generated by daughters (pooled); nSons, number of sons; nDaughs, number of daughters; parQuality, parents' quality.

In case 2 (row = son or daughter), I have these variables: FN, family name; YEAR, birth year of sons and daughters; nDesc, lifetime total number of progeny generated by the individual; sex, son or daughter; nestSize, total number of sons and daughters at the nest; parQuality, parents' quality.

I think the second arrangement is easier to analyze for testing my hypothesis (comments/suggestions on this?). With it I have direct information on the individual-level lifetime fecundity of sons and daughters and do not necessarily have to account for how many sons and daughters were at the nest. However, I have a lot of zeros (many sons and daughters disappear, i.e. die or emigrate, and have no recorded descendants at all) and the data have a kind of bimodal distribution after the zero mode (see the image linked here): https://drive.google.com/open?id=0BwsTfIcebsrOZnljSW9uQXF2UU0

Thus, I would use a zero-inflated GLMM, for instance with the glmmTMB package in R. Something like this: glmmTMB(nDesc ~ parQuality*sex + (1|FN) + (1|YEAR), ..., ziformula = ~1)

But what about that "ugly" multimodal distribution? I thought I might try different distributions (e.g. poisson, compois, any other?) and compare the model fits by looking at the AIC. Any advice on this would be extremely appreciated.

Simone
My main thought is that your plots show the *marginal* distribution of the data. Differences among families/years or odd shapes of the parental quality distribution could drive this pattern without any need to assume the *conditional* distribution is multimodal. Fit a sensible model (like the one you suggest) and then check its diagnostics in various ways. If you have enough data, you could also consider interactions between the fixed effects (sex and parental quality) and the random effects, e.g. does parental quality matter more in some birth years than others?
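To make the marginal-versus-conditional point concrete, here is a minimal simulated sketch in R. Everything numeric in it is an assumption made up for illustration (sample sizes, the alternating "poor"/"good" birth years, the slopes, the 40% structural-zero probability); none of it comes from the original data. The counts are zero-inflated Poisson conditional on year, sex and parental quality, yet the pooled counts show a zero mode plus two further modes simply because birth years differ. The fitting step uses glmmTMB with the model structure proposed in the question; the DHARMa residual check is one possible way to "check diagnostics", not necessarily what was meant in the reply.

```r
# Simulated illustration only: all constants below are assumptions.
set.seed(101)

n_year <- 10   # hypothetical number of birth years
n_nest <- 30   # hypothetical nests per year
n_off  <- 3    # hypothetical offspring per nest

d <- expand.grid(off = seq_len(n_off),
                 nest = seq_len(n_nest),
                 YEAR = seq_len(n_year))
d$FN   <- factor(paste(d$YEAR, d$nest, sep = "_"))  # nest identity
d$YEAR <- factor(d$YEAR)
d$sex  <- factor(sample(c("daughter", "son"), nrow(d), replace = TRUE))

# Parental quality is measured once per nest
pq <- rnorm(nlevels(d$FN))
names(pq) <- levels(d$FN)
d$parQuality <- unname(pq[as.character(d$FN)])

# Birth years alternate between a "poor" (mean 3) and a "good" (mean 12) level
year_eff <- rep(c(log(3), log(12)), length.out = n_year)
# Opposite-signed quality slopes for sons and daughters (the hypothesis)
slope <- ifelse(d$sex == "son", 0.2, -0.2)
mu <- exp(year_eff[as.integer(d$YEAR)] + slope * d$parQuality)

# Structural zeros (death/emigration) on top of a Poisson count
is_zero <- rbinom(nrow(d), 1, 0.4)
d$nDesc <- ifelse(is_zero == 1, 0L, rpois(nrow(d), mu))

# Marginal view: pooled counts mix the two kinds of year
print(table(cut(d$nDesc, breaks = c(-1, 0, 5, 9, Inf),
                labels = c("0", "1-5", "6-9", "10+"))))

# Conditional view: mean of the non-zero counts within each birth year
pos <- d$nDesc > 0
pos_mean_by_year <- tapply(d$nDesc[pos], d$YEAR[pos], mean)
print(round(pos_mean_by_year, 1))

# Fit the model from the question and compare families by AIC
# (skipped if glmmTMB is not installed)
if (requireNamespace("glmmTMB", quietly = TRUE)) {
  library(glmmTMB)
  fit_pois <- glmmTMB(nDesc ~ parQuality * sex + (1 | FN) + (1 | YEAR),
                      ziformula = ~1, family = poisson, data = d)
  fit_nb <- update(fit_pois, family = nbinom2)
  print(AIC(fit_pois, fit_nb))

  # Simulation-based residual checks, if DHARMa is available
  if (requireNamespace("DHARMa", quietly = TRUE)) {
    res <- DHARMa::simulateResiduals(fit_pois)
    plot(res)
    print(DHARMa::testZeroInflation(res))
  }
}
```

If the conditional model is adequate, the residual checks should look unremarkable even though a histogram of nDesc on its own looks multimodal. With real data, the same checks can then guide whether a different family (e.g. nbinom2 or compois) or extra random-effect terms are worth trying.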