Skip to content

mixed zero-inflated poisson regression

7 messages · ONKELINX, Thierry, Ivar Herfindal, Denise Rocha +1 more

#
Dear all

It seems like I have reached the limit of my statistical skills, and now 
turn to the mixed-list for some assistance.

I have a dataset with a response variable that appears to have a 
distribution that is zero-inflated poisson (ZIP) (or close to). The 
response variable is number of fledglings from nests with known mothers. 
The number of fledglings has a lot of zeros that comes from failed 
breeding attempts. The number of fledgings can therefore be considered 
as a two-step process:
1. do the female breed, and
2. if the female breed, what was the number of fledglings.

I want to relate the number of fledglings to some predictor variables, 
and I am particularly interested in the combined effect of these 
variables on the probability to breed and number of fledglings given 
breeding. I have several observations per female (multiple years) and 
each nest location is also represented by several females (in different 
years). I therefore need/want to account for random factors. I guess I 
could run two separate models (one binomial and one poisson), but I need 
to make predictions on how my explanatory variables affect the number of 
fledglings irrespective of the mechanism behind.

I have searched R-sites a lot, and it found some packages that can model 
ZIP.
1. pscl (function zeroinfl)
Pros: have the option to relate both the poisson and binomial process to 
the predictor variable, so I can evaluate how probability of breeding 
and number of fledglings depend on my predictor variables
Cons: Can not include random factors.

2. glmmADMB (function glmmadmb), MCMCglmm, and gamlss
Pros: allows random factors
Cons: 1. all packages seems to assume that all zeros have the same 
probability of belonging to the zero component, i.e. the logit-link 
structure does not depend on the predictor variable.
2. For some of the packages (particularly MCMCglmm) I have some 
challenges about how to specify the model correctly.

Does anyopne in the R(mixed) community have experience on this? I 
provide a simple example below to illustrate what I want.

Ivar Herfindal

## A simple reproducible example
library(pscl)
library(glmmADMB)
library(MCMCglmm)
library(gamlss)
set.seed(4242)

# create some dummy data that have a lot of zeros in the dependent variable
dataa <- cbind.data.frame(Y=rbinom(100, 1, 0.7))
dataa$Y <- dataa$Y * rpois(100, 3) # the response
dataa$A <- rnorm(100) # explanatory variable
dataa$gr <- factor(1:10) # grouping variable

# fit models with different packages

## First non-mixed models
### pscl zeroinfl
zimod <- zeroinfl(Y ~ A, data=dataa, dist="poisson")
# Get estimate of A on poisson and binomial process separate, this is 
what I want (I believe so...)

zimod2 <- zeroinfl(Y ~ A|1, data=dataa, dist="poisson")
# All zero counts have the same probability of belonging to the zero 
component

### glmmADMB
admod <- glmmadmb(Y ~ A, data=dataa, family="poisson", zeroInflation=TRUE)
# Same results as zimod2.
# Q: Is it possible to obtain similar results as zimod, i.e.
# separate the effect of A on the poisson and binomial process?

### MCMCglmm
mcmod <- MCMCglmm(Y ~ A, data=dataa, family="zipoisson", 
rcov=~idh(trait):units)
# more or less same as admod and zimod2, but "plot(mcmod$Sol)" reveals 
large variation.
# Can change burn and nitt to get more stable(?)

mcmod2 <- MCMCglmm(Y ~ A, data=dataa, family="zipoisson", 
rcov=~idh(trait):units, burn=10000, nitt=20000)
# Q: Is it possible to separate the effect of A on the poisson and 
binomial process?

### gamlss
gamod <- gamlss(Y ~ A, data=dataa, family=ZIP)
# same as zimod2 and admod

## Some attempts on mixed models with gr as random factor
admodm <- glmmadmb(Y ~ A, data=dataa, family="poisson", 
zeroInflation=TRUE, random=~(1|gr))
mcmodm <- MCMCglmm(Y ~ A, data=dataa, family="zipoisson", 
rcov=~idh(trait):units, random=~idh(trait):gr)
gamodm <- gamlss(Y ~ A + random(gr), data=dataa, family=ZIP)
# For glmmadmb and gamlss I think the models are specified ok
# For MCMCglmm I am not sure if it is specified correctly
# But I still want to be able to separate the effect of A on the two 
processes.
#
Dear Ivar,

Your assumption on MCMCglmm is incorrect. You can specify different predictor for both the count and the binomial component. Have a look at the CourseNotes vignette for an example.

Best regards,

Thierry

----------------------------------------------------------------------------
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek
team Biometrie & Kwaliteitszorg
Gaverstraat 4
9500 Geraardsbergen
Belgium

Research Institute for Nature and Forest
team Biometrics & Quality Assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium

tel. + 32 54/436 185
Thierry.Onkelinx at inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey
#
Dear Thierry

Thank you for your quick response. I have previously tried to follow the 
CourseNote you suggested without success. I gave it another try today, 
and it seems I have manged to generate such a zero-inflated model as I 
want during the day. The syntax looks something like this:

mcmodm1 <- MCMCglmm(Y ~ trait -1 + at.level(trait, 1:2):A, data=dataa,
     family="zipoisson", rcov=~idh(trait):units, nitt=30000, random=~gr)

However, the main challenge is that the model seems quite unstable 
withmy "real" data. Even with 100000 iterations, two identical specified 
models gives quite different output, and plotting the mcmodm1$Sol 
suggest not very stable estimates. This occurs also on my simulated data.

Do anyone have any suggestion on how to make the model more stable (e.g. 
by the rcov or random arguments)? Or do I just have to run it as long as 
it takes to get "fairly precise" estimates?

Ivar
On 14.11.2011 10:40, ONKELINX, Thierry wrote:

  
    
#
Hi,

ZIP models generally mix poorly and often need to be run for long  
periods of time. However, you seem to be estimating a residual  
variance for the zero-inflation bit and this is not identifiable. In  
this case you will never get convergence or good mixing. I recommend  
you fix the residual variance of the zero-inflation process at one  
using a prior like:

prior<-list(R=list(V=diag(2), nu=0.002, fix=1), G=list(G1=list(V=1,  
nu=1, alpha.mu=0, alpha.V=1000)))

I have used a parameter expansion for the random effects. However,  
your specification is a bit odd because you assume that the random  
effects for the poisson and zero-inflation have equal variance and a  
correlation of 1. ~us(trait):gr  or  ~idh(trait):gr are probably more  
reasonable and you *could* use the priors:

prior<-list(R=list(V=diag(2), nu=0.002, fix=1),  
G=list(G1=list(V=diag(2), nu=2, alpha.mu=c(0,0), alpha.V=diag(2)*1000)))

for ~us(trait):gr, and

prior<-list(R=list(V=diag(2), nu=0.002, fix=1),  
G=list(G1=list(V=diag(2), nu=1, alpha.mu=c(0,0), alpha.V=diag(2)*1000)))

~idh(trait):gr.

~gr might make more sense if  you use a zero-altered poisson  
("zapoisson") rather than a zero-inflated poisson.

Cheers,

Jarrod
On 14 Nov 2011, at 16:17, Ivar Herfindal wrote:

            

  
    
2 days later
#
Hi Denise,

Yes I did. Thanks for correcting it.

Jarrod
On 16 Nov 2011, at 19:58, Denise Rocha wrote:

            
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-sig-mixed-models/attachments/20111116/0fd017ae/attachment.pl>
#
Hi Jarrod (and all other r-sig-mixed-members who answered me)

First, thanks to all who replied. I learned a lot from all answers, and 
it seems I have got a better understanding on the topic now.

Jarrod Hadfields suggestion about priors helped me a lot to get betters 
models. As you suggested my models did not run well at all and there 
were several issues about my code. There were a lot of serious problems 
associated with the models (autocorrelation, correlation between chains 
from different parameters, poor convergence). Refining my code according 
to suggestions makes the models now run much better. I also centralised 
the predictor variable, which helped a lot on the correlation between 
the chains of the parameter estimates for the fixed factors. It also 
appears that the models have to run for some time in order to converge, 
and that I need a quite high thinning to reduce autocorrelation. 
Finally, using the priors suggested by Jarrod (with the 
fix=2-adjustment) of course did the trick and made models more stable 
and reliable. Currently I am running 3 MCMCglmm for each model, with 
nitt=1000000, burn=500000 and thin=1000. This seems to do give 
reasonable output (Gelman diagnostics < 1.02 for all chains), which are 
consistent among similar runs, although it takes some time.

As suggested by other list members, it may be a good idea to verify 
results using a MLE-based approach, for instance using ADMB. A recent 
(future) book was also suggested (even as a Christmas present for 
myself. I was thinking more in line of a pair of new skis, but perhaps a 
book about zero-inflated models and generalized linear mixed models 
would be just as fun). Link to the book:  http://www.highstat.com/book4.htm

Again, thanks to all for the help.

Ivar
On 16.11.2011 21:43, Jarrod Hadfield wrote: