A zero inflated Poisson model in MCMCglmm

Hi,

Quoting Helen Ward <h.l.ward at qmul.ac.uk> on Wed, 11 Jul 2012 11:58:50 +0100:
Dear Jarrod and David,

Thank you both very much for your responses: I'm sorry it's taken me  
so long to say that, but I wanted to have a proper go at everything  
before I replied, which took longer than I anticipated!

Thank you especially for your zipoisson code, Jarrod. When I  
initially attempted to use your prior and model MCMCglmm asked me  
for a stronger prior, so I changed it to

Prior1=list(R=list(V=diag(2),nu=1, fix=2), G=list(G1=G1)) : is that  
a reasonable thing to have done? The model

I think it is better to diagnose the problem rather than upping nu. In  
this case I think the residual variance of the Poisson process must be  
going to zero under the flat prior? Perhaps the counts are  
under-dispersed with respect to the Poisson, further motivating an  
ordinal model?

model1.1<-MCMCglmm(Toes~trait,random=~us(trait):animal,family="zipoisson",rcov=~idh(trait):units,pedigree=ped,data=data,prior=Prior1,nitt=500000,thin=500,burnin=200000,verbose=FALSE)
ran with this prior, however I admit I am now flummoxed as to how to  
proceed and go about estimating heritability.

For this reason, (and because you both recommended it!), I went on  
to try an ordinal model using the following code:

p.var<-var(data$Toes,na.rm=TRUE)

prior2<-list(G=list(G1=list(V=matrix(p.var/2),n=1)),  
R=list(V=matrix(p.var/2),n=1))

IMPORTANT: The residual variance is not identifiable in an ordinal  
model so all the information is coming from the prior. This is OK, but  
I would fix the variance at 1 so as to avoid interpretation  
difficulties (and numerical problems - check the range of the latent  
variables using pl=TRUE).

model2<-MCMCglmm(Toes~1,random=~animal,family="ordinal",pedigree=ped,data=data,prior=prior2,nitt=500000,thin=500,burnin=200000,verbose=FALSE)

followed by the same model but with the prior from the Wilson paper erratum:

prior2.1 <- list(G = list(G1 = list(V = 1, nu = 0.002)), R = list(V  
= 1,nu = 0.002))

the plots for these models look ok, and they both give a  
heritability estimate of ~0.55, with very similar 95% HDP intervals,  
although the auto correlation is quite high ~0.6
Is this calculated as Vanimal/(Vanimal+Vresidual+1)  or  
Vanimal/(Vanimal+Vresidual)?

Next I tried a categorical model, treating Toes as a binary response  
variable (deforemed or not deformed), using

p.var<-var(data$ToesBinary,na.rm=TRUE)

prior3= list(R = list(V = 1, n = 0, fix = 1), G = list(G1 = list(V =  
1,n = 0)))

model3<-  
MCMCglmm(ToesBinary~1,random=~animal,family="categorical",data=data,pedigree=ped,prior=prior3,nitt=200000,thin=250,burnin=100000,verbose =  
FALSE)

Heritability came out higher with this model at about ~0.85.
Is this calculated as Vanimal/(Vanimal+Vresidual+pi^2/3)  or  
Vanimal/(Vanimal+Vresidual)?

And then last but not least a multinomial2 model:

prior4=list(R=list(V=1,nu=0.002),G=list(G1=list(V=1,nu=1,alpha.mu=0,alpha.V=100)))

model4<-MCMCglmm(cbind(Deformed,Fine)~1,random=~animal,family="multinomial2",data=data,pedigree=ped,prior=prior4,nitt=200000,thin=250,burnin=100000,verbose=FALSE)

The variance plots for this one didn't look very pretty and the  
autocorrelation values were still quite high, ~0.8, but it gave me a  
pretty heritability plot and an estimate similar to my ordinal models.

As you can probably gather from all of this, I am something of an  
indecisive person...., but at this stage I am inlcined to stick with  
the ordinal models and believe that there is a heritable element to  
the number of deformed toes (which seems rather intuitive after all  
that!). Given this decision, can you recommend anything I might do  
to reduce the autocorrelation in these models. Also, do I have to  
run the same model many times to confirm its conclusions?
Re: autocorrelation. Use parameter expansion and/or run for longer  
(particularly for binary animal models and/or if there is little  
overdispersion)  If the absolute value of the latent variables under  
the logit link exceed 20, or under the probit link exceed 7, then at  
the moment I would consider another program unless the latent  
variables are associated with particular levels of a fixed effect.   
Then flatish priors on the probability scale may overcome the problem.

Finally finally, I will have to do another zipoisson model in the  
future, can you direct me to where I might find an example of the  
code used for estimating heritability in these cases please?
I think it makes most sense to think about the heritabilities of each  
process separately   - the first as a Poisson trait and the second as  
a threshold (binary) trait.

Many many thanks once more,
Helen

On 08/07/2012 12:03, Jarrod Hadfield wrote:
Hi,

I agree with David that an ordinal model may behave better for this  
type of data (or even a 0/1 response with deformed or not) but in  
case you want to pursue a ZIP model...

The model you have specified is probably not what you had in mind.  
You should think of the ZIP response as being two responses  
(traits): the Poisson counts (trait 1) and the probability that a  
zero comes from the zero-inflation process (trait 2). At the moment  
you have a common intercept for both and therefore assume that the  
probability that a zero is from the zero-inflation process (on the  
logit scale) is equal to the mean Poisson count (on the log scale).  
There is probably little justification for this (but see ZAP  
models). Moreover, you have a single animal term, and therefore  
assume that both processes have the same genetic variance and that  
the genetic correlation between them is 1. You also assume that the  
residual variance for each process is equal (but the residual  
correlation is 0), although this is not actually that bigger deal  
because residual variation in the zero-inflation process (and  
residual corrletaion) is not identifiable from the data and so the  
value it could take is essentially arbitrary. However, I like to  
recognise this non-identifiability by fixing the variance at 1 and  
the covariance at zero (but see ZAP models where your specification  
is useful).

So, a more appropriate model would be:

model1.1<-MCMCglmm(Toes~trait,random=~us(trait):animal,family="zipoisson",rcov=~idh(trait):units,pedigree=ped,data=data,prior=prior1.1,nitt=500000,thin=500,burnin=200000,verbose=FALSE) or  
perhaps

model1.2<-MCMCglmm(Toes~trait,random=~idh(trait):animal,family="zipoisson",rcov=~idh(trait):units,pedigree=ped,data=data,prior=prior1.1,nitt=500000,thin=500,burnin=200000,verbose=FALSE) The difference between these two is that the genetic correlation between the zero-inflation and the Poisson processes is estimated, and in the latter it is set to  
0.

I'm not sure about how much data you have, but I would think that  
if the incidence of deformed toes is very low, you probably have  
little information for some of the parameters. In this case you  
have to be VERY careful with priors. The prior recommendations in  
the MCMCglmm tutorial in Wilson et al. 2011 will generally be quite  
informative for small or even moderate sized datasets (the authors  
have kindly published an erratum as many people seem to be using  
their recommendation without realising its implications).

The prior could look like:

prior=list(R=list(V=diag(2), nu=0, fix=2), G=list(G1=G1))

which fixes the residual variance in the zero-inflation process to  
1 with a flat improper prior on the Poisson over-dispersion. For  
the prior on the genetic (co)variances (G1) I generally use  
parameter expanded priors of the form:

G1=list(V=diag(2), nu=2, alpha.mu=c(0,0), alpha.V=diag(2)*1000)

or

G1=list(V=diag(2), nu=1, alpha.mu=c(0,0), alpha.V=diag(2)*1000)

for model.2.

These generally give posterior modes for the variance components  
that are close to (RE)ML estimates, although it is unclear to me  
how they behave for the covariances and functions of variances  
(e.g. heritability). At least for binary traits (e.g. the  
zero-inflation bit) I have seen a chi-square prior recommended  
because of its  prior properties on the heritability (de  
Villemereuil in MEE). This type of prior can also be obtained using  
parameter expansion:

(V=1, nu=1000, alpha.mu=0, alpha.V=1)

and I would guess that the multivariate analogue would be  
(V=diag(2), nu=1001, alpha.mu=c(0,0), alpha.V=diag(2)). However, a  
chi-square prior probably has poor properties for the Poisson  
variance and in MCMCglmm you could only mix these different priors  
for model.2 using a different syntax:

model1.2<-MCMCglmm(Toes~trait,random=~idh(at(trait,1)):animal+idh(at(trait,2)):animal,family="zipoisson",rcov=~idh(trait):units,pedigree=ped,data=data,prior=prior1.1,nitt=500000,thin=500,burnin=200000,verbose=FALSE)  
G=list(
   G1=list(V=1, nu=1, alpha.mu=0, alpha.V=1000),
   G2=list(V=1, nu=1000, alpha.mu=0, alpha.V=1)
 )

Cheers,

Jarrod

Quoting Helen Ward <h.l.ward at qmul.ac.uk> on Wed, 04 Jul 2012 16:24:36 +0100:

Dear list,

I work on bats, some of which have deformed toes. I have just started
trying to use MCMCglmm to investigate whether there is hertiable
variation for the number of deformed toes a bat has. Since most bats
have 0 deformed toes I am using a zero-inflated Poisson model.

I have put together some code (see below) based on the MCMCglmm tutorial
from the Wilson et al. 2011 'Ecologist's guide to the animal model'
paper, the MCMCglmm Course Notes and trial and error. This code runs and
I get a model and a heritability estimate. However, the heritability
estimate I get is extremely high and this, teamed with the observations
that my posterior distributions of the variance componants are not
pretty as the example ones I've seen and my autocorrelation values are
high (~0.4), makes me suspicious that I am not modelling this trait very
well. I suspect I have over simplified the model. I based my prior on a
model done in ASReml modelling 'Toes' as a Poisson distribution, which
can an heritability estimate of 0.35.

In addition, when I try and add a second random variable - year of birth
(Born) - into a second model, both plots and auto correlation get worse
and my heritability estimate for number of deformed toes (Toes) goes
from about 0.8 to 0.05!

Finally, I suspect you're not, but I would like to know if you are
allowed to compare DIC values from models with different priors please.

I have pasted my code below. If anyone can offer constructive criticism
it would be much appreciated.

Many thanks,
Helen

Model 1: With number of deformed toes (Toes) as the only random variable

p.var<-var(data$Toes,na.rm=TRUE)
prior1.1<-list(G=list(G1=list(V=matrix(p.var*0.35),n=1)),
R=list(V=matrix(p.var*0.65),n=1))

model1.1<-MCMCglmm(Toes~1,random=~animal,family="zipoisson",rcov=~trait:units,pedigree=ped,data=data,prior=prior1.1,nitt=500000,thin=500,burnin=200000,verbose=FALSE) posterior.heritability1.1<-model1.1$VCV[,"animal"]/(model1.1$VCV[,"animal"]+model1.1$VCV[,"trait:units"])  
HPDinterval(posterior.heritability1.1,0.95)
posterior.mode(posterior.heritability1.1)

Model 2: With number of toes (Toes) and year of birth (Born) as random
variables

p.var<-var(data$Toes,na.rm=TRUE)
prior1.2<-list(G=list(G1=list(V=matrix(p.var/3),n=1),G2=list(V=matrix(p.var/3),n=1)),R=list(V=matrix(p.var/3),n=1))  
model1.2<-
MCMCglmm(Toes~1,random=~animal+Born,family="zipoisson",rcov=~trait:units,pedigree=ped,data=data,prior=prior1.2,nitt=1000000,thin=1000,burnin=500000,verbose=FALSE) posterior.heritability1.2<-model1.2$VCV[,"animal"]/(model1.2$VCV[,"animal"]+model1.2$VCV[,"Born"]+model1.2$VCV[,"trait:units"])  
HPDinterval(posterior.heritability1.2,0.95)
posterior.mode(posterior.heritability1.2)

   [[alternative HTML version deleted]]

_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

A zero inflated Poisson model in MCMCglmm

Thread (7 messages)