Predictions from zero-inflated or hurdle models

Hi,

Sorry - I should have replied to this post earlier. I've been working  
on predict/simulate methods for all MCMcglmm models (including  
zero-inflated/altered/hurdle/truncated models) and these are now  
largely complete (together with newdata options).

When predictions are to be taken after marginalising the random  
effects (including the `residual' over-dispersion) it is not possible  
to obtain closed form expressions. The options that will be available  
in MCMCglmm are:

1) algebraic approximation
2) numerical integration
3) simulation

1) and 2) are currently only accurate when the random/residual effect  
structure implies no covariance between the Poisson part and the  
binary part.

1) is reasonably accurate for zero-inflated distributions, but can be  
pretty poor for the remainder because they all involve zero-truncated  
Poisson log-normal distributions and my taylor approximation for the  
mean is less than ideal (any suggestions would be helpful).

2) could be extended to double integration in which case covariance  
between the Poisson part and the binary part could be handled.

In your code, part of the problem is that you have fitted a zapoisson,  
but the prediction is based on a zipoisson (with complementary log-log  
link rather than logt link).

In all zero-inflated/altered/hurdle/truncated models

E[y] = E[(1-prob)*meanc]

where prob is the probabilty of a zero in the binary part and meanc is  
the mean of a Poisson distribution (zipoisson) or a zero-truncated  
poisson (zapoisson and hupoisson). If we have eta_1 as the linear  
predictor for the poisson part and eta_2 as the linear predictor for  
the binary part:

In zipoisson: prob = plogis(eta_2)     and meanc = exp(eta_1)
In zapoisson: prob = exp(-exp(eta_2))  and meanc =  
exp(eta_1)/(1-exp(-exp(eta_1)))
In hupoisson: prob = plogis(eta_2)     and meanc =  
exp(eta_1)/(1-exp(-exp(eta_1)))
In ztpoisson: prob = 0                 and meanc =  
exp(eta_1)/(1-exp(-exp(eta_1)))

In each case the linear predictor has a `fixed' part and a `random'  
part which I'll denote as `a' and `u' respectively. Ideally we would  
want

E[(1-prob)*meanc] taken over u_1 & u_2

if prob and meanc are independent this is a bit easier as

E[y] = E[1-prob]E[meanc]

and the two expectations ony need to taken with repsect to their  
respective random effects. If we have sd_1 and sd_2 as the standard  
deviations of the two sets of random effects then for the zapoisson:

   normal.evd<-function(x, mu, v){
      exp(-exp(x))*dnorm(x, mu, sqrt(v))
   }
   normal.zt<-function(x, mu, v){
     exp(x)/(1-exp(-exp(x)))*dnorm(x, mu, sqrt(v))
   }

   pred<-function(a_1, a_2, sd_1, sd_2){
     prob<-1-integrate(normal.evd, qnorm(0.0001, a_2,sd_2),  
qnorm(0.9999, a_2,sd_2), a_2,sd_2)[[1]]
     meanc<-integrate(normal.zt, qnorm(0.0001, a_1,sd_1),  
qnorm(0.9999, a_1,sd_1), a_1,sd_1)[[1]]
     prob*meanc
   }

#  gives the expected value with reasonable accuracy.  As an example:

   x<-rnorm(300)
   l1<-rnorm(300, 1/2+x, sqrt(1))
   l2<-rnorm(300, 1-x, sqrt(1))

   y<-rbinom(300, 1, 1-exp(-exp(l2)))
   y[which(y==1)]<-qpois(runif(sum(y==1), dpois(0,  
exp(l1[which(y==1)])), 1), exp(l1[which(y==1)]))
   # cunning sampler from Peter Dalgaard (R-sig-mixed)

   data=data.frame(y=y, x=x)
   prior=list(R=list(V=diag(2), nu=0.002, fix=2))

   m1<-MCMCglmm(y~trait+trait:x-1, rcov=~idh(trait):units, data=data,  
family="zapoisson", prior=prior)

   b_1<-colMeans(m1$Sol)[c(1,3)]
   b_2<-colMeans(m1$Sol)[c(2,4)]
   sd_1<-mean(sqrt(m1$VCV[,1]))
   sd_2<-mean(sqrt(m1$VCV[,2]))

   # note it is more accurate to take the posterior mean prediction  
rather than the prediction from the posterior means as I've done here,  
but for illustration:

   x.pred<-seq(-3,3,length=100)
   p<-1:100
   for(i in 1:100){
     p[i]<-pred(a_1 = b_1[1]+x.pred[i]*b_1[2], a_2 =  
b_2[1]+x.pred[i]*b_2[2], sd_1=sd_1, sd_2=sd_2)
   }

   plot(y~x)
   lines(p~x.pred)

Cheers,

Jarrod

Quoting Ruben Arslan <rubenarslan at gmail.com> on Tue, 17 Mar 2015  
13:33:25 +0000:

Predictions from zero-inflated or hurdle models

Thread (11 messages)