
Zero-inflated mixed effects model - clarification of zeros modeled and R package questions

2 messages · Ben Bolker, Jennifer Barrett

A couple of quick responses:

Hi folks,

  [snip]

Thanks for replying so quickly, Alain -- it's much appreciated. To follow up
on your comments:

-          Re: Spatial Autocorrelation - I have dealt with spatial
autocorrelation in the past, though with continuous log-normal data (no
random effects - hence I used a spatial autoregressive model). I have
mentioned the likelihood of spatial autocorrelation in the residuals to my
employer/supervisor; however, he has advised that we proceed with the model
without accounting for autocorrelation, expecting that a large part of it may
be explained by the environmental variables (which are no doubt clustered)
once the model is fitted. I'm skeptical, as some of these species might
also seek "safety in numbers", selecting sites based on the abundance of
conspecifics nearby, and large flocks at a given site are likely to utilize
habitat at neighboring sites as well (if suitable). We shall see!

BMB>  You can always do a post-fitting test, graphical or statistical, for
the presence of spatial autocorrelation -- if you don't see anything
(clustering of residuals in a spatial plot of residuals, significant
Moran's I, or interesting-looking spatial variogram/correlogram)
then you should be OK ...
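
BMB> A minimal sketch of such a post-fit check in R, assuming a fitted
model `fit` with residuals() available and site coordinates `x`, `y` in a
data frame `dat` (all names hypothetical):

```r
## Post-fit check for residual spatial autocorrelation (sketch).
library(ape)   # for Moran.I

r <- residuals(fit)

## inverse-distance weight matrix among sites
d <- as.matrix(dist(cbind(dat$x, dat$y)))
w <- 1 / d
diag(w) <- 0

## Moran's I test: a small p-value suggests residual spatial clustering
ape::Moran.I(r, weight = w)

## graphical check: plot residuals in space, sized/colored by magnitude
plot(dat$x, dat$y, cex = abs(r) / max(abs(r)) * 3,
     col = ifelse(r > 0, "red", "blue"),
     xlab = "x", ylab = "y",
     main = "Residuals in space (size = |residual|)")
```

A spatial variogram/correlogram of the residuals (e.g. via the gstat or
ncf packages) would serve the same purpose.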

-          Re: random effect in the binomial process of a ZIP - don't I
have to include this, given the repeated measures?

BMB>  It depends.  In principle, there could be a random effect in
the binomial process of the ZIP.  In practice, at some point the
model becomes too computationally unwieldy/unstable, due to complexity
and possible overfitting.  Again, you can take the general strategy
of leaving out potentially difficult model complications, then see if
you can detect them in the residuals (in this case, differences in
deviation between predicted vs actual zeros in different groups)
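
BMB> Something like the following, assuming a data frame `dat` with a
grouping factor `site`, a response `count`, and fitted means from the
model (names hypothetical):

```r
## Compare observed vs predicted zero frequencies by group (sketch).
mu <- fitted(fit)

## observed proportion of zeros per site
obs.zero  <- tapply(dat$count == 0, dat$site, mean)

## expected zero probability under a plain Poisson with mean mu
pred.zero <- tapply(exp(-mu), dat$site, mean)

## sites where observed zeros far exceed predicted suggest
## group-level structure in the zero-inflation process
round(cbind(observed = obs.zero, predicted = pred.zero), 2)
```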

Thanks to everyone else as well for your input. After reading your
responses, and diving into the lit a little more, you've convinced me that
MCMC is the way to go. However, I now have a few more quick (hopefully?)
questions:
- Because I'm a tad afraid of WinBugs, I decided to look at MCMCglmm as
well. I noticed that the course notes for MCMCglmm state that "As is often
the case the parameters of the zero-inflation model mixes poorly. Poor
mixing is often associated with distributions that may not be zero-inflated
but instead over-dispersed."  Am I correct in thus assuming that if the
data are indeed zero-inflated, "poor mixing" is not a problem? Or might
this also arise through other means?

BMB>  Poor mixing can happen any time you have a complex model.
Check the trace plots.
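
BMB> For an MCMCglmm fit `m` (name hypothetical), the relevant pieces
are stored as coda mcmc objects, so the standard diagnostics apply:

```r
## Checking MCMC mixing for an MCMCglmm fit (sketch).
library(MCMCglmm)
library(coda)

## plot() on an mcmc object gives trace + density plots
plot(m$Sol)   # location effects (incl. zero-inflation terms)
plot(m$VCV)   # (co)variance components

## numerical diagnostics: effective sample size and autocorrelation
effectiveSize(m$Sol)
autocorr.diag(m$VCV)
```

Well-mixed chains look like fuzzy caterpillars with effective sample
sizes close to the number of stored samples; strong autocorrelation or
drifting traces mean you should run longer, thin more, or rethink the
model/priors.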

- Is there an advantage to using MCMCglmm versus WinBUGS or vice versa? It
seems either one will take some time to correctly code/specify, so I might
as well go the route that makes the most sense/is more highly recommended.

BMB> WinBUGS is more flexible, MCMCglmm is (much) faster and easier
for those problems which it can handle.  If you don't see yourself
needing to go beyond the problems that MCMCglmm can handle, I would
stick with it.
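
BMB> For what it's worth, a rough sketch of how a zero-inflated Poisson
is usually specified in MCMCglmm, following the pattern in the CourseNotes
(variable names `count`, `habitat`, `site` are hypothetical, and the
priors will need thought for real data):

```r
## Zero-inflated Poisson in MCMCglmm (sketch, not a tested model).
library(MCMCglmm)

## with family = "zipoisson", each observation has two latent "traits":
## the Poisson mean and the zero-inflation probability
prior <- list(
  R = list(V = diag(2), nu = 0.002, fix = 2),  # fix the zi residual variance
  G = list(G1 = list(V = 1, nu = 0.002))
)

m <- MCMCglmm(count ~ trait - 1 + at.level(trait, 1):habitat,
              random = ~ at.level(trait, 1):site,  # site effect on the
                                                   # Poisson process only,
                                                   # per the discussion above
              rcov   = ~ idh(trait):units,
              family = "zipoisson",
              prior  = prior,
              data   = dat,
              nitt = 130000, thin = 100, burnin = 30000)
```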

- And most importantly: As I mentioned in my original message, we had
wanted to compare competing hypotheses for what shoreline attributes
influence shorebird distributions, and to then use MMI in prediction;
however, I've read that DIC is not recommended for mixed effects models
(even though MuMIn accepts MCMCglmm output). According to a post by Jarrod
Hadfield, this is especially true for non-Gaussian data because the level
of focus is on the sampled observations (i.e., for observations (y) on
children within schools, "DIC would be focused at 'can we predict how many
times *these* children miss the bus'"). What are my options then for
model comparison/selection and prediction? Recall that we want to estimate
the total abundance of each shorebird species within the entire study
region (with confidence intervals). I'm really stuck here...

BMB> DIC is indeed problematic for several reasons: there's the
level-of-focus problem, and the problem that its derivation assumes
multivariate normal posterior distributions ...  You could try to count
parameters in a naive way (i.e. one parameter per variance or
covariance parameter, which is probably the right way to do it
for the "population" level of focus -- see Vaida and Blanchard 2005),
and use AIC based on the mean deviance as suggested by
Brooks, S.  2002.  Discussion of the paper by Spiegelhalter, Best,
Carlin, and van der Linde.  Journal of the Royal Statistical Society
B.  64: 616-618.
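
BMB> i.e. something along these lines for an MCMCglmm fit `m` (name
hypothetical; `m$Deviance` holds the sampled deviances):

```r
## AIC-like criterion from the posterior mean deviance with a naive
## parameter count (sketch of the Brooks 2002 suggestion).
mean.dev <- mean(m$Deviance)

## naive count: fixed effects + one parameter per (co)variance component
p <- ncol(m$Sol) + ncol(m$VCV)

(AIC.mean.dev <- mean.dev + 2 * p)
```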

  I would also say that you could just hope that one model
stands out so that you don't have to use MMI ...

  Ben Bolker

Thanks in advance... this is a huge statistical leap for me.

Cheers,
Jenn
On Thu, Jun 21, 2012 at 8:43 PM, Paul Johnson <pauljohn32 at gmail.com> wrote:
