Zero-inflated model inquiry

Hi Peter,

Your assumption that Before and During are contrasted with After is
correct. By default R parameterizes categorical variables using
treatment contrasts, which compare each level to the first one, and the
default level ordering is alphabetical, so After becomes the first
(reference) level. The count part of your model indicates that the
average abundance in both Before and During is significantly different
from After, while the zero hurdle part shows no such difference. It
sounds like what you'd also like to know is whether Before differs
from During. I see a couple of things you could try:
1) Make predictions of the average urchin_density from the model for
each period along with confidence intervals. Use the confidence
intervals to judge which periods are the same and which differ.
2) Change your formula to urchin_density~impact_period-1. This will
give you a distinct estimate for each period and make constructing the
confidence intervals in 1) very easy, but it still won't give you all
the pairwise comparisons.
3) Check the package multcomp and use it to find the appropriate
contrasts for all three levels. I'm not sure this will work for models
from the pscl package.
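
A minimal sketch of the reference-level idea behind these options. The
real data are not shown in this thread, so the data frame `urchins`, the
helper `rtnb()` and all simulated values below are hypothetical
stand-ins. Making Before the reference level should turn the During row
into the During vs Before comparison, and a Wald contrast built by hand
from coef() and vcov() gives the same comparison without refitting. This
is not the multcomp route from 3), just one pairwise comparison done
manually. The pscl part only runs if the package is installed.

```r
set.seed(42)

# Hypothetical helper: zero-truncated negative binomial draws (redraw any zeros)
rtnb <- function(mu, size) {
  y <- rnbinom(length(mu), size = size, mu = mu)
  zero <- y == 0
  while (any(zero)) {
    y[zero] <- rnbinom(sum(zero), size = size, mu = mu[zero])
    zero <- y == 0
  }
  y
}

# Simulated stand-in for the survey data: 200 counts per period
period  <- rep(c("After", "Before", "During"), each = 200)
mu      <- unname(c(After = 2, Before = 4, During = 4)[period])
present <- rbinom(length(period), size = 1, prob = 0.6)  # did the count clear the hurdle?
urchins <- data.frame(
  impact_period  = factor(period),
  urchin_density = present * rtnb(mu, size = 0.5)
)

levels(urchins$impact_period)  # alphabetical order, so "After" is the reference

# Make Before the reference, so the During row compares During with Before
urchins$period_b <- relevel(urchins$impact_period, ref = "Before")

if (requireNamespace("pscl", quietly = TRUE)) {
  fit <- pscl::hurdle(urchin_density ~ impact_period, data = urchins,
                      dist = "negbin", link = "logit")

  # Wald contrast Before - During on the count part
  b <- coef(fit, model = "count")
  V <- vcov(fit, model = "count")
  k <- setNames(numeric(length(b)), names(b))
  k[c("impact_periodBefore", "impact_periodDuring")] <- c(1, -1)
  est <- sum(k * b)
  se  <- sqrt(drop(t(k) %*% V %*% k))
  print(c(estimate = est, se = se, p = 2 * pnorm(-abs(est / se))))

  # Same comparison from a refit with the releveled factor (sign flipped)
  fit_b <- pscl::hurdle(urchin_density ~ period_b, data = urchins,
                        dist = "negbin", link = "logit")
  print(summary(fit_b)$coefficients$count)
}
```

With three periods there are three pairwise comparisons, so some
adjustment for multiple testing is worth considering before reading the
p-values at face value.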

hth

Greetings -

I have a question regarding the use of zero-inflated models for count
data.  I have a very basic count dataset consisting of sea urchin density
estimates collected across 20 sites (random: pooled for this example)
during three timeframes (fixed: 1-before disturbance, 2-during disturbance,
and 3-after disturbance).  For this example, I'm simply looking to
interpret significant differences across timeframes.  After initial
examinations, the data lend themselves well to an overdispersed, negative
binomial distribution (i.e., hurdle approach using the R package pscl).

Using the code:

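library(pscl)  # provides hurdle()
f1 <- formula(urchin_density ~ impact_period)
# No data= argument is given, so the variables must already be visible in the workspace
H1 <- hurdle(f1, dist = "negbin", link = "logit")
summary(H1)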
provides:

Count model coefficients (truncated negbin with log link):
                    Estimate Std. Error z value Pr(>|z|)
(Intercept)           0.7212     0.1546   4.664 3.10e-06 ***
impact_periodBefore   0.6374     0.1713   3.720 0.000199 ***
impact_periodDuring   0.6850     0.1696   4.039 5.37e-05 ***
Log(theta)           -0.6671     0.2262  -2.949 0.003184 **
Zero hurdle model coefficients (binomial with logit link):
                    Estimate Std. Error z value Pr(>|z|)
(Intercept)          0.51904    0.12824   4.048 5.18e-05 ***
impact_periodBefore  0.01869    0.20111   0.093    0.926
impact_periodDuring -0.03353    0.19718  -0.170    0.865
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Theta: count = 0.5132
Number of iterations in BFGS optimization: 11
Log-likelihood: -1377 on 7 Df
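
As a rough guide to reading the two parts of this output, the
coefficients can be back-transformed by hand. The numbers below are
copied from the printed summary; the variable names are made up for
illustration. The interpretation assumes the usual pscl hurdle
parameterization, in which the zero hurdle part models the probability
of a non-zero count on the logit scale and the count part models the log
mean of the untruncated negative binomial.

```r
# Estimates copied from the summary output above
zero_coef  <- c(After = 0.51904, Before = 0.01869, During = -0.03353)
count_coef <- c(After = 0.7212,  Before = 0.6374,  During = 0.6850)
log_theta  <- -0.6671

# After is the reference level, so its row is the intercept alone and the
# other periods add their coefficient to it
zero_lp  <- zero_coef[["After"]]  + c(After = 0, zero_coef[c("Before", "During")])
count_lp <- count_coef[["After"]] + c(After = 0, count_coef[c("Before", "During")])

# Zero hurdle part: probability that a count is non-zero in each period
p_nonzero <- plogis(zero_lp)

# Count part: negative binomial mean before truncation, and the
# multiplicative change in that mean relative to After
mu    <- exp(count_lp)
ratio <- exp(count_coef[c("Before", "During")])

# Dispersion; agrees with the printed "Theta: count = 0.5132"
theta <- exp(log_theta)

# Mean of the positive counts: mu / (1 - P(0)), with P(0) = (theta / (theta + mu))^theta
mean_positive <- mu / (1 - (theta / (theta + mu))^theta)

# Overall expected count per period combines both parts
overall_mean <- p_nonzero * mean_positive

round(rbind(p_nonzero, mu, mean_positive, overall_mean), 3)
```

Read this way, the zero hurdle part says the chance of finding any
urchins is about the same in all three periods, while the count part
says that where urchins are present, the Before and During periods have
roughly twice the After mean.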

Before moving to more complex models, my question is regarding whether or
not this is the right approach, and if so, why are there no results for the
"after" impact period.  My assumption is that both the "before" and
"during" time periods are being contrasted against the "after" here, but
how can one contrast all three groups to look for significance?  Last, how
does one logically translate the two parts of the results?

Insight appreciated, I'm aware there are extensive textbooks on the
subject, but trying to get an initial feel for things.

Peter

--
Peter Houk, PhD
Chief Biologist
Pacific Marine Resources Institute
www.pacmares.com
www.micronesianfishing.com

_______________________________________________
R-sig-ecology mailing list
R-sig-ecology at r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

Drew Tyre

School of Natural Resources
University of Nebraska-Lincoln
416 Hardin Hall, East Campus
3310 Holdrege Street
Lincoln, NE 68583-0974

phone: +1 402 472 4054
fax: +1 402 472 2946
email: atyre2 at unl.edu
http://snr.unl.edu/tyre
http://aminpractice.blogspot.com
http://www.flickr.com/photos/atiretoo