Zero-inflated model inquiry
2 messages · Peter Houk, Drew Tyre
Hi Peter,

Your assumption that Before and During are contrasted with After is correct. By default R parameterizes categorical variables using treatment contrasts, which compare each level to the first one. The default sorting of levels is lexicographic, so After becomes the first (reference) level. Your model indicates that, in the count part, the average abundance in both Before and During is significantly different from After; in the zero hurdle part neither period differs from After.

It sounds like what you'd like to know is whether Before is also different from During. I see a couple of things you could try:

1) Make predictions of the average urchin_density from the model for each period, along with confidence intervals. Use the confidence intervals to decide what is the same and what is different.

2) Change your formula to urchin_density~impact_period-1. This will give you a distinct estimate for each period and make construction of the confidence intervals in 1) very easy, but it still won't give you all the pairwise comparisons.

3) Check the package multcomp and use it to find the appropriate contrasts for all three levels. I'm not sure this will work for models from the pscl package.

hth
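A minimal sketch of suggestion 2), plus a hand-computed Wald contrast for Before vs During that sidesteps the question of whether multcomp accepts hurdle objects. This is only an illustration under stated assumptions: the data frame dat, the sample size n, and the simulated counts are hypothetical stand-ins for the real urchin data, and the coefficient names assume pscl's usual "count_" prefix.

```r
library(pscl)

## Hypothetical data: simulated counts stand in for the real urchin densities.
set.seed(1)
n <- 200  # assumed number of observations per period
dat <- data.frame(
  impact_period = factor(rep(c("After", "Before", "During"), each = n))
)
mu <- c(After = 2, Before = 4, During = 4)[as.character(dat$impact_period)]
## Negative binomial counts, with extra zeros added by random thinning.
dat$urchin_density <- rnbinom(nrow(dat), size = 1, mu = mu) *
  rbinom(nrow(dat), 1, 0.6)

## Suggestion 2: drop the intercept so each period gets its own estimate.
H2 <- hurdle(urchin_density ~ impact_period - 1, data = dat, dist = "negbin")
ci <- confint(H2)  # Wald intervals on the link (log / logit) scale

## Before vs During in the count part, from the treatment-contrast fit.
H1 <- hurdle(urchin_density ~ impact_period, data = dat, dist = "negbin")
b <- coef(H1)
V <- vcov(H1)
i <- "count_impact_periodBefore"
j <- "count_impact_periodDuring"
est <- b[[j]] - b[[i]]                           # difference on the log scale
se  <- sqrt(V[i, i] + V[j, j] - 2 * V[i, j])     # SE of the difference
z   <- est / se
p   <- 2 * pnorm(-abs(z))                        # two-sided Wald p-value
```

The same comparison can also be read straight off a refitted model after changing the reference level, for example with relevel(dat$impact_period, ref = "Before"), in which case the During row of the summary is the Before vs During contrast.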
On Tue, Sep 25, 2012 at 10:50 PM, Peter Houk <peterhouk at gmail.com> wrote:
Greetings - I have a question regarding the use of zero-inflated models for count data. I have a very basic count dataset consisting of sea urchin density estimates conducted across 20 sites (random: pooled for this example) during three timeframes (fixed: 1-before disturbance, 2-during disturbance, and 3-after disturbance). For this example, I'm simply looking to interpret significant differences across timeframes. After initial examinations, the data lend themselves well to an overdispersed, negative binomial distribution (i.e., hurdle approach using the R package pscl). Using the code:
f1<-formula(urchin_density~impact_period)
H1<-hurdle(f1, dist="negbin", link="logit")
summary(H1)
provides:
Count model coefficients (truncated negbin with log link):
                    Estimate Std. Error z value Pr(>|z|)
(Intercept)           0.7212     0.1546   4.664 3.10e-06 ***
impact_periodBefore   0.6374     0.1713   3.720 0.000199 ***
impact_periodDuring   0.6850     0.1696   4.039 5.37e-05 ***
Log(theta)           -0.6671     0.2262  -2.949 0.003184 **

Zero hurdle model coefficients (binomial with logit link):
                    Estimate Std. Error z value Pr(>|z|)
(Intercept)          0.51904    0.12824   4.048 5.18e-05 ***
impact_periodBefore  0.01869    0.20111   0.093    0.926
impact_periodDuring -0.03353    0.19718  -0.170    0.865
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Theta: count = 0.5132
Number of iterations in BFGS optimization: 11
Log-likelihood: -1377 on 7 Df
Before moving to more complex models, my question is regarding whether or
not this is the right approach, and if so, why are there no results for the
"after" impact period. My assumption is that both the "before" and
"during" time periods are being contrasted against the "after" here, but
how can one contrast all three groups to look for significance? Last, how
does one logically translate the two parts of the results?
Insight appreciated, I'm aware there are extensive textbooks on the
subject, but trying to get an initial feel for things.
Peter
--
Peter Houk, PhD
Chief Biologist
Pacific Marine Resources Institute
www.pacmares.com
www.micronesianfishing.com
Drew Tyre
School of Natural Resources
University of Nebraska-Lincoln
416 Hardin Hall, East Campus
3310 Holdrege Street
Lincoln, NE 68583-0974
phone: +1 402 472 4054
fax: +1 402 472 2946
email: atyre2 at unl.edu
http://snr.unl.edu/tyre
http://aminpractice.blogspot.com
http://www.flickr.com/photos/atiretoo