Zero-inflated model inquiry
Hi Peter, With a simple model you may get satisfaction with glht() in multcomp, but once your models get more complicated, success with glht tends to wane - at least in my experience. [If anyone can provide code to use glht for multiple factors or whether addressing each factor one at a time works in this situation, please post.] One approach to getting those 'missing' comparisons is to change the base in the contrast treatments. For example: contrasts(data$impact_period)=contr.treatment(levels(data$impact_period),base="BEFORE") Someone can correct me if I am wrong, but because these are treatment contrasts (mean effect), moving the reference level has no effect on coefficients. Often I put the line above in a loop and change the base accordingly. cheers, trevor
On 9/27/2012 7:00 AM, r-sig-ecology-request at r-project.org wrote:
Send R-sig-ecology mailing list submissions to r-sig-ecology at r-project.org To subscribe or unsubscribe via the World Wide Web, visit
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology or, via email, send a message with subject or body 'help' to r-sig-ecology-request at r-project.org You can reach the person managing the list at r-sig-ecology-owner at r-project.org When replying, please edit your Subject line so it is more specific than "Re: Contents of R-sig-ecology digest..." Today's Topics: 1. Re: Zero-inflated model inquiry (Drew Tyre) ---------------------------------------------------------------------- Message: 1 Date: Wed, 26 Sep 2012 08:38:27 -0500 From: Drew Tyre <atyre2 at unl.edu> To: Peter Houk <peterhouk at gmail.com> Cc: r-sig-ecology at r-project.org Subject: Re: [R-sig-eco] Zero-inflated model inquiry Message-ID: <CAE-6tvwbp7e=ABLYK_HdbVR6R_5do=-RVjUm=fyaP47K622uFA at mail.gmail.com> Content-Type: text/plain; charset="ISO-8859-1" Hi Peter, Your assumption that Before and During are contrasted with After is correct. By default R parameterizes categorical variables using treatment contrasts which compare each level to the first one, and the default sorting is lexicographic, so AFTER becomes the first level. Your model is indicating that the average abundance both BEFORE and DURING are significantly different from the AFTER. It sounds like what you'd like to know is also BEFORE different from DURING. I see a couple things you could try 1) Make predictions of the average urchin_abundance from the model for each period along with confidence intervals. Use the confidence intervals to decide what is the same and different. 2) Change your formula to urchin_density~impact_period-1. This will give you a distinct estimate for each period, and make construction of the confidence intervals in 1 very easy, but still won't give you all the pairwise comparisons. 3) Check the package multcomp and use it to find the appropriate contrasts for all three levels. I'm not sure this will work for models from the pscl package. hth On Tue, Sep 25, 2012 at 10:50 PM, Peter Houk <peterhouk at gmail.com> wrote: Greetings - I have a question regarding the use of zero-inflated models for count data. I have a very basic count dataset consisting of sea urchin density estimates conducted across 20 sites (random: pooled for this example) during three timeframes (fixed: 1-before disturbance, 2-during disturbance, and 3-after disturbance). For this example, I'm simply looking to interpret significant differences across timeframes. After initial examinations, the data lend themselves well to an overdispersed, negative binomial distribution (i.e., hurdle approach using the R package pscl). Using the code: f1<-formula(urchin_density~impact_period) H1<-hurdle(f1, dist="negbin", link="logit") summary(H1) provides: Count model coefficients (truncated negbin with log link): Estimate Std. Error z value Pr(>|z|) (Intercept) 0.7212 0.1546 4.664 3.10e-06 *** impact_periodBefore 0.6374 0.1713 3.720 0.000199 *** impact_periodDuring 0.6850 0.1696 4.039 5.37e-05 *** Log(theta) -0.6671 0.2262 -2.949 0.003184 ** Zero hurdle model coefficients (binomial with logit link): Estimate Std. Error z value Pr(>|z|) (Intercept) 0.51904 0.12824 4.048 5.18e-05 *** impact_periodBefore 0.01869 0.20111 0.093 0.926 impact_periodDuring -0.03353 0.19718 -0.170 0.865 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Theta: count = 0.5132 Number of iterations in BFGS optimization: 11 Log-likelihood: -1377 on 7 Df Before moving to more complex models, my question is regarding whether or not this is the right approach, and if so, why are there no results for the "after" impact period. My assumption is that both the "before" and "during" time periods are being contrasted against the "after" here, but how can one contrast all three groups to look for significance? Last, how does one logically translate the two parts of the results? Insight appreciated, I'm aware there are extensive textbooks on the subject, but trying to get an initial feel for things. Peter -- Peter Houk, PhD Chief Biologist Pacific Marine Resources Institute www.pacmares.com www.micronesianfishing.com [[alternative HTML version deleted]] _______________________________________________ R-sig-ecology mailing list R-sig-ecology at r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology