fitting a distribution to zero-inflated catch per unit effort mixed model
[cc'ing back to r-sig-mixed-models]
On 12-02-25 03:53 PM, Karla Letto wrote:
Thank you Ben for the detailed response. I greatly appreciate it. I will let you know how it goes after I give each a try. In answer to your questions. My sample size is 25 Meadow Voles.
Meaning you caught a total of 25 voles in the whole study? Be warned, you may find that your model is overfitted -- typically you can fit about/at most 1 parameter per 10 (effective) data points, which is something between 25 (the number of voles) and 48 (observations) -- so your fixed-effect parameters (line+habitat+type = 6 parameters) are already on the verge of more information than you can estimate, even before you start counting random effects (3 variances, for site/cycle/observation-level variation).
I calculated CPUE for each line. So I have a total of 48 observations (3 lines for 8 sites and each site was visited twice. I thought I had to use cycle as a random factor to account for temporal pseudoreplication. I can see how your idea of nesting cycle within site as a random factor will work better though. I tested for normality using a normal q-q plot and homogeneity by plotting the residuals and fitted values (I kept getting a cone shape).
Hmm. A cone shape does imply heteroscedasticity that isn't handled by the model assumptions ... I *think* resid() should give you Pearson residuals (i.e. already corrected assuming variance=mean). So that's a little puzzling. I don't expect normality at all in residuals from a model with a mean of ~ 2 individuals per sample ...
qqnorm(resid(model))
qqline(resid(model))
plot(fitted(model),resid(model))
Can you please tell me why I have to use LOG(effort) as an offset instead of just effort as the offset? That is not the first time I heard that it had to be LOG(effort) but I cannot find a reason for why.
Because the offset is added on the scale of the linear predictor, which in this case is the log scale -- i.e., the expected mean number of counts is exp([fixed effect terms] + [random effect terms] + offset) = exp([fixed effect terms] + [random effect terms])* exp(offset)
Thank you,
Karla
On Thu, Feb 23, 2012 at 7:40 PM, Ben Bolker <bbolker at gmail.com
<mailto:bbolker at gmail.com>> wrote:
Karla Letto <karla.letto at ...> writes:
> I am having trouble fitting a distribution to my mixed model for
meadow
> vole catch per unit effort (CPUE) data. I have tried several
families and
> cannot find one that does not violate both the homogeneity and
normality
> assumptions. NOTE: The data set is zero-inflated (no captures).
> Here is my study design:
>
> I am trying to determine if the CPUE of meadow voles differ among
lines
> (line = near,mid, or far) at increasing distances from a linear
feature
> (type = road, trail or powerline corridor) in two different
habitat types
> (habitat=forest or barren).
> Response variable: catch per unit effort (non-integer values)
> Fixed explanatory variables: line (3 categories), habitat (2
categories),
> type (3 categories)
> Random explanatory variable: site (8 categories), cycle (2 categories)
> The random variable site is for the 8 different sites I sampled in (4
> barren and 4 forest) and the cycle is there because I visited each
site
> twice.
Practically speaking you probably can't use cycle as a random
effect; you can include cycle as a fixed effect (specifying the
difference between first & second visits), and possibly
nesting it within site as a random effect (if you have more than
one observation per site/sample combination).
What is the total size (number of observations) in your data set?
> Here is an excerpt of my data set:
> CE2 catch effort site line cycle habitat type
> 0.000000 0 57.5 A near first forest trail
> 3.278689 2 61.0 A mid first forest trail
> 0.000000 0 60.5 A far first forest trail
> 0.000000 0 66.5 G near first barren road
> 0.000000 0 74.5 G mid first barren road
> 0.000000 0 74.0 G far first barren road
> 1.449275 1 69.0 E near second barren powerline
> 0.000000 0 73.0 E mid second barren powerline
> 0.000000 0 71.5 E far second barren powerline
>
> I tried the lme4 package using the following syntax:
>
[snip]
>
> I then tried using a poisson error structure using catch (the
actual number
> of animals) as the response and incorporated effort as an offset.
Effort as
> an offset is commonly used for analysis of CPUE data.
>
> Model2<-
> glmer(catch~line+habitat+type+(1|site)+(1|cycle)+
> offset(effort),family=poisson)
This is a good way to do it, but you need to incorporate the
LOG of effort. You may also need to account for overdispersion
and/or zero-inflation; the former via incorporating an observation-level
random effect (in glmer, glmmadmb, or MCMCglmm) or negative binomial
distribution (in glmmadmb), the latter (if necessary) via zero-inflation
or hurdle models (in glmmadmb or MCMCglmm).
[snip snip snip]
> Does anyone have any suggestions on how I can analyze
zero-inflated CPUE
> data? I have been trying to figure out how to do a Monte Carlo
permutation
> test for a mixed model but I am having trouble figuring out the
syntax. Any
> help would be greatly appreciated.
What are you using to assess homogeneity of variance and normality?
Normality of residuals can only be expected approximately (and in
the case of
large mean counts) in this case.
I would start off this way:
mydata$obs <- factor(seq(nrow(mydata)))
glmer(catch~line+habitat+type+cycle+(1|site/cycle)+(1|obs)+
offset(log(effort)),family=poisson, data=mydata)
You can use the simulate() method to simulate data sets, count
the proportion of zeros expected, and see if your observed
proportion of zeros is off ...
_______________________________________________
R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org> mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models