fitting a distribution to zero-inflated catch per unit effort mixed model

[cc'ing back to r-sig-mixed-models]
Thank you Ben for the detailed response. I greatly appreciate it. I will
let you know how it goes after I give each a try. In answer to your
questions. My sample size is 25 Meadow Voles. 
Meaning you caught a total of 25 voles in the whole study?  Be warned,
you may find that your model is overfitted -- typically you can fit
about/at most 1 parameter per 10 (effective) data points, which is
something between 25 (the number of voles) and 48 (observations) -- so
your fixed-effect parameters (line+habitat+type = 6 parameters) are
already on the verge of more information than you can estimate, even
before you start counting random effects (3 variances, for
site/cycle/observation-level variation).
I calculated CPUE for each
line. So I have a total of 48 observations (3 lines for 8 sites and each
site was visited twice. I thought I had to use cycle as a random factor
to account for temporal pseudoreplication.  I can see how your idea of
nesting cycle within site as a random factor will work better though. 

I tested for normality using a normal q-q plot and homogeneity by
plotting the residuals and fitted values (I kept getting a cone shape).
Hmm.  A cone shape does imply heteroscedasticity that isn't handled by
the model assumptions ... I *think* resid() should give you Pearson
residuals (i.e. already corrected assuming variance=mean). So that's a
little puzzling.  I don't expect normality at all in residuals from a
model with a mean of ~ 2 individuals per sample ...

qqnorm(resid(model))

qqline(resid(model))

plot(fitted(model),resid(model))
Can you please tell me why I have to use LOG(effort) as an offset
instead of just effort as the offset? That is not the first time I heard
that it had to be LOG(effort) but I cannot find a reason for why. 
Because the offset is added on the scale of the linear predictor,
which in this case is the log scale -- i.e., the expected mean number of
counts is

  exp([fixed effect terms] + [random effect terms] + offset) =
exp([fixed effect terms] + [random effect terms])* exp(offset)
Thank you, 

Karla 

On Thu, Feb 23, 2012 at 7:40 PM, Ben Bolker <bbolker at gmail.com
<mailto:bbolker at gmail.com>> wrote:

    Karla Letto <karla.letto at ...> writes:

    > I am having trouble fitting a distribution to my mixed model for
    meadow
    > vole catch per unit effort (CPUE) data. I have tried several
    families and
    > cannot find one that does not violate both the homogeneity and
    normality
    > assumptions. NOTE: The data set is zero-inflated (no captures).
    > Here is my study design:
    >
    > I am trying to determine if the CPUE of meadow voles differ among
    lines
    > (line = near,mid, or far) at increasing distances from a linear
    feature
    > (type = road, trail or powerline corridor) in two different
    habitat types
    > (habitat=forest or barren).
    > Response variable: catch per unit effort (non-integer values)
    > Fixed explanatory variables: line (3 categories), habitat (2
    categories),
    > type (3 categories)
    > Random explanatory variable: site (8 categories), cycle (2 categories)

    > The random variable site is for the 8 different sites I sampled in (4
    > barren and 4 forest) and the cycle is there because I visited each
    site
    > twice.
     Practically speaking you probably can't use cycle as a random
    effect; you can include cycle as a fixed effect (specifying the
    difference between first & second visits), and possibly
    nesting it within site as a random effect (if you have more than
    one observation per site/sample combination).

     What is the total size (number of observations) in your data set?

    > Here is an excerpt of my data set:
    >    CE2  catch effort site line   cycle habitat      type
    > 0.000000     0   57.5    A near  first   forest     trail
    > 3.278689     2   61.0    A   mid   first   forest     trail
    > 0.000000     0   60.5    A   far   first   forest     trail
    > 0.000000     0   66.5    G near  first   barren       road
    > 0.000000     0   74.5    G   mid   first   barren       road
    > 0.000000     0   74.0    G   far   first   barren       road
    > 1.449275     1   69.0    E near second  barren powerline
    > 0.000000     0   73.0    E   mid second  barren powerline
    > 0.000000     0   71.5    E   far second  barren powerline
    >
    > I tried the lme4 package using the following syntax:
    >
     [snip]
    >
    > I then tried using a poisson error structure using catch (the
    actual number
    > of animals) as the response and incorporated effort as an offset.
    Effort as
    > an offset is commonly used for analysis of CPUE data.
    >
    > Model2<-
    > glmer(catch~line+habitat+type+(1|site)+(1|cycle)+
    >  offset(effort),family=poisson)
     This is a good way to do it, but you need to incorporate the
    LOG of effort.  You may also need to account for overdispersion
    and/or zero-inflation; the former via incorporating an observation-level
    random effect (in glmer, glmmadmb, or MCMCglmm) or negative binomial
    distribution (in glmmadmb), the latter (if necessary) via zero-inflation
    or hurdle models (in glmmadmb or MCMCglmm).

     [snip snip snip]

    > Does anyone have any suggestions on how I can analyze
    zero-inflated CPUE
    > data? I have been trying to figure out how to do a Monte Carlo
    permutation
    > test for a mixed model but I am having trouble figuring out the
    syntax. Any
    > help would be greatly appreciated.
     What are you using to assess homogeneity of variance and normality?
    Normality of residuals can only be expected approximately (and in
    the case of
    large mean counts) in this case.

      I would start off this way:

    mydata$obs <- factor(seq(nrow(mydata)))
    glmer(catch~line+habitat+type+cycle+(1|site/cycle)+(1|obs)+
     offset(log(effort)),family=poisson, data=mydata)

    You can use the simulate() method to simulate data sets, count
    the proportion of zeros expected, and see if your observed
    proportion of zeros is off ...

    _______________________________________________
    R-sig-mixed-models at r-project.org
    <mailto:R-sig-mixed-models at r-project.org> mailing list
    https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models