Dear R community,
I have some questions regarding the analysis of a zero-inflated count
dataset with a repeated-measures design.
The dataset is arranged as follows:
Unit of analysis: point. These are points where birds were counted during a
certain amount of time. In total we have about 175 points. Each point is
located within a certain habitat fragment (here: "site" = A-B-C-D-...; in
reality we have 25 sites, i.e. forest fragments). All points were counted
five times per year during three years (thus, in total, each point was
counted 15 times). We want to relate bird abundance to a number of habitat
variables (here: X1-X2-X3) collected at the site level.
Abundance: this is the number of birds counted at a point. In most cases
(> 90%), no birds were detected, and the abundance dataset is thus
zero-inflated.
I have been looking for code to analyze this zero-inflated,
Poisson-distributed dataset with a repeated-measures design, and I have
arrived at the glmmADMB package.
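library(glmmADMB)
# Read the comma-separated point-count data
data <- read.table("D:/Boris/Borisdataset.csv", sep = ",", header = TRUE)
count <- data$count          # visit number within a year (1-5)
site <- data$site            # forest fragment in which the point lies
abundance <- data$abundance  # number of birds counted at the point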
[For clarity, in the above syntax: count ranges from 1 to 5, as each site
has been counted 5 times in a year; site refers to one of the 25 forest
fragments in which the point counts were conducted; Xi are the habitat
variables.]
My questions are:
- Does it make sense to analyze these data at the point level, given that
all habitat variables are collected at the site level, meaning that for all
points belonging to a certain forest fragment the habitat variables have
the same value? If it does make sense, is the proposed syntax OK? Is there
any option to include year as a random effect, as I am not especially
interested in differences between years?
- It looks appealing to average the point count values for each forest
fragment, and to analyze the data with "forest fragment" as the unit of
analysis. However, even after averaging by fragment, the dataset is still
zero-inflated. Yet it is impossible to use a zero-inflated Poisson
distribution for this analysis, as the averaged forest fragment values are
not always discrete values.
Rather than averaging, one can side-step that problem by instead summing over
points within sites and using a log(time) offset for any fixed differences in
time of observation across sites.
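As a rough illustration of the summing idea, here is a minimal sketch in base R. It runs on simulated toy data, not the real file. The `minutes` column (observation time per visit), the `point_in_site` column, the seven points per site, and the simulated rates are all assumptions made for illustration. A plain Poisson GLM stands in for whatever count model is finally chosen; it does not handle zero inflation.

```r
# Toy data standing in for the real file. Every column except site,
# count and abundance is a hypothetical name introduced for this sketch.
set.seed(1)
n_sites <- 25
toy <- expand.grid(point_in_site = 1:7,            # assumed 7 points per site
                   site = factor(LETTERS[seq_len(n_sites)]),
                   year = 1:3,
                   count = 1:5)                    # visit number within a year

# Site-level covariate: identical for every point within a site
x1_by_site <- rnorm(n_sites)
toy$X1 <- x1_by_site[as.integer(toy$site)]

# Assumed observation time per visit, differing between sites
minutes_by_site <- rep(c(5, 10), length.out = n_sites)
toy$minutes <- minutes_by_site[as.integer(toy$site)]

# Simulated counts, mostly zero
toy$abundance <- rpois(nrow(toy),
                       lambda = 0.005 * toy$minutes * exp(0.5 * toy$X1))

# Sum birds and observation time over all points and visits within a site;
# the totals stay integers, unlike site averages
site_totals <- aggregate(cbind(abundance, minutes) ~ site + X1,
                         data = toy, FUN = sum)

# The log(minutes) offset puts sites with different total observation
# time on a common scale (birds per minute)
fit <- glm(abundance ~ X1 + offset(log(minutes)),
           family = poisson, data = site_totals)
summary(fit)
```

Because the totals are integers, a count distribution can be fitted directly to one row per site. If the site totals still show excess zeros or overdispersion, a different count model could be swapped in with the same offset term.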
It sounds sensible to me to take the site-level approach, but my
credentials are not in statistics, so a more authoritative answer may well
be offered.