I am new to R and would like some guidance on how to block by subject.
I collected presence/absence 24 hrs/day for 44 birds, denoted "bird", at 5
stations, denoted "colony" for the duration of a breeding season. The
breeding season was broken into biologically relevant time periods (3
levels), denoted "period". The birds were captured on 2 different colonies,
22 on each colony, denoted "homecolony". I also have the variable, "sex". I
have about 300,000 total observations for these 44 birds. This data is
temporally auto-correlated because it is VHF radio-telemetry data recorded
continually throughout the study.
What is the temporal resolution? Based on a guess at the length
of the breeding season (60 days), I'm guessing about every 10-12 minutes.
This isn't essential information but would help get a feel for the
data.
Be aware that modeling temporal autocorrelation in a binary
variable may be a little bit tricky -- there are a variety of
approaches, but none are quite as easy as the way one builds
autocorrelation into a normal-response model, by making the residuals
within blocks multivariate normal with a specified autocorrelation. A
few possibilities that spring to mind are (1) because there is
presumably no error in the observations themselves, you could
condition on the previous observation (i.e. put it in as a predictor);
(2) aggregate the data to a coarser temporal scale and fit the data as
binomial (e.g. number of presence/absence values per 2-hour period,
or per day, or some other appropriate period that balances resolution
and lack of autocorrelation); (3) if there are long 'runs' of
presence/absences, aggregate the data down to times when birds entered
or left; (4) [fanciest, but not necessarily worth the trouble or most
appropriate] assume an underlying multivariate normal distribution
that *is* autocorrelated and controls probability of presence
(i.e. a hierarchical model with autocorrelation in the level
below the observation level).
I have a binary response variable, "present", with independent variables
"sex" at 2 levels, "homecolony" at 2 levels, "colony" at 5 levels, and
"period" at 3 levels. I would like to incorporate "period", "bird" and
"colony" as random effects. Fixed effects are "sex" and "homecolony". The
"bird" variable is what I want the model blocked by.
Practically speaking, you can't treat period as a random effect --
not enough levels.
I am trying to answer the following questions: "Are there differences in
presence between sexes, homecolony, and period?" "Do birds from one
homecolony differ in their use of other colonies (colony)?"
I cannot figure out how to specify that I want the model to block by
subject, i.e. "bird". I have tried incorporating "bird" as a random effect,
but the degrees of freedom still come out to be close to 300,000 which is
near the total number of observations. Can anyone show me syntax that will
incorporate bird as a random effect that the model blocks by subject?
Where are you seeing the degrees of freedom?
I tried following examples from Doug Bates' LME4 book, Chapter 4:
Glmm_FD <- lmer(present ~ 1 + gender + period + homecolony + colony + (1 +
period|bird), data = FD, family = binomial, REML = 0)
I know that the (1+period|bird) term is probably incorrect.
REML=0 is meaningless in this case (lmer doesn't use REML
or an analogue of it when fitting a non-Gaussian model), but
otherwise your model specification seems to be on the right track.
Since I'm guessing the birds can only be detected at a single
station at a time, this is a bit more of a categorical response
(i.e., is bird X present at colony 1-5 or "none of the above"
at time T?) Have you thought about multistate mark-recapture
models ... ?
This seems like a fairly complex problem. How have others
in your field handled these kinds of data? With lots of data
on each bird, you may be able to simplify your life a bit by
analyzing each bird separately -- i.e. a two-stage model rather
than a mixed/multilevel model -- Murtaugh 2007 _Ecology_ recommends
this, although I don't always agree with him ...
I would recommend Zuur et al for messy/complex ecological
data, although I don't agree with all of that either ...