An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-sig-mixed-models/attachments/20120205/32cf47f4/attachment-0001.pl>
lmer blocking by subject?
2 messages · Tiffanie Cross, Ben Bolker
Tiffanie Cross <tahtg6 at ...> writes:
I am new to R and would like some guidance on how to block by subject. I collected presence/absence 24 hrs/day for 44 birds, denoted "bird", at 5 stations, denoted "colony" for the duration of a breeding season. The breeding season was broken into biologically relevant time periods (3 levels), denoted "period". The birds were captured on 2 different colonies, 22 on each colony, denoted "homecolony". I also have the variable, "sex". I have about 300,000 total observations for these 44 birds. This data is temporally auto-correlated because it is VHF radio-telemetry data recorded continually throughout the study.
What is the temporal resolution? Based on a guess at the length of the breeding season (60 days), I'm guessing about every 10-12 minutes. This isn't essential information but would help get a feel for the data. Be aware that modeling temporal autocorrelation in a binary variable may be a little bit tricky -- there are a variety of approaches, but none are quite as easy as the way one builds autocorrelation into a normal-response model, by making the residuals within blocks multivariate normal with a specified autocorrelation. A few possibilities that spring to mind are (1) because there is presumably no error in the observations themselves, you could condition on the previous observation (i.e. put it in as a predictor); (2) aggregate the data to a coarser temporal scale and fit the data as binomial (e.g. number of presence/absence values per 2-hour period, or per day, or some other appropriate period that balances resolution and lack of autocorrelation); (3) if there are long 'runs' of presence/absences, aggregate the data down to times when birds entered or left; (4) [fanciest, but not necessarily worth the trouble or most appropriate] assume an underlying multivariate normal distribution that *is* autocorrelated and controls probability of presence (i.e. a hierarchical model with autocorrelation in the level below the observation level).
I have a binary response variable, "present", with independent variables "sex" at 2 levels, "homecolony" at 2 levels, "colony" at 5 levels, and "period" at 3 levels. I would like to incorporate "period", "bird" and "colony" as random effects. Fixed effects are "sex" and "homecolony". The "bird" variable is what I want the model blocked by.
Practically speaking, you can't treat period as a random effect -- not enough levels.
I am trying to answer the following questions: "Are there differences in presence between sexes, homecolony, and period?" "Do birds from one homecolony differ in their use of other colonies (colony)?" I cannot figure out how to specify that I want the model to block by subject, i.e. "bird". I have tried incorporating "bird" as a random effect, but the degrees of freedom still come out to be close to 300,000 which is near the total number of observations. Can anyone show me syntax that will incorporate bird as a random effect that the model blocks by subject?
Where are you seeing the degrees of freedom?
I tried following examples from Doug Bates' LME4 book, Chapter 4:
Glmm_FD <- lmer(present ~ 1 + gender + period + homecolony + colony + (1 + period|bird), data = FD, family = binomial, REML = 0) I know that the (1+period|bird) term is probably incorrect.
REML=0 is meaningless in this case (lmer doesn't use REML or an analogue of it when fitting a non-Gaussian model), but otherwise your model specification seems to be on the right track. Since I'm guessing the birds can only be detected at a single station at a time, this is a bit more of a categorical response (i.e., is bird X present at colony 1-5 or "none of the above" at time T?) Have you thought about multistate mark-recapture models ... ? This seems like a fairly complex problem. How have others in your field handled these kinds of data? With lots of data on each bird, you may be able to simplify your life a bit by analyzing each bird separately -- i.e. a two-stage model rather than a mixed/multilevel model -- Murtaugh 2007 _Ecology_ recommends this, although I don't always agree with him ... I would recommend Zuur et al for messy/complex ecological data, although I don't agree with all of that either ...