Fruits are counted over time (14 occasions; every 3-5 days) to
evaluate consumption by migratory birds. The fruits are clustered
within one of two subplots (a treatment and a control) assigned
randomly within a larger plot (12 plots in total). Individual
fruits were not marked; rather, the number of fruits consumed
during a given interval was noted. Thus, the number of fruits
present at the start of a time interval varies among subplots and
over time. I'm interested in the effects of treatment (at the
subplot level), plot geography (e.g., north vs. south), and count
interval on the hazard/likelihood of consumption. Modeling
time as a categorical variable makes more sense biologically (i.e.,
modeling consumption separately for each count period), as
consumption over time is not likely to change linearly or
quadratically.
(or in some not quite as simple but still smooth/deterministic
way that could be modeled by a spline curve ... ?)
I like the thought, but even splines are not likely to approximate patterns of consumption well in this system ... it's that sporadic.
I can envision two possible specifications for the
response variable: (1) model consumption using a time-to-event-type
binary response in which each individual fruit provides up to 14 rows of
data, with consumption equaling 0 for each count period until
consumption (1) occurs (fruits cannot contribute in count periods after
they have been consumed); this equates to what Singer and Willett
(Applied Longitudinal Data Analysis, 2003) call a discrete-time hazard
model, if that's at all familiar; or (2) ignore individual consumption
histories, aggregate fruits within subplots, and model the proportion
of fruits consumed in each count interval.
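For concreteness, the two layouts might look like this (a minimal sketch in R with toy numbers; all column names and values are assumptions, not from the actual data set):

```r
## (1) Discrete-time hazard layout: one row per fruit per count period,
## up to and including the period in which the fruit is consumed.
fruit_long <- data.frame(
  fruit       = c(1, 1, 1, 2, 2),      # fruit 1 eaten in period 3, fruit 2 in period 2
  count       = factor(c(1, 2, 3, 1, 2)),
  consumption = c(0, 0, 1, 0, 1)       # 0 until the period of consumption
)

## (2) Aggregated layout: one row per subplot x count period.
subplot_agg <- data.frame(
  plot      = factor(c(1, 1)),
  treatment = factor(c("treated", "control")),
  count     = factor(c(1, 1)),
  consumed  = c(2, 5),
  total     = c(10, 12)                # fruits present at the start of the interval
)
```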
I lack the expertise
to decide which is most appropriate. Moreover, I'm stumped as to how to
specify the clustering and nesting in glmer, regardless of the approach
taken.
For (1), the following comes to mind:
consumption.survival <- glmer(consumption ~ treatment + geography + count +
    (1 | plot) + (1 | treatment:plot), family = binomial, ...)
For (2):
consumption.binomial <- glmer(cbind(consumed, total) ~ treatment + geography +
    count + (1 | plot) + (1 | treatment:plot), family = binomial, ...)
I would say that #2 is much more natural.
Not to mention computationally less taxing on my computer. Have I specified the random effects correctly to account for the nesting present in the design? Additionally, is it fair to explore interactions among factors (e.g., geography:treatment, treatment:count)?
* glmer is expecting cbind(consumed, total - consumed) rather than
cbind(consumed, total)
Noted. This will also account for the changing "availability" of fruit during a count period over time, yet consider, for example, 2 of 4 fruits consumed equally to, say, 100 of 200 fruits consumed, correct?
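A quick way to see what the binomial response does with those two cases (base R, intercept-only glm with toy numbers):

```r
## Two rows with the same proportion consumed (0.5) but different denominators.
d   <- data.frame(consumed = c(2, 100), total = c(4, 200))
fit <- glm(cbind(consumed, total - consumed) ~ 1, family = binomial, data = d)

p_hat <- unname(plogis(coef(fit)))   # pooled estimate: (2 + 100) / (4 + 200) = 0.5

## The fitted proportion is the same for both rows, but the binomial
## likelihood weights each row by its denominator, so the 100/200 row
## carries far more information (smaller variance) than the 2/4 row.
```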
* I think you might additionally want to make 'count'
a random effect; make sure it is a factor (unless you are
really interested in the specific effects at particular
time periods and you don't mind using up all those degrees
of freedom estimating them ...)
I am interested in estimating the probability of consumption for each count period, as I'd like to relate these estimates to other factors that influence the abundance of birds on the study site, but perhaps I can do this with the estimated random effects just the same? However, I don't necessarily need these estimates to be adjusted for other factors (e.g., treatment, geography), but rather marginal means for each count period?
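One way to get per-period estimates with count treated as a factor, sketched on simulated toy data (glm for brevity, with no other covariates; the glmer version would just add the random terms):

```r
set.seed(1)

## Toy data: 10 count periods, 8 subplot-rows per period, 20 fruits each.
n        <- 80
count    <- factor(rep(1:10, each = 8))
total    <- rep(20, n)
p_true   <- rep(seq(0.1, 0.55, by = 0.05), each = 8)  # assumed true probabilities
consumed <- rbinom(n, total, p_true)

## count as a factor with no intercept: one consumption probability per period.
fit      <- glm(cbind(consumed, total - consumed) ~ count - 1, family = binomial)
p_period <- plogis(coef(fit))   # estimated per-period consumption probabilities
```

With count as a random effect instead, per-period deviations would come from `ranef()` rather than `coef()`, shrunken toward the overall mean.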
* You don't seem to have accounted for the inter-sample
period (you could do a bit of exploratory data analysis,
or residual analysis, to see if the difference between
3 and 5 days seems to matter). You might try
family = binomial(link = "cloglog") with an offset equal to log(Dt),
which will make the probability of consumption within Dt days equal
to 1 - (1 - p1)^Dt, where p1 is the per-day consumption probability.
I included Dt as a covariate in some earlier messing around, but I will explore the idea of using it as an offset.
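The algebra behind the offset, as a numerical check (base R; `eta` here is an assumed linear-predictor value, not an estimate from the data):

```r
## With a cloglog link and offset log(Dt):
##   log(-log(1 - p)) = eta + log(Dt)  =>  p = 1 - exp(-Dt * exp(eta)).
## Writing p1 = 1 - exp(-exp(eta)) for the one-day probability, this is
## p = 1 - (1 - p1)^Dt: the interval probability compounds the daily one.
eta <- -1.2                    # assumed linear-predictor value
Dt  <- 5                       # days between counts

lambda <- exp(eta)             # per-day hazard
p1     <- 1 - exp(-lambda)     # one-day consumption probability
p_a    <- 1 - exp(-Dt * lambda)  # interval probability via the hazard
p_b    <- 1 - (1 - p1)^Dt        # interval probability via compounding

all.equal(p_a, p_b)            # TRUE: the two forms agree
```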
* You may want to add a random effect for individual observations (i.e.,
mydata$obs <- seq(nrow(mydata))). This seems counterintuitive,
but it allows for overdispersion (which can be driven either by
non-independence within samples or by heterogeneity within
samples).
I saw this in Browne et al. (2005, J. R. Stat. Soc. A, 168, 599-613), which you referenced in an older thread, and thought it worthwhile to pursue. In this case, each observation would be the cbind(consumed, total - consumed) for each treatment-by-plot-by-count combination, correct? Thus, if there were 10 plots, 2 treatments, and 10 count periods, there would be 200 observations?
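The bookkeeping for that, sketched in base R (`mydata` is built here as a toy grid purely to show the dimensions; adding `(1 | obs)` to the glmer formula would then absorb the overdispersion):

```r
## One row per plot x treatment x count-period combination.
mydata <- expand.grid(plot      = factor(1:10),
                      treatment = factor(c("treated", "control")),
                      count     = factor(1:10))

## Observation-level random effect: one factor level per row.
mydata$obs <- factor(seq_len(nrow(mydata)))

nrow(mydata)          # 200, matching 10 plots x 2 treatments x 10 periods
nlevels(mydata$obs)   # 200
```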
I'm not sure either approach is perfect, but the binomial
approach seems better here.