Temporal correlation structure in mixed model
On 14-03-27 12:23 PM, Worthington, Thomas A wrote:
Dear All
I'm trying to run a mixed model on ecology data. The setup is as
follows, a group of fish are placed in a flume and attempt to pass a barrier. The dependent variable is the number of approaches to pass the barrier per minute of the trial, with each trial lasting 10 minutes. There are two configurations of the barrier. There are 20 trials in total, 10 for each setup. The aim of the study is to see whether the number of approaches in minute 1, minute 2, minute 3 etc. is different between the two setups. Therefore my model is of the form Approaches ~ Setup + Minute + Setup * Minute, where minute (10 levels) and setup (2 levels) are both factors. The dependent variable sounds more like a count than a continuous variable, so you might consider a Poisson GLMM, _but_ it is harder to incorporate temporal autocorrelation in that case, so if it seems reasonable you might try to stick with the linear mixed model (i.e. Gaussian/Normal responses). How big is your data set (is a 'trial' equivalent to a single fish? 10 levels=10 minutes? Does that you mean you have a total of 200 observations?) What is a typical number/range of attempts per minute?
I have three potential models and am slightly confused over the
model specification, I believe I need to include a correlation structure to deal with temporal autocorrelation but am unclear over whether a random effect is needed instead or both are required. I have included a variance structure to deal with the issue of homogeneity of variance in the data. A Poisson GLMM might help here.
The three models are as follows
test1 <- gls(Approaches ~ Setup * Minute, data=data, correlation =
corAR1(form =~ Minute|Trial), weights = varIdent(form=~ 1| Minute * Setup), control=list(opt="optim")) This will estimate correlation, but will assume that the average number of attempts is the same for every trial within a setup category.
test2 <- lme(Approaches ~ Setup * Minute, data=data, random = ~
Minute | Trial, weights = varIdent(form=~ 1| Minute * Setup), control=list(opt="optim")) This looks overspecified (does it really work??) -- it is allowing the differences among every pair of minutes to be characterized for every trial. Even ~1|Trial/Minute is overspecified, since there is only one observation per minute per trial (thus the among-minute-within-trial variance would be confounded with the residual variance).
test3 <- lme(Approaches ~ Setup * Minute, data=data, random = ~
Minute | Trial, weights = varIdent(form=~ 1| Minute * Setup), correlation = corAR1(form =~ Minute|Trial), control=list(opt="optim")) This looks even more overspecified. I would suggest lme(Approaches ~ Setup * Minute, data=data, random = ~ 1 | Trial, weights = varIdent(form=~ 1| Minute * Setup), correlation = corAR1(form =~ Minute|Trial), control=list(opt="optim")) which allows for variation among trials (1|Trial) and autocorrelation within Trial -- the Minute:Trial interaction will show up in the residual variance. This still looks ambitious if you have a total of 200 data points, as the varIdent() specification will estimate variances for all 20 of the minute*setup combinations. * Would log- or square-root transforming your data take care of the heteroscedasticity adequately? * Are you sure you want to treat minute as categorical, estimating a separate response for every minute? Or would a smooth trend (linear, quadratic, or spline) describe the data adequately and more parsimoniously? (This applies to both the mean and variance description)
I believe (maybe wrongly) that Test 1 and 2 would be broadly
similar, but if some could suggest the most sensible model I would greatly appreciate it
Best wishes Tom
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models