Stephanie Avery-Gomm <stephanie.averygomm at ...> writes:
Hello,
I am using *nlme* to do a mixed-effects repeated-measures ANCOVA, with two
additional fixed factors but a limited sample size. I am seeking
clarification on whether and how I should adjust the inflated degrees of freedom
for a within-subject factor as a way of dealing with the temporal
pseudoreplication. I am not using lmer, so I am not sure whether the FAQ or Bates's
discussion about adjusting df in lme4/lmer applies
(https://stat.ethz.ch/pipermail/r-help/2006-May/094765.html).
More information: At 3 Sites on a river I measured fish Population in
approximately 9 stream Channel Units. Each Channel Unit was classified as a
Habitat, with three levels (Glide, Riffle, Pool). I sampled each Channel
Unit 3 times over the course of the summer, each time taking a Discharge
measurement (thus the exact Discharge differs a little from Site to Site,
and so is a continuous variable). I want to know whether fish Population in
stream habitats (Glides, Riffles, Pools) changes as discharge decreases
over the summer, whether there is an interaction, and whether fish Populations
differ between habitats or between sites.
The model I have settled on looks like this:
Pop.Model<-lme(Pop~Site+Habitat*Discharge, random=~1|ChannelUnit,
correlation=corCAR1(),data=mydata)
Inclusion of the three repeated measurements of Population in each Channel
Unit results in temporal pseudoreplication, and the degrees of freedom for
the within-subjects factor (Discharge) are 42. But I only have 26 Channel
Units, so this is obviously inflated (it should be 21). I read in The R Book
(Crawley, p. 644) that I can fix this by specifying the degrees of freedom.
But how?
Although I've read a ton online, including Bates's comments on SAS PROC MIXED
versus R lmer and degrees of freedom, I am still quite confused.
If anyone can offer specific advice on how to adjust the degrees of
freedom for the within-subjects factor in nlme, or explain in accessible
terms why I don't need to, I would be very grateful.
Just in case I haven't provided enough information, here are my data and R
code.
.csv file:
https://www.dropbox.com/s/2ijgq74di3hmo8i/R.Help.csv
.R file:
https://www.dropbox.com/s/puj5maifxc2rfcg/R%20Help.R
.doc with code & diagram:
https://www.dropbox.com/s/29dtofc62t957co/R%20Help.doc
Sincerely,
Stephanie Avery-Gomm
MSc. Candidate, Zoology Department
University of British Columbia
Are you sure that 42 (which is a propitious number in any case, see
_The Hitchhiker's Guide to the Galaxy_) is *not* the right number of
df for Discharge? Continuous predictors often behave differently from
discrete ones: in particular, see the discussion at http://tinyurl.com/ntygq3
(referenced from http://glmm.wikidot.com/faq) about how lme computes
degrees of freedom: "a term is _outer_ to a grouping factor if its
value does not change within levels of the grouping factor". Thus, if
Discharge takes on different values within each Channel Unit, it is
estimated at the innermost (within-unit) level.
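One way to check the outer/inner rule against your own design is a small simulation. This is a minimal sketch, not your model: the data are simulated, and the 26-unit by 3-visit layout and the names unit, x_inner and x_outer are hypothetical. It fits the same random-intercept model once with a covariate that varies within units (as Discharge does) and once with a covariate that is constant within units, then reads the denominator df from anova().

```r
library(nlme)

set.seed(101)
n_units  <- 26   # hypothetical number of Channel Units
n_visits <- 3    # hypothetical number of visits per unit
d <- data.frame(unit = factor(rep(seq_len(n_units), each = n_visits)))
d$x_inner <- rnorm(nrow(d))                        # varies within each unit
d$x_outer <- rep(rnorm(n_units), each = n_visits)  # constant within each unit
unit_eff  <- rnorm(n_units)[as.integer(d$unit)]    # simulated random intercepts
d$y <- 1 + 0.5 * d$x_inner + 0.5 * d$x_outer + unit_eff + rnorm(nrow(d))

fit_inner <- lme(y ~ x_inner, random = ~ 1 | unit, data = d)
fit_outer <- lme(y ~ x_outer, random = ~ 1 | unit, data = d)

## "denDF" is the denominator df lme uses for each Wald F test
df_inner <- anova(fit_inner)["x_inner", "denDF"]
df_outer <- anova(fit_outer)["x_outer", "denDF"]
print(c(inner = df_inner, outer = df_outer))
```

By nlme's counting rule the within-unit covariate should be tested with 78 - 26 - 1 = 51 df (observations minus unit means minus one slope), and the between-unit covariate with 26 - 1 - 1 = 24 (units minus intercept minus one slope). Running the same anova() call on a fit to your real data frame shows which stratum Discharge falls into.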
Crawley doesn't actually say (AFAICT) that you ought to be manually
adjusting the df provided by lme: "You use all of the data in the
model, and you specify its structure appropriately so that the
hypotheses are tested with the correct degrees of freedom (10 in this
case, not 48)". For the case he is examining, he is using an
interaction between the continuous predictor (week) and the grouping
factor (plant), *and* the weeks measured are the same for each plant.
I won't say that lme *always* gets the df 'right', but I don't think
I've ever seen a case where there was an unambiguous right answer
(i.e. the situation matched a classical experimental design so that
the problem could also be expressed as a standard method-of-moments
ANOVA with a well defined denominator df) *and* lme got it wrong.
I would suggest: (a) trying out a variety of examples (cross {discrete
predictors, continuous predictors with identical values within each group,
continuous with different values in each group} with {random intercept
only, random intercept + random slope}); (b) looking in an alternative
source such as Gotelli and Ellison's _A Primer of Ecological Statistics_
to try to convince yourself about the appropriate df.
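Suggestion (a) is easy to script. The sketch below is hypothetical throughout (simulated data, invented column names f_visit, x_same and x_diff). It crosses the three kinds of predictor with the two random-effects structures and tabulates the denominator df that lme reports. Random slopes are fitted only for the continuous predictors, because with three visits per unit a random slope for a three-level factor would not be identifiable alongside the residual variance.

```r
library(nlme)

set.seed(102)
n_units <- 26; n_visits <- 3   # hypothetical design
d <- data.frame(unit  = factor(rep(seq_len(n_units), each = n_visits)),
                visit = rep(seq_len(n_visits), times = n_units))
d$f_visit <- factor(d$visit)                      # discrete predictor
d$x_same  <- d$visit                              # same values in every unit
d$x_diff  <- d$visit + runif(nrow(d), -0.3, 0.3)  # values differ among units
u  <- as.integer(d$unit)
b0 <- rnorm(n_units)                              # simulated random intercepts
b1 <- rnorm(n_units, sd = 0.3)                    # simulated random slopes
d$y <- 1 + b0[u] + (0.5 + b1[u]) * d$visit + rnorm(nrow(d), sd = 0.5)

## Denominator df for predictor 'pred'; NA if the fit fails
get_df <- function(pred, slope) {
  rand <- if (slope) as.formula(paste("~", pred, "| unit")) else ~ 1 | unit
  fit <- tryCatch(lme(reformulate(pred, response = "y"), random = rand, data = d),
                  error = function(e) NULL)
  if (is.null(fit)) NA_real_ else anova(fit)[pred, "denDF"]
}

preds <- c("f_visit", "x_same", "x_diff")
res <- data.frame(
  predictor       = preds,
  intercept_only  = vapply(preds, get_df, numeric(1), slope = FALSE),
  intercept_slope = vapply(preds, function(p)
    if (p == "f_visit") NA_real_ else get_df(p, slope = TRUE), numeric(1)),
  row.names = NULL
)
print(res)
```

The first column should read 50, 51, 51 (the factor uses two df), and the second should repeat 51 for both continuous predictors: lme assigns df from the fixed-effects design and the grouping structure alone, so adding a random slope does not change them. Comparing such results with a classical design, as in (b), is how you would judge whether that is appropriate.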
Two more issues:
* if the qualitative and quantitative structure of your data
allows it, you should consider adding interactions of Discharge
with random (Channel Unit) and fixed effects (Site)
in your model (see Schielzeth and Forstmeier 2009).
* Another minor can of worms is that one might consider adjusting
the 'denominator df' for the autoregressive structure -- if the
points are not all independent, then the effective df will be
slightly smaller. In principle one can do this with Satterthwaite
or Kenward-Roger approximations, but I don't know whether anyone has
implemented them for lme models (pbkrtest implements them for
lme4 models, but lme4 doesn't allow temporal autocorrelation
structures. Have you looked at the ACF() output to see whether
the temporal correlation structure is really necessary for
your data?) However, I would be tempted to sweep this
under the rug (as Crawley seems to; he doesn't mention df
again when discussing autocorrelation structures).
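The ACF() check mentioned in the second point might look like the following minimal sketch. All of it is assumed for illustration: simulated data with AR(1) errors within each unit, a hypothetical day column holding sampling times (passed to corCAR1() through its form argument), and a likelihood-ratio test via anova() comparing random-intercept fits with and without the correlation structure.

```r
library(nlme)

set.seed(103)
n_units <- 26; n_visits <- 3; rho <- 0.6   # hypothetical design and correlation
d <- data.frame(unit = factor(rep(seq_len(n_units), each = n_visits)),
                day  = rep(0:(n_visits - 1), times = n_units))  # sampling occasion
## AR(1) errors within each unit, independent between units
ar1_errors <- function(n, rho) {
  e <- numeric(n)
  e[1] <- rnorm(1)
  for (k in seq_len(n)[-1]) e[k] <- rho * e[k - 1] + sqrt(1 - rho^2) * rnorm(1)
  e
}
d$x <- rnorm(nrow(d))
d$y <- 1 + 0.5 * d$x + rnorm(n_units)[as.integer(d$unit)] +
  unlist(lapply(seq_len(n_units), function(i) ar1_errors(n_visits, rho)))

fit0 <- lme(y ~ x, random = ~ 1 | unit, data = d)
## returnObject = TRUE: return the fit with a warning, rather than stopping,
## if the iteration limit is reached without convergence
fit1 <- lme(y ~ x, random = ~ 1 | unit, data = d,
            correlation = corCAR1(form = ~ day | unit),
            control = lmeControl(returnObject = TRUE))

## Autocorrelation of the within-unit residuals at lags 0, 1, 2, ...
acf0 <- ACF(fit0, resType = "normalized")
print(acf0)
## plot(acf0, alpha = 0.05) draws approximate 95% bounds

## Likelihood-ratio test of the correlation parameter (same fixed effects,
## both fitted by REML). Phi = 0 lies on the boundary, so the p-value is conservative.
cmp <- anova(fit0, fit1)
print(cmp)
```

With only three visits per unit the lag-1 and lag-2 estimates rest on few pairs, so the plot and the test are rough guides rather than decisive evidence.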
(I will also point out that it is **not** kosher, in my opinion, to
post a public link to the entirety of a copyrighted (and non-open)
work; it would be fair use, I think, to post a copy of a relevant
page or two, or to point to it on Google Books
<http://books.google.com/books?id=8D4HVx0apZQC&pg=PA644>.)
@article{schielzeth_conclusions_2009,
title = {Conclusions beyond support: overconfident estimates in mixed models},
volume = {20},
number = {2},
journal = {Behavioral Ecology},
author = {Schielzeth, Holger and Forstmeier, Wolfgang},
month = mar,
year = {2009},
issn = {1045-2249, 1465-7279},
shorttitle = {Conclusions beyond support},
url = {http://beheco.oxfordjournals.org/content/20/2/416},
doi = {10.1093/beheco/arn145},
pages = {416--420},
}