
Specifying 'correct' degrees of freedom for within-subject factor in *nlme/lme* repeated measures ANOVA?

5 messages · Stephanie Avery-Gomm, Ben Bolker, Peter Dalgaard +1 more

#
Stephanie Avery-Gomm <stephanie.averygomm at ...> writes:
Are you sure that 42 (which is a propitious number in any case, see
_The Hitchhiker's Guide to the Galaxy_) is *not* the right number of
df for Discharge? Continuous predictors often behave differently from 
discrete ones: in particular, see the discussion at http://tinyurl.com/ntygq3
(referenced from http://glmm.wikidot.com/faq) about how lme computes
degrees of freedom: "a term is _outer_ to a grouping factor if its
value does not change within levels of the grouping factor", thus if
Discharge takes on different values within each Channel Unit then it is
estimated at the innermost level.
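
  For example (a hedged sketch, with made-up names and simulated
data), a continuous predictor that varies within groups is counted at
the innermost level:

library(nlme)
## hypothetical data: 6 groups, 5 observations each
set.seed(1)
d <- data.frame(group = factor(rep(1:6, each = 5)),
                x = runif(30),   ## varies within groups, so "inner"
                y = rnorm(30))
fit <- lme(y ~ x, random = ~ 1 | group, data = d)
anova(fit)  ## x gets innermost-level denDF: 30 - 6 - 1 = 23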

  Crawley doesn't actually say (AFAICT) that you ought to be manually
adjusting the df provided by lme: "You use all of the data in the
model, and you specify its structure appropriately so that the
hypotheses are tested with the correct degrees of freedom (10 in this
case, not 48)".  For the case he is examining, he is using an
interaction between the continuous predictor (week) and the grouping
factor (plant), *and* the weeks measured are the same for each plant.
I won't say that lme *always* gets the df 'right', but I don't think
I've ever seen a case where there was an unambiguous right answer
(i.e. the situation matched a classical experimental design so that
the problem could also be expressed as a standard method-of-moments
ANOVA with a well defined denominator df) *and* lme got it wrong.

  I would suggest: (a) trying out a variety of examples (cross {discrete
predictors, continuous predictors with identical values within each group,
continuous with different values in each group} with {random intercept
only, random intercept + random slope}); (b) looking in an alternative
source such as Ellison and Gotelli's _Primer of Ecological Statistics_
to try to convince yourself about the appropriate df.
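
  A hedged sketch of exercise (a), with hypothetical names and
simulated data (cross these layouts with the two random-effects
structures):

library(nlme)
set.seed(2)
m <- 8; n <- 6                 ## m groups, n observations per group
d1 <- data.frame(g = factor(rep(1:m, each = n)),
                 x = rep(1:n, m))       ## identical x-values in every group
d2 <- d1
d2$x <- d1$x + rep(runif(m), each = n)  ## x-values differ between groups
d1$y <- d2$y <- 0.3 * d1$x + rep(rnorm(m), each = n) + rnorm(m * n)
anova(lme(y ~ x, random = ~ 1 | g, data = d1))  ## random intercept only
anova(lme(y ~ x, random = ~ x | g, data = d2))  ## intercept + slope
## compare the denDF reported for x across the combinations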

 Two more issues:

 * If the qualitative and quantitative structure of your data
allow it, you should consider adding interactions of Discharge
with random (Channel Unit) and fixed effects (Site) 
in your model (see Schielzeth and Forstmeier 2009).

 * Another minor can of worms is that one might consider adjusting
the 'denominator df' for the autoregressive structure -- if the
points are not all independent, then the effective df will be
slightly smaller.  In principle one can do this with Satterthwaite
or Kenward-Roger approximations, but I don't know if anyone's
implemented them for lme models (pbkrtest implements them for
lme4 models, but those don't allow temporal autocorrelation
structures.  Have you looked at the ACF() output to see if
the temporal correlation structure is really necessary for
your data?)  However, I would be tempted to sweep this
under the rug (as Crawley seems to; he doesn't mention df
again when discussing autocorrelation structures).
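
  A hedged skeleton of that ACF check, using the variable names from
this thread ('Response' stands in for the actual response variable):

fit   <- lme(Response ~ Discharge, random = ~ 1 | ChannelUnit,
             data = mydata)
plot(ACF(fit, resType = "normalized"), alpha = 0.05)  ## AR structure needed?
fitAR <- update(fit, correlation = corAR1(form = ~ 1 | ChannelUnit))
anova(fit, fitAR)   ## likelihood-ratio comparison of the two fits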

(I will also point out that it is **not** kosher in my opinion to
post a public link to the entirety of a copyrighted (and non-open)
work; it would be fair use, I think, to post a copy of a relevant
page or two, or to point to it on Google Books
<http://books.google.com/books?id=8D4HVx0apZQC&pg=PA644>.)

@article{schielzeth_conclusions_2009,
	title = {Conclusions beyond support: overconfident estimates in mixed models},
	volume = {20},
	number = {2},
	journal = {Behavioral Ecology},
	author = {Schielzeth, Holger and Forstmeier, Wolfgang},
	month = mar,
	year = {2009},
	issn = {1045-2249, 1465-7279},
	shorttitle = {Conclusions beyond support},
	url = {http://beheco.oxfordjournals.org/content/20/2/416},
	doi = {10.1093/beheco/arn145},
	pages = {416--420},
}
#
On Jul 28, 2012, at 00:16 , Ben Bolker wrote:

Haven't played with lme for a while, but models with crossed random effects are a pretty sure way to get df wrong. E.g. this one, which I dug out from some 2006 slides:
Error: Block
          Df   Sum Sq  Mean Sq F value Pr(>F)
Residuals  1 0.008311 0.008311               

Error: Block:sample
          Df  Sum Sq Mean Sq F value  Pr(>F)   
sample     5 0.27615 0.05523   11.21 0.00952 **
Residuals  5 0.02463 0.00493                   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Error: Block:dilut
          Df Sum Sq Mean Sq F value   Pr(>F)    
dilut      4  3.749  0.9373   420.8 1.68e-05 ***
Residuals  4  0.009  0.0022                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Error: Block:sample:dilut
             Df  Sum Sq  Mean Sq F value Pr(>F)
sample:dilut 20 0.05552 0.002776   1.607  0.149
Residuals    20 0.03455 0.001728               

This is not "officially" supported by lme, but you can trick it into fitting the model:

random = list(Block = ~ 1,
              Block = pdIdent(~ sample - 1),
              dilut = ~ 1)
numDF denDF  F-value p-value
(Intercept)      1    25 538.0174  <.0001
sample           5    25  11.2133  <.0001
dilut            4     4 420.7911  <.0001
sample:dilut    20    25   1.6069  0.1301

but as you see, the F-values are right but the denDF are not.
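
For reference, Block/sample/dilut match the variable names in nlme's
Assay dataset; if that is indeed the data behind the slides (an
assumption on my part), the classical fit above could be sketched as:

library(nlme)
summary(aov(logDens ~ sample * dilut + Error(Block/(sample * dilut)),
            data = Assay))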
#
Hello -
The proper degrees of freedom will depend on what error you specify.
I'd expect there is a between-Channels component of error that affects
the slope estimates.  In that case your error term should perhaps be
random = ~ Discharge | ChannelUnit

One can of course check whether the estimate of the relevant component 
of variance is greater than 0, and if so whether it is of any consequence.

In such analyses, it is commonly assumed that the variation between
slopes can be entirely explained by variation about a line whose slope
is constant (here, across ChannelUnits).  Experience with previous
such data may establish whether this is a reasonable assumption.
A cautious analyst will, if the data allow it, want to check this assumption.
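
A hedged sketch of that check ('Response' stands in for the actual
response variable in mydata):

f1 <- lme(Response ~ Discharge, random = ~ 1 | ChannelUnit, data = mydata)
f2 <- lme(Response ~ Discharge, random = ~ Discharge | ChannelUnit,
          data = mydata)
anova(f1, f2)   ## is the slope variance component of any consequence?
intervals(f2)   ## approximate CIs for the variance components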

The degrees of freedom are of consequence only when you want to
move from F-statistics or t-statistics or other such statistics to p-values.
They are part of a mechanism that is used to get approximate p-values.
The p-values are, in general, not well-defined -- assumptions are made
along the lines of the assumptions that underpin the Behrens-Fisher
test.  If the model is balanced then, I think most pundits would
argue, they can in general be used as a reasonable guide.  If the
model is badly unbalanced, they should be treated with caution.

John Maindonald             email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm
On 28/07/2012, at 8:16 AM, Ben Bolker wrote:

#
Hello -
I found it remarkably easy to download your data and reproduce your analysis.
Your efforts at making what you'd done thus easy to reproduce are to be
commended!

Compare your
Linear mixed-effects model fit by REML
 Data: mydata 
       AIC      BIC    logLik
  640.1269 661.5583 -310.0635

Random effects:
 Formula: ~1 | ChannelUnit
        (Intercept) Residual
StdDev:    13.87812  29.8971
. . .

with:
Linear mixed-effects model fit by REML
 Data: mydata 
       AIC      BIC   logLik
  636.0179 661.7355 -306.009

Random effects:
 Formula: ~Discharge | ChannelUnit
 Structure: General positive-definite, Log-Cholesky parametrization
            StdDev    Corr  
(Intercept)  17.88118 (Intr)
Discharge   450.69768 -0.663
Residual     23.51211       
. . .

I conclude that there is a very large random slope effect.  The df
for testing the fixed effect (slope) of Discharge should then be 21,
not 42 as the nlme output suggests.
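
The two REML fits have the same fixed effects, so the logLik values
quoted above also give a likelihood-ratio test of the random slope:

lrt <- 2 * (-306.009 - (-310.0635))       ## = 8.109
pchisq(lrt, df = 2, lower.tail = FALSE)   ## ~ 0.017 (conservative at the boundary)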

John Maindonald             email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm
On 27/07/2012, at 5:55 AM, Stephanie Avery-Gomm wrote: