lme() vs aov()

7 messages · Federico Calboli, Peter Dalgaard, Kingsford Jones +2 more

#
Hi All,

I was playing with a small dataset of 12 observations: a very basic
nested model, with 3 drugs, 2 sources for each drug, and two response
counts for each source (the response is some medical parameter, of no
real interest here). The data are:

drug source response
d1 a 102
d1 a 104
d1 q 103
d1 q 104
d2 d 108
d2 d 110
d2 b 109
d2 b 108
d3 l 104
d3 l 106
d3 s 105
d3 s 107
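(For anyone who wants to reproduce this, the table above can be typed in as a data frame -- called `dat` in the model calls below:)

```r
# the 12 observations, exactly as listed above
dat <- data.frame(
  drug     = factor(rep(c("d1", "d2", "d3"), each = 4)),
  source   = factor(c("a", "a", "q", "q", "d", "d", "b", "b", "l", "l", "s", "s")),
  response = c(102, 104, 103, 104, 108, 110, 109, 108, 104, 106, 105, 107)
)
```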

For kicks, and because the data are balanced, I thought I could use
them to compare the results of aov() with those of lme() -- I know
the package lme4 and lmer() should be preferred, but the analysis I am
ultimately testing was done with lme.

In any case I fit 2 models and got 2 different answers:

 > mod.lme = lme(response ~ drug, random = ~1|source, dat)
 > mod.aov = aov(response ~ drug + Error(source), dat)

 > summary(mod.aov)

Error: source
           Df Sum Sq Mean Sq F value   Pr(>F)
drug       2 61.167  30.583  61.167 0.003703 **
Residuals  3  1.500   0.500
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Error: Within
           Df Sum Sq Mean Sq F value Pr(>F)
Residuals  6    9.0     1.5


 > anova(mod.lme) # I use anova here to directly compare the F-test
             numDF denDF   F-value p-value
(Intercept)     1     6 115207.14  <.0001
drug            2     3     26.21  0.0126

(incidentally, the 3 denDF here makes me think the F-test is exactly
what I'd expect)

Because the results look different, I thought the possibilities are:

1) I fit 2 different models without realising it
2) one model is more conservative than the other
3) I'm completely missing some point (despite searching the archives  
of R-help and R-ME)

Just to be pesky: if I check the calculations against the book I got
the data from (Zar, 4th ed., pp. 304-305), they agree with the aov()
results.

Any illumination would be gratefully received. I apologise in advance
for any annoyance past/present/future my question may cause.

Federico



--
Federico C. F. Calboli
Department of Epidemiology and Public Health
Imperial College, St. Mary's Campus
Norfolk Place, London W2 1PG

Tel +44 (0)20 75941602   Fax +44 (0)20 75943193

f.calboli [.a.t] imperial.ac.uk
f.calboli [.a.t] gmail.com
#
Federico Calboli wrote:
The gut reaction is that you shouldn't trust lme() in low-df cases, but 
in this particular case the issue is different:

 > summary(mod.aov)

Error: source
          Df Sum Sq Mean Sq F value   Pr(>F)  
drug       2 61.167  30.583  61.167 0.003703 **
Residuals  3  1.500   0.500                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Error: Within
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals  6    9.0     1.5              

Notice that the Residuals Mean Sq is larger in the Within stratum than 
in the source stratum. In terms of a mixed-effects model, this implies a 
negative estimate for the variance of the source effect. lme() will have 
nothing of that and sets it to zero instead. If you drop the 
Error(source) you get the same F as in lme() although the df differ.

(The  "negative variance" can be interpreted as negative within-source 
correlation, but that only works properly for balanced designs. Long 
story...)
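Both points are easy to check in base R (the data frame is rebuilt here so the snippet stands alone; the numbers are the ones quoted in the aov() and lme() output above):

```r
# same 12 observations as in the original post
dat <- data.frame(
  drug     = factor(rep(c("d1", "d2", "d3"), each = 4)),
  source   = factor(c("a", "a", "q", "q", "d", "d", "b", "b", "l", "l", "s", "s")),
  response = c(102, 104, 103, 104, 108, 110, 109, 108, 104, 106, 105, 107)
)

# Dropping Error(source) pools the source and Within strata into one
# residual (df = 3 + 6 = 9, MS = (1.5 + 9)/9 = 1.1667), so drug is now
# tested with F = 30.583/1.1667 = 26.21 -- the same F that lme()
# reports, though on different denominator df.
summary(aov(response ~ drug, dat))

# Method-of-moments estimate of the source variance component:
# E[MS_source] = sigma^2 + 2*sigma_source^2 (2 observations per source),
# giving sigma_source^2 = (0.5 - 1.5)/2 = -0.5.  lme() constrains the
# variance to be non-negative and effectively returns zero instead.
(0.5 - 1.5) / 2
```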
#
Hi Federico,

Note also that the intervals() function is a good tool for checking the
reliability of lme objects:

 > intervals(mod.lme)
Error in intervals.lme(mod.lme) :
  Cannot get confidence intervals on var-cov components: Non-positive
definite approximate variance-covariance


Kingsford





On Thu, May 22, 2008 at 1:19 PM, Peter Dalgaard
<p.dalgaard at biostat.ku.dk> wrote:
#
Hi all,

well spotted, Peter!

Federico, another tool that will help diagnose this problem is
VarCorr, which provides estimates of the variance components:

 > VarCorr(mod.lme)
source = pdLogChol(1) 
            Variance     StdDev      
(Intercept) 6.675305e-10 2.583661e-05
Residual    1.166667e+00 1.080123e+00



Andrew
On Thu, May 22, 2008 at 01:54:42PM -0700, Kingsford Jones wrote:

#
Peter Dalgaard wrote:
Thank you Peter for the explanation. I'm perfectly happy with this
particular model, but I'd like to ask you (and everyone else who'd like
to chime in): what do you mean by "you shouldn't trust lme() in low-df
cases"? Why?

(I ask because I often have low-df analyses to do).

Regards,

Federico
1 day later
#
On Fri, May 23, 2008 at 12:09 PM, Federico Calboli
<f.calboli at imperial.ac.uk> wrote:
mcmcsamp(model) is one option.

Reinhold Kliegl
2 days later
#
Federico Calboli wrote:
Cynics will say that you shouldn't trust low-df analyses, period!
Effectively, the F tests are corrected chisq statistics, and the
correction depends crucially on the normal distribution, notably its 3rd
and 4th moments. If the correction is small, you can probably assume that
it will be small in non-normal cases too, apply the correction that is
right for normally distributed data, and hope for the best; but if it is
large, then all bets are off.

For instance, looking at tests with 3 numerator df and 5 or 20
denominator df:

 > qchisq(.95, 3)/3                # asymptotic (chisq-based) critical value
[1] 2.604909
 > qf(.95, 3, 5)                   # exact critical value, 5 denominator df
[1] 5.409451
 > qf(.95, 3, 20)                  # exact critical value, 20 denominator df
[1] 3.098391
 > 1 - pf(qchisq(.95, 3)/3, 3, 5)  # true level of the nominal 5% asymptotic test
[1] 0.1642641
 > 1 - pf(qchisq(.95, 3)/3, 3, 20)
[1] 0.08017612

For unbalanced data, and even sometimes in the balanced case (when
effects "cross" error strata), the "F" statistic is not F distributed,
but there are some approximation methods involving "approximate
denominator df". SAS has three of these (containment, Satterthwaite,
Kenward-Roger); lme() has only the containment method, and lmer() has
none (it has the asymptotic chisq, plus the mcmcsamp() approach, which
is still "in the works").

Of the three df correction methods, "containment" is quite easily fooled
into giving unreasonably large df even in balanced cases, SAS's
"Satterthwaite" is based on some rather dubious hacks, and only
Kenward-Roger appears to be on a reasonably sound theoretical footing.
All in my humble opinion, that is.

It would be nice to have Kenward-Roger implemented for lme() and
lmer(), but it does not mesh well with the computational technology used
to fit the models, so it is not as easy to do as it may sound. In my
opinion this is a problem, even given the limited usefulness of the
improved approximation: the object is often not to get the p value right
to some number of significant digits, but rather to have something that
flags tests as unreliable.