anova function gives (irrelevant) results on two models with different numbers of observations

On Sat, Apr 12, 2008 at 11:29 AM, Olivier Renaud
Hi,
 I have encountered what I believe is a bug or an extremely unwanted
 "feature" in lme4. In short, the anova function does not check that the
 two models are based on the same number of observations and can thus
 give irrelevant results, as exemplified below. Note that lme (at least
 on S+) correctly refuses to compare the models.
Perhaps you would like to contribute the code that performs the
appropriate checks on the models before creating the analysis of
variance table. Check out a copy of the lme4 package sources from the
R-forge repository and search for

setMethod("anova"

in the file pkg/R/lmer.R  You will see that there are already checks
in that method for the models being fit to the same data argument (if
used).  Patches are welcome.
 I found this problem, because I wanted to understand why the LRT gave
 extremely different value than the Wald/t-stat index/statistics/test
 (choose one ;-) ), see below. It took me a while to realize that the
 number of observations was not the same in the two models (156 vs. 160).
 It happens when an explanatory variable is present in only one of the
 two models and contains NA's, which in not an uncommon situation. The
 MLdeviances then are very different but not comparable. I'm afraid it
 might have unnoticed by other users. I have reproduced the calls in lmer
 (R) and lme (S+).

 Olivier

 R version 2.6.1 (2007-11-26)
 Package:       lme4
 Version:       0.99875-9
  > (Fu.mod0.lmer <- lmer(Y ~ 1 + ( 1|subjec), data = Fu,
 na.action=na.omit, method="ML") )
 Linear mixed-effects model fit by maximum likelihood
 Formula: Y ~ 1 + (1 | subjec)
   Data: Fu
   AIC   BIC logLik MLdeviance REMLdeviance
  654.7 660.9 -325.4      650.7        651.4
 Random effects:
  Groups   Name        Variance Std.Dev.
  subjec   (Intercept) 0.93595  0.96745
  Residual             2.96389  1.72160
 number of obs: 160, groups: subjec, 16

 Fixed effects:
            Estimate Std. Error t value
 (Intercept)   3.4875     0.2775   12.57

  > (Fu.mod1.lmer <- lmer(Y ~ expvariable + ( 1|subjec), data = Fu,
 na.action=na.omit, method="ML") )
 Linear mixed-effects model fit by maximum likelihood
 Formula: Y ~ expvariable + (1 | subjec)
   Data: Fu
   AIC   BIC logLik MLdeviance REMLdeviance
  637.8 646.9 -315.9      631.8        633.1
 Random effects:
  Groups   Name        Variance Std.Dev.
  subjec   (Intercept) 1.0357   1.0177
  Residual             2.8796   1.6969
 number of obs: 156, groups: subjec, 16

 Fixed effects:
            Estimate Std. Error t value
 (Intercept)   3.2673     0.3145   10.39
 expvariable   0.4140     0.2856    1.45

 Correlation of Fixed Effects:
            (Intr)
 expvariable -0.398

  > anova(Fu.mod0.lmer, Fu.mod1.lmer)
 Data: Fu
 Models:
 Fu.mod0.lmer: Y ~ 1 + (1 | subjec)
 Fu.mod1.lmer: Y ~ expvariable + (1 | subjec)
             Df     AIC     BIC  logLik  Chisq Chi Df Pr(>Chisq)
 Fu.mod0.lmer  2  654.70  660.85 -325.35
 Fu.mod1.lmer  3  637.78  646.93 -315.89 18.917      1  1.365e-05 ***
 ---
 Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

 ######## In S+8.0

  > Fu.mod0.lme <- lme(Y ~ 1, random=~1|subjec, data = Fu,
 na.action=na.omit, method="ML")
  > summary(Fu.mod0.lme)
 Linear mixed-effects model fit by maximum likelihood
  Data: Fu
       AIC      BIC    logLik
  656.7007 665.9262 -325.3503

 Random effects:
  Formula:  ~ 1 | subjec
        (Intercept) Residual
 StdDev:   0.9674476 1.721595

 Fixed effects: Y ~ 1
             Value Std.Error  DF  t-value p-value
 (Intercept) 3.4875 0.2783988 144 12.52699  <.0001

 Standardized Within-Group Residuals:
       Min         Q1        Med        Q3      Max
  -2.244476 -0.7371921 -0.2445665 0.8289158 2.608248

 Number of Observations: 160
 Number of Groups: 16
  > Fu.mod1.lme <- lme(Y ~ expvariable, random=~1|subjec, data = Fu,
 na.action=na.omit, method="ML")
  > summary(Fu.mod1.lme)
 Linear mixed-effects model fit by maximum likelihood
  Data: Fu
       AIC     BIC    logLik
  639.7835 651.983 -315.8918

 Random effects:
  Formula:  ~ 1 | subjec
        (Intercept) Residual
 StdDev:    1.017715 1.696945

 Fixed effects: Y ~ expvariable
               Value Std.Error  DF  t-value p-value
 (Intercept) 3.267305 0.3164881 139 10.32363  <.0001
 expvariable 0.413960 0.2874080 139  1.44032   0.152

 Standardized Within-Group Residuals:
     Min         Q1        Med        Q3      Max
  -2.4028 -0.6934754 -0.2075537 0.7346619 2.768138

 Number of Observations: 156
 Number of Groups: 16
  > anova(Fu.mod0.lme, Fu.mod1.lme)
 Problem in anova.lme(Fu.mod0.lme, Fu.mod1.lme): All fitted objects must
 use the same number of observations
 Use traceback() to see the call stack

 --
 Olivier.Renaud at pse.unige.ch             http://www.unige.ch/~renaud/
 Methodology & Data Analysis - Psychology Dept - University of Geneva
 UniMail, Office 4142  -  40, Bd du Pont d'Arve   -  CH-1211 Geneva 4

 _______________________________________________
 R-sig-mixed-models at r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

anova function gives (irrelevant) results on two models with different numbers of observations

Thread (2 messages)