
Troublesome example of lmer fitting an overidentified model; Stata bad too.

2 messages · Paul Johnson, Douglas Bates

#
I'm teaching mixed models, working through a lot of homework problems
in the two-volume set Multilevel and Longitudinal Modeling using Stata,
by S. Rabe-Hesketh and A. Skrondal. If you have not looked at it yet, I
expect you'll find lots of useful, interesting exercises.

Their Exercise 4.4 is about yield from 10 wheat varieties.  The variables are:

"plot"     "variety"  "yield"    "moist"

variety is the grouping variable in the exercise, but it comes in as a
floating-point number. I thought that was the root of the surprise
described here, but now I don't think so.

I was reading one student's Stata homework and was stunned that
Stata's "mixed" command would fit a random-effects model that also
included the grouping variable as a fixed-effect predictor. The model
appeared grossly overidentified to me.
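The redundancy can be checked directly: with variety as a factor, the fixed part moist##i.variety already contains a separate intercept and moist slope for every variety, so the columns that the random part (variety: moist) would add lie entirely inside the fixed-effects column space. A small pure-Python sketch (using a tiny made-up design, not the wheat data) shows that appending the random-effects columns does not increase the rank:

```python
def rank(rows, tol=1e-10):
    """Rank via Gaussian elimination with partial pivoting;
    pivots smaller than tol in absolute value count as zero."""
    m = [row[:] for row in rows]
    nrow, ncol = len(m), len(m[0])
    r = 0
    for c in range(ncol):
        if r == nrow:
            break
        piv = max(range(r, nrow), key=lambda i: abs(m[i][c]))
        if abs(m[piv][c]) <= tol:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(r + 1, nrow):
            f = m[i][c] / m[r][c]
            for j in range(c, ncol):
                m[i][j] -= f * m[r][j]
        r += 1
    return r

# 3 "varieties", 4 observations each, a made-up moisture covariate
groups, moist = 3, [1.0, 2.0, 3.0, 4.0]
X, Z = [], []   # fixed-effects and random-effects model matrices
for g in range(groups):
    for m_ in moist:
        d = [1.0 if g == k else 0.0 for k in range(1, groups)]  # dummies
        X.append([1.0, m_] + d + [m_ * dk for dk in d])         # moist*variety
        zi = [1.0 if g == k else 0.0 for k in range(groups)]    # (1 | variety)
        Z.append(zi + [m_ * zk for zk in zi])                   # (moist | variety)

XZ = [x + z for x, z in zip(X, Z)]
print(rank(X), rank(XZ))   # same rank: Z adds no new directions
```

Every column of Z is an exact linear combination of columns of X, which is what "overidentified" means here.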

Here's the Stata input and output:

. mixed yield c.moist##i.variety || variety:moist, mle stddev

Performing EM optimization:

Performing gradient-based optimization:

Iteration 0:   log likelihood = -42.123671
Iteration 1:   log likelihood = -41.540417
Iteration 2:   log likelihood = -41.520012
Iteration 3:   log likelihood = -41.520012

Computing standard errors:

Mixed-effects ML regression                     Number of obs     =         60
Group variable: variety                         Number of groups  =         10

                                                Obs per group:
                                                              min =          6
                                                              avg =        6.0
                                                              max =          6

                                                Wald chi2(19)     =   37795.42
Log likelihood = -41.520012                     Prob > chi2       =     0.0000

---------------------------------------------------------------------------------
          yield |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
----------------+----------------------------------------------------------------
          moist |   .6077128   .0124644    48.76   0.000      .583283    .6321425
                |
        variety |
             2  |  -2.844581   .8429087    -3.37   0.001    -4.496652   -1.192511
             3  |  -1.871277   .7903544    -2.37   0.018    -3.420343   -.3222105
             4  |  -.3432833   .7323671    -0.47   0.639    -1.778696     1.09213
             5  |   .2294764   1.121912     0.20   0.838     -1.96943    2.428383
             6  |    3.46207    .690504     5.01   0.000     2.108707    4.815433
             7  |  -12.05622   .6729943   -17.91   0.000    -13.37527   -10.73718
             8  |   1.175528   .7302294     1.61   0.107    -.2556957    2.606751
             9  |  -1.434985   .7917805    -1.81   0.070    -2.986846    .1168767
            10  |   3.494313   1.392236     2.51   0.012     .7655807    6.223044
                |
variety#c.moist |
             2  |  -.0354326   .0260328    -1.36   0.173    -.0864559    .0155908
             3  |   .1297872   .0190785     6.80   0.000     .0923941    .1671804
             4  |   .0264085   .0210603     1.25   0.210    -.0148689     .067686
             5  |   .0284318   .0240342     1.18   0.237    -.0186744     .075538
             6  |   .0790988   .0161154     4.91   0.000     .0475132    .1106844
             7  |   .1164752   .0189384     6.15   0.000     .0793566    .1535938
             8  |   .0791545   .0188487     4.20   0.000     .0422118    .1160972
             9  |    .080266   .0190236     4.22   0.000     .0429804    .1175516
            10  |   .0027604   .0282729     0.10   0.922    -.0526535    .0581742
                |
          _cons |   34.58378   .5478187    63.13   0.000     33.51007    35.65748
---------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
variety: Independent         |
                   sd(moist) |   3.46e-15   2.27e-14      8.96e-21    1.34e-09
                   sd(_cons) |   1.18e-11   1.34e-07             0           .
-----------------------------+------------------------------------------------
                sd(Residual) |   .4833867   .0441285      .4041925    .5780976
------------------------------------------------------------------------------
LR test vs. linear model: chi2(2) = 0.00                  Prob > chi2 = 1.0000

Note: LR test is conservative and provided only for reference.


As you can see, Stata deals with the overidentification by reporting
minuscule estimates for the random effects.

I think it's a bug, and I wondered what lme4 would do. I predicted it
would refuse to fit at all once we declare variety as a factor
(varietyf). However, that does not happen. The full runnable R example
is below, but here's the bad part:
Linear mixed model fit by REML ['lmerMod']
Formula: yield ~ moist * varietyf + (moist | varietyf)
   Data: dat

REML criterion at convergence: 157.7

Scaled residuals:
     Min       1Q   Median       3Q      Max
-1.74078 -0.52638 -0.00039  0.56385  1.79231

Random effects:
 Groups   Name        Variance Std.Dev. Corr
 varietyf (Intercept) 0.3505   0.592
          moist       0.3505   0.592    0.00
 Residual             0.3505   0.592
Number of obs: 60, groups:  varietyf, 10

Fixed effects:
                  Estimate Std. Error t value
(Intercept)       34.58378    0.89479   38.65
moist              0.60771    0.59222    1.03
varietyf2         -2.84458    1.32918   -2.14
varietyf3         -1.87128    1.27984   -1.46
varietyf4         -0.34328    1.22700   -0.28
varietyf5          0.22948    1.60904    0.14
varietyf6          3.46207    1.19003    2.91
varietyf7        -12.05622    1.17489  -10.26
varietyf8          1.17553    1.22509    0.96
varietyf9         -1.43498    1.28116   -1.12
varietyf10         3.49431    1.89960    1.84
moist:varietyf2   -0.03543    0.83786   -0.04
moist:varietyf3    0.12979    0.83758    0.15
moist:varietyf4    0.02641    0.83765    0.03
moist:varietyf5    0.02843    0.83777    0.03
moist:varietyf6    0.07910    0.83748    0.09
moist:varietyf7    0.11647    0.83757    0.14
moist:varietyf8    0.07915    0.83757    0.09
moist:varietyf9    0.08027    0.83757    0.10
moist:varietyf10   0.00276    0.83797    0.00




Here's my full R code:


## Paul Johnson <pauljohn at ku.edu>
## 20160224
library(foreign)
dat <- read.dta("http://www.stata-press.com/data/mlmus3/wheat.dta")
## write.dta(dat, file = "wheat.dta12")
## dat <- read.dta("wheat.dta12")
summary(dat)
## Variety should be an integer, so should plot, but we don't use it
dat$varietyf <- as.factor(dat$variety)

library(lme4)

## This is obviously overidentified/incoherent.
## lmer should bounce the user out.
m.wrong <- lmer(yield ~ moist*varietyf + (moist|varietyf), data = dat)
summary(m.wrong)

## Continue with the homework
m1 <-  lmer(yield ~ moist + (1|varietyf), data = dat)

m2 <- lmer(yield ~ moist + (moist|varietyf), data = dat)
library(lattice)
dotplot(ranef(m2, condVar = TRUE))

## Here's a shocker
anova(m2, m1)
## Really? Look at the pictures!

## Here's a spaghetti plot.
plot(m2)
m2coef <- coef(m2)[["varietyf"]]
plot(yield ~ moist, data = dat, col = dat$variety)
## one fitted line per variety (apply() with browser() left in was broken)
for (i in seq_len(nrow(m2coef))) {
    abline(a = m2coef[i, "(Intercept)"], b = m2coef[i, "moist"], col = i)
}

## If Var(e) is small enough, even trivial slope differences are
## "statistically significant".
#
I agree that the model is overspecified.  What is happening in the two fits
is that the likelihood surface is flat, up to round-off, in certain
directions so the optimizer converges close to the starting estimates.
Stata is using an EM algorithm and after one step the fixed-effects terms
involving variety will have taken out the variation due to variety.  That's
why the estimates of the variances of the random effects from Stata are
close to zero.
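The mechanism can be seen without any mixed-model machinery: once the fixed effects fit a separate mean per group, the residuals have exactly zero mean within every group, so there is no between-group variation left for a random intercept to explain. A small pure-Python sketch with simulated data (hypothetical numbers, not the wheat data):

```python
import random

random.seed(1)
n_groups, per_group = 5, 6

# simulate a response with a real group effect plus noise
group_eff = [random.gauss(0, 2) for _ in range(n_groups)]
y = [[group_eff[g] + random.gauss(0, 0.5) for _ in range(per_group)]
     for g in range(n_groups)]

# a saturated fixed effect for group fits each group's mean exactly
means = [sum(yg) / per_group for yg in y]
resid = [[v - means[g] for v in y[g]] for g in range(n_groups)]

# the within-group residual means are all (numerically) zero, so the
# between-group variance of the residuals -- what the random intercept
# would estimate -- is zero as well
resid_means = [sum(r) / per_group for r in resid]
print(max(abs(rm) for rm in resid_means))   # ~0 up to round-off
```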

In lmer the starting estimates for the covariance parameters correspond to
a relative covariance factor equal to the identity.  This, in turn,
corresponds to a covariance matrix for the random effects of sigma^2 * I,
so each random-effect standard deviation starts at the residual standard
error sigma.  Notice that the estimate of the residual standard deviation
and both standard deviations of the random effects have converged to 0.592.
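For reference, lme4 parameterizes the random-effects covariance through the relative covariance factor (this is the formulation in the lme4 package documentation):

```latex
\Sigma_\theta = \sigma^2 \,\Lambda_\theta \Lambda_\theta^{\top}
```

With the starting value Lambda_theta = I this gives Sigma = sigma^2 * I, i.e. every random-effect standard deviation equals the residual standard error sigma, which matches the 0.592 printed three times in the lmer output above.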

Detecting this situation is not trivial.  You can do it symbolically, but
that involves a lot of symbolic analysis to catch all the possibilities.
You can try to do it numerically by comparing the columns in the
fixed-effects model matrix corresponding to model terms with those in the
random-effects model matrix, but any such numeric procedure must involve a
tolerance, and it is not clear how to set one.  Anyone who has taken a
linear algebra course knows that the rank of a matrix is well defined.  In
practice, using floating-point arithmetic, reliably determining the rank of
a matrix is a notoriously difficult, probably unsolvable, problem.
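The tolerance problem is concrete: the computed rank of the very same matrix changes with the threshold used to decide when a pivot counts as zero. A pure-Python sketch (toy matrix and tolerances chosen for illustration):

```python
def rank(rows, tol):
    """Rank via Gaussian elimination with partial pivoting;
    a pivot smaller than tol in absolute value counts as zero."""
    m = [row[:] for row in rows]
    nrow, ncol = len(m), len(m[0])
    r = 0
    for c in range(ncol):
        if r == nrow:
            break
        piv = max(range(r, nrow), key=lambda i: abs(m[i][c]))
        if abs(m[piv][c]) <= tol:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(r + 1, nrow):
            f = m[i][c] / m[r][c]
            for j in range(c, ncol):
                m[i][j] -= f * m[r][j]
        r += 1
    return r

# third row is (row1 + row2) plus a 1e-13 perturbation:
# singular, or full rank? It depends entirely on the tolerance.
M = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0],
     [1.0, 1.0, 2.0 + 1e-13]]

print(rank(M, tol=1e-10), rank(M, tol=1e-16))   # prints: 2 3
```

Neither answer is wrong; the matrix is numerically rank 2 and exactly rank 3, which is precisely why no single tolerance can make the check reliable.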

Writing code to detect ill-posed models, failures to converge, or other
problematic conditions is a game of whack-a-mole.  You have to guess in
what way the model will be ill-posed, then write code to detect that, and
so on.  This leads to code bloat.  Worse, it slows things down and takes up
memory for every model fit, and you must contend with false positives,
which have been an ongoing issue with the convergence checks in lme4.

Eventually it comes down to the extent to which the developers of the code
feel the need to protect users from themselves.  Ben is much more inclined
to do this than I am.  I take the approach of telling the user "don't do
that".  Of course, that doesn't help the next user who tries to do the
same thing.
On Wed, Feb 24, 2016 at 11:18 AM Paul Johnson <pauljohn32 at gmail.com> wrote: