Troublesome example of lmer fitting an overidentified model; Stata bad too.
I agree that the model is overspecified. What is happening in the two fits is that the likelihood surface is flat, up to round-off, in certain directions so the optimizer converges close to the starting estimates. Stata is using an EM algorithm and after one step the fixed-effects terms involving variety will have taken out the variation due to variety. That's why the estimates of the variances of the random effects from Stata are close to zero. In lmer the starting estimates for the covariance parameters correspond to a relative covariance factor of the identity. This, in turn, corresponds to a covariance matrix for the random effects which is sigma * I. Notice that estimates of the residual standard error and both standard errors of the random effects have converged to 0.592 Detecting this situation is not trivial. You can do it symbolically but that involves a lot of symbolic analysis to catch all the possibilities. You can try to do it numerically by comparing columns in the fixed-effects model matrix corresponding to model terms and those in the random-effects model matrix but any such numeric procedure must involve a tolerance and it is not clear how to set that. Anyone who has taken a linear algebra course knows that the rank of a matrix is well defined. In practice, using floating-point arithmetic, reliably determining the rank of a matrix is a notoriously difficult problem - probably an unsolveable problem. Writing code to detect ill-posed models or failure to converge or other problematic conditions is a game of whack-a-mole. You have to guess in what way the model will be ill-posed then write code to detect this, etc. This leads to code bloat. Worse it slows things down, takes up memory, etc. for all model fits and you need to contend with false positives, which has been an ongoing issue with the convergence checks in lme4. Eventually it comes down to the extent to which the developers of the code feel the need to protect users from themselves. Ben is much more inclined to do this that I am. I take the approach of telling the user "don't do that". Of course, that doesn't help the next user who tries to do the same thing.
On Wed, Feb 24, 2016 at 11:18 AM Paul Johnson <pauljohn32 at gmail.com> wrote:
I'm teaching mixed models, working through a lot of homework problems
in the 2 volume set Multilevel and Longitudinal Modeling using Stata,
by S Rave-Hesketh and A. Skrondal. If you did not look at it yet, I
expect you'll find lots of useful, interesting exercises.
Their Exercise 4.4 is about yield from 10 wheat varieties. The variables
are:
"plot" "variety" "yield" "moist"
variety is the grouping variable in the exercise, but it comes in as a
floating point number. I thought that was the root of the surprise
described here, but now I don't think so.
I was reading one student's homework in Stata and I was
stunned/surprised that Stata's "mixed" function would fit a random
effects model that included the grouping variable as a fixed effect
predictor as well. This appeared to be grossly overidentified to me.
Here's the stata input and output,
. mixed yield c.moist##i.variety || variety:moist, mle stddev
Performing EM optimization:
Performing gradient-based optimization:
Iteration 0: log likelihood = -42.123671
Iteration 1: log likelihood = -41.540417
Iteration 2: log likelihood = -41.520012
Iteration 3: log likelihood = -41.520012
Computing standard errors:
Mixed-effects ML regression Number of obs =
60
Group variable: variety Number of groups =
10
Obs per group:
min =
6
avg =
6.0
max =
6
Wald chi2(19) =
37795.42
Log likelihood = -41.520012 Prob > chi2 =
0.0000
---------------------------------------------------------------------------------
yield | Coef. Std. Err. z P>|z| [95%
Conf. Interval]
----------------+----------------------------------------------------------------
moist | .6077128 .0124644 48.76 0.000 .583283
.6321425
|
variety |
2 | -2.844581 .8429087 -3.37 0.001 -4.496652
-1.192511
3 | -1.871277 .7903544 -2.37 0.018 -3.420343
-.3222105
4 | -.3432833 .7323671 -0.47 0.639 -1.778696
1.09213
5 | .2294764 1.121912 0.20 0.838 -1.96943
2.428383
6 | 3.46207 .690504 5.01 0.000 2.108707
4.815433
7 | -12.05622 .6729943 -17.91 0.000 -13.37527
-10.73718
8 | 1.175528 .7302294 1.61 0.107 -.2556957
2.606751
9 | -1.434985 .7917805 -1.81 0.070 -2.986846
.1168767
10 | 3.494313 1.392236 2.51 0.012 .7655807
6.223044
|
variety#c.moist |
2 | -.0354326 .0260328 -1.36 0.173 -.0864559
.0155908
3 | .1297872 .0190785 6.80 0.000 .0923941
.1671804
4 | .0264085 .0210603 1.25 0.210 -.0148689
.067686
5 | .0284318 .0240342 1.18 0.237 -.0186744
.075538
6 | .0790988 .0161154 4.91 0.000 .0475132
.1106844
7 | .1164752 .0189384 6.15 0.000 .0793566
.1535938
8 | .0791545 .0188487 4.20 0.000 .0422118
.1160972
9 | .080266 .0190236 4.22 0.000 .0429804
.1175516
10 | .0027604 .0282729 0.10 0.922 -.0526535
.0581742
|
_cons | 34.58378 .5478187 63.13 0.000 33.51007
35.65748
---------------------------------------------------------------------------------
------------------------------------------------------------------------------
Random-effects Parameters | Estimate Std. Err. [95% Conf.
Interval]
-----------------------------+------------------------------------------------
variety: Independent |
sd(moist) | 3.46e-15 2.27e-14 8.96e-21
1.34e-09
sd(_cons) | 1.18e-11 1.34e-07 0
.
-----------------------------+------------------------------------------------
sd(Residual) | .4833867 .0441285 .4041925
.5780976
------------------------------------------------------------------------------
LR test vs. linear model: chi2(2) = 0.00 Prob > chi2 =
1.0000
Note: LR test is conservative and provided only for reference.
As you can see, Stata deals with the overidentification by reporting
miniscule estimates for the random effects.
I think its a bug, wondered what lme4 would do. I predicted it would
refuse to fit at all, as soon as we declare variety as a factor
(varietyf). However, that does not happen. The full R runable example
is below, but here's the bad part:
m.wrong <- lmer(yield ~ moist*varietyf + (moist|varietyf), data = dat) summary(m.wrong)
Linear mixed model fit by REML ['lmerMod']
Formula: yield ~ moist * varietyf + (moist | varietyf)
Data: dat
REML criterion at convergence: 157.7
Scaled residuals:
Min 1Q Median 3Q Max
-1.74078 -0.52638 -0.00039 0.56385 1.79231
Random effects:
Groups Name Variance Std.Dev. Corr
varietyf (Intercept) 0.3505 0.592
moist 0.3505 0.592 0.00
Residual 0.3505 0.592
Number of obs: 60, groups: varietyf, 10
Fixed effects:
Estimate Std. Error t value
(Intercept) 34.58378 0.89479 38.65
moist 0.60771 0.59222 1.03
varietyf2 -2.84458 1.32918 -2.14
varietyf3 -1.87128 1.27984 -1.46
varietyf4 -0.34328 1.22700 -0.28
varietyf5 0.22948 1.60904 0.14
varietyf6 3.46207 1.19003 2.91
varietyf7 -12.05622 1.17489 -10.26
varietyf8 1.17553 1.22509 0.96
varietyf9 -1.43498 1.28116 -1.12
varietyf10 3.49431 1.89960 1.84
moist:varietyf2 -0.03543 0.83786 -0.04
moist:varietyf3 0.12979 0.83758 0.15
moist:varietyf4 0.02641 0.83765 0.03
moist:varietyf5 0.02843 0.83777 0.03
moist:varietyf6 0.07910 0.83748 0.09
moist:varietyf7 0.11647 0.83757 0.14
moist:varietyf8 0.07915 0.83757 0.09
moist:varietyf9 0.08027 0.83757 0.10
moist:varietyf10 0.00276 0.83797 0.00
Here' s my full R
## Paul Johnson <pauljohn at ku.edu>
## 20160224
library(foreign)
dat <- read.dta("http://www.stata-press.com/data/mlmus3/wheat.dta")
## write.dta(dat, file = "whead.dta12")
## dat <- read.dta("wheat.dta12")
summary(dat)
## Variety should be an integer, so should plot, but we don't use it
dat$varietyf <- as.factor(dat$variety)
library(lme4)
## This is obviously overidentified/incoherent.
## lmer should bounce the user out.
m.wrong <- lmer(yield ~ moist*varietyf + (moist|varietyf), data = dat)
summary(m.wrong)
## Continue with the homework
m1 <- lmer(yield ~ moist + (1|varietyf), data = dat)
m2 <- lmer(yield ~ moist + (moist|varietyf), data = dat)
library(lattice)
dotplot(ranef(m2, condVar = TRUE))
## Here's a shocker
anova(m2, m1)
## Really? Look at the pictures!
## Here's a spaghetti plot.
plot(m2)
m2coef <- coef(m2)[["varietyf"]]
m2coef$variety <- rownames(m2coef)
plot(yield ~ moist, data = dat, col = dat$variety)
apply(m2coef, 1, function(x){ browser(); abline(a = x[1], b = x[2],
col = x["variety"])})
## If Var(e) is small enough, even trivial slope differences are
## "statistically significant".
--
Paul E. Johnson
Professor, Political Science Director
1541 Lilac Lane, Room 504 Center for Research Methods
University of Kansas University of Kansas
http://pj.freefaculty.org http://crmda.ku.edu
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models