Troublesome example of lmer fitting an overidentified model; Stata bad too.

I agree that the model is overspecified.  What is happening in the two fits
is that the likelihood surface is flat, up to round-off, in certain
directions so the optimizer converges close to the starting estimates.
Stata is using an EM algorithm and after one step the fixed-effects terms
involving variety will have taken out the variation due to variety.  That's
why the estimates of the variances of the random effects from Stata are
close to zero.

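In lmer the starting estimates for the covariance parameters correspond to
a relative covariance factor of the identity.  This, in turn, corresponds
to a covariance matrix for the random effects which is sigma^2 * I, so each
random-effects standard deviation starts out equal to the residual standard
deviation.  Notice that the estimates of the residual standard error and
both standard deviations of the random effects have converged to 0.592,
which is what you would expect if the optimizer barely moved from its
starting values.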

Detecting this situation is not trivial.  You can do it symbolically but
that involves a lot of symbolic analysis to catch all the possibilities.
You can try to do it numerically by comparing columns in the fixed-effects
model matrix corresponding to model terms and those in the random-effects
model matrix but any such numeric procedure must involve a tolerance and it
is not clear how to set that.  Anyone who has taken a linear algebra course
knows that the rank of a matrix is well defined.  In practice, using
floating-point arithmetic, reliably determining the rank of a matrix is a
notoriously difficult problem - probably an unsolvable problem.
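
To make the tolerance problem concrete, here is a minimal sketch in plain
Python.  It is not what lme4 does; numerical_rank is a made-up helper and
the 3-variety layout is an invented toy design.  It stacks the fixed-effects
columns for variety next to the random-effects indicator columns for variety
and counts pivots larger than a tolerance.  A perturbation of 1e-10 in one
entry is enough to make the answer depend on the tolerance you picked.

```python
def numerical_rank(rows, tol):
    """Count pivots larger than tol in Gaussian elimination with row pivoting.

    Hypothetical helper for illustration only; not part of lme4 or Stata.
    """
    a = [[float(x) for x in r] for r in rows]
    n_rows, n_cols = len(a), len(a[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Largest remaining entry in this column is the candidate pivot.
        pivot = max(range(rank, n_rows), key=lambda i: abs(a[i][col]))
        if abs(a[pivot][col]) <= tol:
            continue  # column treated as numerically zero: no new pivot
        a[rank], a[pivot] = a[pivot], a[rank]
        for i in range(rank + 1, n_rows):
            f = a[i][col] / a[rank][col]
            for j in range(col, n_cols):
                a[i][j] -= f * a[rank][j]
        rank += 1
    return rank


# Toy design (an assumption): 6 observations, varieties A, B, C, two each.
# Columns 1-3: fixed effects (intercept, variety B, variety C).
# Columns 4-6: random-effects indicators for varieties A, B, C.
exact = [
    [1, 0, 0, 1, 0, 0],
    [1, 0, 0, 1, 0, 0],
    [1, 1, 0, 0, 1, 0],
    [1, 1, 0, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
    [1, 0, 1, 0, 0, 1],
]

# Same design with round-off-sized noise in a single entry.
perturbed = [row[:] for row in exact]
perturbed[5][5] += 1e-10

print(numerical_rank(exact, 1e-8))       # 3: random-effects columns add nothing
print(numerical_rank(perturbed, 1e-12))  # 4: the noise counts as a new direction
print(numerical_rank(perturbed, 1e-8))   # 3: the noise is ignored
```

Both answers for the perturbed matrix are defensible, which is the point:
the code cannot know which tolerance the user's data deserve.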

Writing code to detect ill-posed models or failure to converge or other
problematic conditions is a game of whack-a-mole.  You have to guess in
what way the model will be ill-posed then write code to detect this, etc.
This leads to code bloat.  Worse, it slows things down, takes up memory,
etc. for all model fits, and you need to contend with false positives, which
has been an ongoing issue with the convergence checks in lme4.

Eventually it comes down to the extent to which the developers of the code
feel the need to protect users from themselves. Ben is much more inclined
to do this than I am.  I take the approach of telling the user "don't do
that".  Of course, that doesn't help the next user who tries to do the same
thing.
