Missing values in lmer vs. HLM
My apologies for making such a statement then not following up. As has been mentioned, this is a holiday weekend in the U.S. The section that Landon quoted does get at the point of my comment. The usual justification for REML is that REML estimators of variance components are less biased than are the maximum likelihood estimators (mle). On the surface this seems to be a convincing argument, for who would want to use a "biased" estimator? But why should we be concerned with the estimator of the variance? Why not the estimator of the standard deviation, or the logarithm of the standard deviation? The distribution of variance estimators are highly skewed in most cases. Consider the simplest case of estimating the variance from an i.i.d. sample from a Gaussian distribution. The distribution of the estimator is a Chi-squared distribution, which is highly skewed. The distribution of the estimator of ? is less skewed. The distribution of the estimator of log(?) is more-or-less symmetric. The important point here is that "bias" relates to the expected value of the estimator. The argument for REML is based on the expected value of a quantity with a highly skewed distribution, but we know that this is a poor measure of location for such a distribution. That's why it is more informative to consider median salaries instead of average salaries. The fact that the average wealth of members of LeBron James's high school basketball team is very high doesn't make them all rich. Mle's have an invariance property in that the mle of ? is the square root of the mle of ??; the mle of log(??) is the logarithm of the mle of ??, etc. Unbiased estimators aren't invariant under transformation. The square root of an unbiased estimator of ?? is not an unbiased estimator of ?. If an unbiased estimator were so important then we should probably consider the estimate of log(??), not ?? itself. The reason for our being fixated on ?? is more computational than practical. When using hand calculations it is easiest to estimate ?? then derive an estimate of ? from that. These considerations are less convincing when using computers. In summary, the case for REML is less convincing than it seems at first glance. It is a consequence of a certain type of mathematical exposition, where your assumptions are never questioned. You only care about going from "if" to "then". In mathematical statistics you say, "assuming that the model is correct, these are the consequences" and that is all there is to it. The way that the game is actually played is that, when you get to the end of the proof and discover that you need some conditions to make it work, you go back to the beginning and add those conditions. It helps if you call this case the "regular" case or the "normal" case or some other word with favorable connotations. So if you want to characterize the "best" estimator you do it by peeling off properties related to the first moment, the second moment, etc. For the first moment you say that the expected value of the estimator must be equal to the parameter being estimated and you call that the "unbiased" case. Technically this is just a mathematical property but the connotation of the word gives it much more heft than the mathematical property. In mathematical statistics it is irrelevant to question why it is this particular estimator or this particular scale that is of interest - the only objective is to prove the theorem and publish the result. (The folklore in our department is that George Box's famous statement about "all models are wrong" originated in a thesis defense where the candidate began by stating that "Assuming that the model is correct" and George interrupted to say "But all models are wrong". It wasn't a good day for the candidate. I'm sorry to say that I don't know if this story is accurate as I never took the opportunity to ask him.)
On Sat, Jul 4, 2015 at 11:36 PM landon hurley <ljrhurley at gmail.com> wrote:
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512 On 07/05/2015 12:14 AM, Phillip Alday wrote:
On Sat, 2015-07-04 at 21:21 +0200, Karl Ove Hufthammer wrote:
Den 04. juli 2015 18:18, Douglas Bates skreiv:
Having said all this I will admit that the original sin, the "REML" criterion, was committed by statisticians. In retrospect I wish that we had not incorporated that criterion into the nlme and lme4 packages but, at the time we wrote them, our work would have been dismissed as wrong if our answers did not agree with SAS PROC MIXED, etc. So we opted for bug-for-bug compatibility with existing software.
Hm. I find this statement surprising. I was under the impression REML is *preferred* to ML in many situations (e.g. in simple random intercept models with few observations for each random intercept), and that *ML estimation* may result in severe bias. Do you consider maximising the REML criterion as a bug?
This was my question as well. My understanding was that REML, like Bessel's correction for the sample variance, was motivated by bias in the maximum-likelihood estimator for small numbers of observations. The corrected estimator is in both cases no longer the MLE, so that the ML part is bit of a misnomer, but if you take "residualized" expansion of RE instead of "restricted", then REML seems more like a function of ML and not a "type" of ML. IIRC, the default in MixedModels.jl is now ML -- have you changed your opinion about the utility of REML? Is there some type of weird paradoxical situation with REML like with Bessel's correction -- the variance estimates are no longer biased, but the s.d. estimates are? Or is the original sin the use of the name REML when REML is no longer *the* maximum likelihood?
I had assumed that he would have responded by now, but it is a holiday in the US. The position Bates is taking is explained (I think) in his 2010 report lme4: Mixed effects modelling with R in Section 5.5 `The REML Criterion', roughly page 123-124 in the pdf [0]. It's a short read, but the most relevant bit I think is:
The argument for preferring ?_R to ?_L as an estimate of ?**2 is
that the numerator in both estimates is the sum of squared
residuals at ? and, although the residual vector, yobs ? X? , is an
n-dimensional vector, the residual at ? satisfies p linearly
independent constraints, X**{T} (yobs ? X? ) = 0. That is, the residual
at ? is the projection of the observed response vector, yobs , into
an (n ? p)-dimensional linear subspace of the n-dimensional response
space. The estimate ?R takes into account the fact that ?**2 is
estimated from residuals that have only n ? p degrees of freedom.
Another argument often put forward for REML estimation is that ?_R is
an unbiased estimate of ?**2 , in the sense that the expected value of
the estimator is equal to the value of the parameter. However,
determining the expected value of an estimator involves integrating
with respect to the density of the estimator and we have seen that
densities of estimators of variances will be skewed, often highly
skewed. It is not clear why we should be interested in the expected
value of a highly skewed estimator. If we were to transform to a
more symmetric scale, such as the estimator of the standard deviation
or the estimator of the logarithm of the standard deviation, the
REML estimator would no longer be unbiased. Furthermore, this
property of unbiasedness of variance estimators does not generalize
from the linear regression model to linear mixed models. This is all
to say that the distinction between REML and ML estimates of
variances and variance components is probably less important that
many people believe.
best, landon [0]: www.researchgate.net/publictopics.PublicPostFileLoader.html?id=53326f19d5a3f206348b45af&key=6a85e53326f199010f
Best, Phillip Alday _______________________________________________ R-sig-mixed-models at r-project.org mailing list
- -- Violence is the last refuge of the incompetent. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.17 (GNU/Linux) iQIcBAEBCgAGBQJVmLQMAAoJEDeph/0fVJWs21AP/RPMNrQDdkd67eQWfb9jhbTZ VIfZDNAt088n8XednUzk6BQlVUirb8akqdVq8YqtdaCotdP5dxXjRr30hO72FeRj rvLWcZXtDVuNwOFZA44Aw2YplwD1sU+G+vMLOsPD4BrBRfByY5FkkX2lTliQjbVK eYNi57977w/AE7y48OTprwBdkNkWjTQTrnKAsglBtOlnC+x8TdD8J0WfaZCfI1CP iUN6298pdSNWm+vAWaFCeq6Wig8o2kYGONd1RBDzGidbcy5CiQebuJYZ8cU5zl4V X34alL6cX+9qDLWbi4jYz1/3lG1U4NsCKhs6fM7imxOFV9XXtsZTr16xmHbkc6B+ daNAbln/SHKoCpuvGiO+IL/H8y8W1DLyJk7sRlb7tOuDHViBCOPLwPZj71aJFFab qY40JvxdV0rputOeg5OtTyqROxCwtgqKWDaGcli1Dpcrca2aE7qNpPdMjxmnJN+j RDaCkHflJuwHifkupg7qSfLa2zbBf4KirfGnOsoeicZPF4s7BmDLU28fMlhAqEly Dbg3niPaW4/X3X69yrkw5XG46uftRGlSUdl/eMWWBUVy3o5oAkAgxpQl/49Md0CX SyYyv+Iaa9NPtgI828NYfw59xsW98hTnNNtG+D0V4nN/1d29M/KM6LspxV9Nu64b XNno0S7U16mSu1KhvIwY =7uyi -----END PGP SIGNATURE-----
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models