Missing values in lmer vs. HLM

My apologies for making such a statement and then not following up.  As has
been mentioned, this is a holiday weekend in the U.S.

The section that Landon quoted does get at the point of my comment.

The usual justification for REML is that REML estimators of variance
components are less biased than are the maximum likelihood estimators
(mle).  On the surface this seems to be a convincing argument, for who
would want to use a "biased" estimator?

But why should we be concerned with the estimator of the variance?  Why not
the estimator of the standard deviation, or the logarithm of the standard
deviation?  The distributions of variance estimators are highly skewed in
most cases.  Consider the simplest case of estimating the variance from an
i.i.d. sample from a Gaussian distribution.  The distribution of the
estimator is a scaled chi-squared distribution, which is highly skewed.  The
distribution of the estimator of σ is less skewed.  The distribution of the
estimator of log(σ) is more-or-less symmetric.
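
The comparison of scales can be checked with a small simulation.  What
follows is only a sketch, not part of the original argument: the sample
size n = 5, the number of replications, the seed, and the helper names
(skewness, simulate) are all assumptions chosen for illustration.  It
draws repeated Gaussian samples, computes the usual variance estimate for
each, and reports the sample skewness of that estimate on the variance,
standard deviation, and log standard deviation scales.

```python
import math
import random
import statistics


def skewness(xs):
    """Sample skewness: the mean of the cubed standardized values."""
    m = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return sum(((x - m) / sd) ** 3 for x in xs) / len(xs)


def simulate(n=5, reps=20000, seed=1):
    """Skewness of the variance estimate on three scales (assumed settings)."""
    rng = random.Random(seed)
    var_est = []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # statistics.variance divides by n - 1 (the "unbiased" estimator)
        var_est.append(statistics.variance(sample))
    sd_est = [math.sqrt(v) for v in var_est]
    log_sd_est = [0.5 * math.log(v) for v in var_est]  # log(sd) = log(var) / 2
    return skewness(var_est), skewness(sd_est), skewness(log_sd_est)


skew_var, skew_sd, skew_log_sd = simulate()
print(f"skewness on the variance scale: {skew_var:.2f}")
print(f"skewness on the sd scale:       {skew_sd:.2f}")
print(f"skewness on the log(sd) scale:  {skew_log_sd:.2f}")
```

With these settings the variance scale should come out as the most skewed
of the three; how the other two compare depends on the sample size, so it
is worth rerunning with a larger n.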

The important point here is that "bias" relates to the expected value of
the estimator.  The argument for REML is based on the expected value of a
quantity with a highly skewed distribution, but we know that this is a poor
measure of location for such a distribution.  That's why it is more
informative to consider median salaries instead of average salaries.  The
fact that the average wealth of members of LeBron James's high school
basketball team is very high doesn't make them all rich.

Mle's have an invariance property in that the mle of σ is the square root
of the mle of σ²; the mle of log(σ²) is the logarithm of the mle of σ²,
etc.  Unbiased estimators aren't invariant under transformation.  The
square root of an unbiased estimator of σ² is not an unbiased estimator of
σ.
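
The lack of invariance is easy to see numerically.  Again this is only a
sketch under assumed settings (true σ = 2, n = 5, a fixed seed; the names
are hypothetical): it averages the n - 1 variance estimate and its square
root over many Gaussian samples.  The first average should land close to
σ² = 4, while the second should fall visibly short of σ = 2.

```python
import math
import random
import statistics

SIGMA = 2.0   # assumed true standard deviation
N = 5         # assumed sample size
REPS = 20000  # assumed number of simulated samples

rng = random.Random(42)
var_estimates = []
for _ in range(REPS):
    sample = [rng.gauss(0.0, SIGMA) for _ in range(N)]
    # n - 1 divisor: unbiased for the variance
    var_estimates.append(statistics.variance(sample))

mean_var = statistics.fmean(var_estimates)
mean_sd = statistics.fmean(math.sqrt(v) for v in var_estimates)

print(f"average variance estimate: {mean_var:.3f}  (true value {SIGMA ** 2})")
print(f"average of its square root: {mean_sd:.3f}  (true value {SIGMA})")
```

Taking the square root of each estimate before averaging pulls the average
below σ, which is the sense in which unbiasedness does not carry over to a
transformed scale.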

If an unbiased estimator were so important then we should probably consider
the estimate of log(σ²), not σ² itself.  The reason for our being fixated
on σ² is more computational than practical.  When using hand calculations
it is easiest to estimate σ² then derive an estimate of σ from that.  These
considerations are less convincing when using computers.

In summary, the case for REML is less convincing than it seems at first
glance.  It is a consequence of a certain type of mathematical exposition,
where your assumptions are never questioned.  You only care about going
from "if" to "then".  In mathematical statistics you say, "assuming that
the model is correct, these are the consequences" and that is all there is
to it.  The way that the game is actually played is that, when you get to
the end of the proof and discover that you need some conditions to make it
work, you go back to the beginning and add those conditions.  It helps if
you call this case the "regular" case or the "normal" case or some other
word with favorable connotations.

So if you want to characterize the "best" estimator you do it by peeling
off properties related to the first moment, the second moment, etc. For the
first moment you say that the expected value of the estimator must be equal
to the parameter being estimated and you call that the "unbiased" case.
Technically this is just a mathematical property but the connotation of the
word gives it much more heft than the mathematical property.  In
mathematical statistics it is irrelevant to question why it is this
particular estimator or this particular scale that is of interest - the
only objective is to prove the theorem and publish the result.

(The folklore in our department is that George Box's famous statement about
"all models are wrong" originated in a thesis defense where the candidate
began by stating "Assuming that the model is correct" and George
interrupted to say "But all models are wrong".  It wasn't a good day for
the candidate.  I'm sorry to say that I don't know if this story is
accurate as I never took the opportunity to ask him.)
On Sat, Jul 4, 2015 at 11:36 PM landon hurley <ljrhurley at gmail.com> wrote: