Message-ID: <37D14517-1703-4CBD-960C-8F827929E6D4@ucsd.edu>
Date: 2014-03-07T17:13:46Z
From: Roger Levy
Subject: likelihood-ratio tests in conflict with coefficients in maximal random-effects model
In-Reply-To: <CA+1m9_7VRbiMm653KGXsAvy7pQFULUoc9e9m+bn2NyZON3zD-g@mail.gmail.com>
On Mar 7, 2014, at 6:21 AM, Shravan Vasishth <vasishth.shravan at gmail.com> wrote:
> Hi Roger and Emilia, and others,
>
> I just wanted to say that in Emilia's data, she has 36 subjects and 20
> items. Roger, would you agree that it is very difficult with this amount of
> data to accurately estimate the full variance-covariance matrices for
> subjects and for items random effects, especially the correlation
> parameters? The numbers that lmer returns, for such sizes of data, are
> pretty wild estimates, and often have no bearing on the true underlying
> correlations. I think that in this situation we might be asking too much
> from lmer, without giving it enough data. If, on the other hand, we have a
> lot of data by subjects and items, it becomes possible to estimate these
> parameters.
>
> I believe this may have been, at least partly, the intent of Douglas Bates'
> original message about overparameterization.
That's a good question. I imagine there is a fair bit of uncertainty regarding the correlation parameters, though I would guess that it's not huge for a dataset of this size. The point estimates that lme4(.0) gives us don't quantify this uncertainty, but of course we could use Bayesian methods to get a better sense of them.
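To make the Bayesian suggestion concrete, here is a minimal sketch of one way to get posterior uncertainty about the random-effect correlations, using the MCMCglmm package; the variable names (rt, cond, subj, item) and the data frame d are hypothetical stand-ins for Emilia's actual data, and the default priors would need thought in practice.

```r
## Hedged sketch: posterior uncertainty about random-effect
## (co)variances via MCMCglmm. Variable names are hypothetical.
library(MCMCglmm)

m <- MCMCglmm(rt ~ cond,
              random = ~ us(1 + cond):subj + us(1 + cond):item,
              data   = d)

## m$VCV holds posterior samples of the variance-covariance
## components; from the intercept-slope covariance and the two
## variances one can compute a full posterior distribution over
## each correlation, rather than a single point estimate.
summary(m$VCV)
```

If the posterior over a correlation is spread over much of (-1, 1), that is direct evidence that the data are too sparse to pin it down, which is exactly the worry raised above.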
More generally, this point that you raise, Shravan, is precisely the reason that I tend to favor likelihood-ratio tests over the t-statistic for the purposes of confirmatory hypothesis tests like Emilia's. As Baayen, Davidson and Bates (2008, page 396) crucially point out, the t-statistic is computed conditional on a point estimate of the random-effects covariance matrix, and fails to take into account uncertainty in the estimate of this matrix. The likelihood ratio does not have this problem. (It has other problems, namely that the log likelihood ratio is not truly chi-squared distributed, but with 20 items and 36 subjects in a balanced design I would expect that the chi-squared approximation is fairly close. And at any rate, the same problem exists with the t statistic.)
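For concreteness, the likelihood-ratio test I have in mind can be sketched as follows in lme4; again the formula and variable names (rt, cond, subj, item, data frame d) are hypothetical placeholders for Emilia's design.

```r
## Hedged sketch: a likelihood-ratio test for a fixed effect,
## keeping the maximal random-effects structure in both models.
library(lme4)

## Fit with ML (REML = FALSE) so the likelihoods of models that
## differ in their fixed effects are comparable.
m1 <- lmer(rt ~ cond + (1 + cond | subj) + (1 + cond | item),
           data = d, REML = FALSE)

## Null model: drop the fixed effect of interest only; the
## random-effects structure stays identical.
m0 <- update(m1, . ~ . - cond)

## anova() on nested lmer fits reports the chi-squared
## likelihood-ratio statistic and its p-value.
anova(m0, m1)
```

The key design choice is that the random-effects structure is held constant across m0 and m1, so the test targets the fixed effect while marginalizing, via the likelihood, over the fitted covariance parameters in each model.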
So my take is that how much we should worry about these issues depends in part on our modeling goals. For a confirmatory hypothesis test like Emilia's on her dataset, I wouldn't worry much about overparameterization for the models she was showing us. If she wanted to aggressively interpret the parameter estimates resulting from a particular model fit, on the other hand, I would be much more cautious.
Best
Roger