likelihood-ratio tests in conflict with coefficients in maximal random effect model
On Mar 7, 2014, at 6:21 AM, Shravan Vasishth <vasishth.shravan at gmail.com> wrote:
Hi Roger and Emilia, and others,

I just wanted to say that in Emilia's data, she has 36 subjects and 20 items. Roger, would you agree that it is very difficult, with this amount of data, to accurately estimate the full variance-covariance matrices for the subject and item random effects, especially the correlation parameters? For data of this size, the numbers that lmer returns are pretty wild estimates and often have no bearing on the true underlying correlations. I think that in this situation we may be asking too much of lmer without giving it enough data. If, on the other hand, we have a lot of data by subjects and items, it becomes possible to estimate these parameters. I believe this may have been, at least partly, the intent of Douglas Bates' original message about overparameterization.
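To make the concern concrete, here is a small sketch (not from the thread) of how many variance-covariance parameters a "maximal" random-effects structure asks lmer to estimate. The function name and the example of an intercept plus three slopes are hypothetical illustrations.

```python
# Hypothetical illustration: number of variance-covariance parameters for a
# maximal random-effects structure with k correlated terms per grouping factor.
def n_varcov_params(k):
    """k variances plus k*(k-1)/2 pairwise correlations."""
    return k + k * (k - 1) // 2

# e.g. a random intercept plus 3 random slopes, by subject AND by item:
per_factor = n_varcov_params(4)   # 4 variances + 6 correlations = 10
total = 2 * per_factor            # 20 covariance parameters in all
print(per_factor, total)          # from only 36 subjects and 20 items
```

Ten correlation/variance parameters per grouping factor, estimated from 36 subject-level and 20 item-level units, is the sense in which the correlation estimates can come out "wild".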
That's a good question. I imagine there is a fair bit of uncertainty regarding the correlation parameters, though I would guess that it's not huge for a dataset of this size. The point estimates that lme4(.0) gives us don't quantify this uncertainty, but of course we could use Bayesian methods to get a better sense of it.

More generally, this point that you raise, Shravan, is precisely the reason that I tend to favor likelihood-ratio tests over the t-statistic for confirmatory hypothesis tests like Emilia's. As Baayen, Davidson and Bates (2008, page 396) crucially point out, the t-statistic is computed conditional on a point estimate of the random-effects covariance matrix, and fails to take into account uncertainty in the estimate of this matrix. The likelihood ratio does not have this problem. (It has other problems, namely that the log likelihood ratio is not truly chi-squared distributed, but with 20 items and 36 subjects in a balanced design I would expect the chi-squared approximation to be fairly close. And at any rate, the same problem exists with the t-statistic.)

So my take is that how much we should worry about these issues depends in part on our modeling goals. For a confirmatory hypothesis test like Emilia's on her dataset, I wouldn't worry much about overparameterization for the models she was showing us. If she wanted to aggressively interpret the parameter estimates resulting from a particular model fit, on the other hand, I would be much more cautious.

Best
Roger
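For readers unfamiliar with the mechanics being discussed, a minimal sketch of the likelihood-ratio test arithmetic follows. The log-likelihood values are invented for illustration; in practice they would come from ML (not REML) fits of two nested mixed models, e.g. via anova() on two lmer fits.

```python
# Hedged sketch: arithmetic of a likelihood-ratio test between nested models.
# The log-likelihoods below are hypothetical, not taken from Emilia's data.
import math

logLik_null = -1502.3   # hypothetical ML log-likelihood, reduced model
logLik_full = -1498.1   # hypothetical ML log-likelihood, full model

lr_stat = 2 * (logLik_full - logLik_null)   # likelihood-ratio statistic

# With one extra parameter in the full model, the asymptotic reference
# distribution is chi-squared with 1 df, whose survival function has the
# closed form erfc(sqrt(x/2)).
p_value = math.erfc(math.sqrt(lr_stat / 2.0))
print(round(lr_stat, 2), round(p_value, 4))
```

The chi-squared reference is only asymptotically correct, which is the caveat Roger raises; the claim in the thread is that with 20 items and 36 subjects in a balanced design the approximation should be fairly close.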