Teaching Mixed Effects
Raubertas, Richard wrote:
[much more snippage]
Some will probably say we should focus on (interval) estimation of parameters rather than testing. But this doesn't solve the problem at hand. Remember how this discussion started: the difficulty of calculating reliable p-values for fixed effects. Because of the duality between hypothesis tests and confidence intervals, if you can't get reliable p-values, then essentially by definition you can't get reliable confidence intervals either. So the testing-versus-estimation issue is a red herring with respect to the deeper problem of inference for fixed effects in mixed models.
Hear, hear.
Others may try to dodge the problem by claiming to be Bayesians, calculating credible intervals from posterior distributions. But this won't hold water if they are using lme4 and mcmcsamp. No honest Bayesian can use an "uninformative" prior (if such a thing even exists), or at least not more than once in any area of research: after analysis of one data set, the posterior from that analysis should inform the prior for the next, but lme4 has its priors hard-coded. I think the real rationale for mcmcsamp is the hope that it will produce results with good frequentist properties. I am not aware that this has been demonstrated for mixed models in the peer-reviewed statistics literature.
If mcmcsamp worked at the moment I would strongly suggest that we should be trying to figure out how its conclusions differ from those based on other priors (e.g. Gelman's half-Cauchy priors for variances).
It is unhelpful of me not to provide p-values in the lmer summaries, but I develop the software out of an interest in doing it as well as I possibly can, not because someone assigns me the task of computing something. I really don't know of a good way of assigning p-values to t-ratios or F-ratios, so I don't. I still report the ratio of the estimate to its standard error, and even call it a t-ratio, because I think it is informative.
Doug, can you elaborate on that last clause? In what way is the (absolute) ratio |T| informative that the monotonic transformation 2*(1 - pnorm(abs(T))) is not? In other words, if a p-value (based in this case on a standard normal) is not reliable for inference, what inferential value does T have? Less formally, if T = 2, for example, what exactly do you conclude about the parameter, and what is your confidence in that conclusion?
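For concreteness, the transformation being referred to is just the two-sided tail probability of a standard normal evaluated at |T|; a minimal sketch in R (the function name `p_from_t` is mine, not from the thread):

```r
## Two-sided "p-value" computed from a t-ratio using a standard normal
## reference distribution.  This is the monotone transformation under
## discussion; whether the standard normal is an adequate reference
## for these ratios in mixed models is exactly the point in dispute.
p_from_t <- function(t) 2 * (1 - pnorm(abs(t)))

pval <- p_from_t(2)   # about 0.0455 for T = 2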
Perhaps it's just that people are a little less likely to take it too seriously?
[ ... ]
I feel that the likelihood ratio is a perfectly reasonable way of comparing two model fits where one is a special case of the other. In fact, if the models have been fit by maximum likelihood, the likelihood ratio would, I think, be the first candidate for a test statistic. The problem with likelihood ratio tests is not the likelihood ratio, per se -- it is converting the likelihood ratio to a p-value. You need to be able to evaluate the distribution of the likelihood ratio under the null hypothesis. The chi-square approximation to that distribution is exactly that, an approximation, and its validity depends on not testing at the boundary of the parameter space and on the sample being large, in some sense of sample size. If I were really interested in evaluating a p-value for the likelihood ratio I would probably try a parametric bootstrap to get a reference distribution.
Yes, but the problem is that (as you noted earlier in your post) the null distribution depends on the values of the nuisance parameters, except in very special cases. A simple parametric bootstrap conditions on the estimated values of those parameters, as if they were known. Nevertheless, a function to do this might be a useful addition to lme4.
Well, this does already exist, more or less, in the form of the simulate() method for mer objects: see http://glmm.wikidot.com/basic-glmm-simulation

cheers,
Ben Bolker
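The parametric-bootstrap idea mentioned above can be sketched roughly as follows. This is a hand-rolled illustration, not code from the thread: it assumes a current lme4 whose fitted models have simulate() and refit() methods, uses the built-in sleepstudy data purely as an example, and, as noted above, it conditions on the estimated nuisance parameters as if they were known.

```r
## Parametric-bootstrap reference distribution for a likelihood-ratio
## test of a fixed effect (illustrative sketch only).
library(lme4)

## Nested ML fits: the null model drops the fixed effect of Days.
full <- lmer(Reaction ~ Days + (1 | Subject), sleepstudy, REML = FALSE)
null <- lmer(Reaction ~ 1    + (1 | Subject), sleepstudy, REML = FALSE)

## Observed likelihood-ratio statistic.
obs <- as.numeric(2 * (logLik(full) - logLik(null)))

## Simulate responses under the fitted null model, refit both models
## to each simulated data set, and collect the LR statistics.
nsim <- 500
ref <- replicate(nsim, {
  y <- simulate(null)[[1]]
  as.numeric(2 * (logLik(refit(full, y)) - logLik(refit(null, y))))
})

## Bootstrap p-value: proportion of simulated statistics at least as
## large as the observed one, with the usual +1 correction.
pval <- (sum(ref >= obs) + 1) / (nsim + 1)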