Teaching Mixed Effects
Wow. Two very small points:
I feel that the likelihood ratio is a perfectly reasonable way of comparing two model fits where one is a special case of the other. In fact, if the models have been fit by maximum likelihood, the likelihood ratio would, I think, be the first candidate for a test statistic. The problem with likelihood ratio tests is not the likelihood ratio, per se -- it is converting the likelihood ratio to a p-value. You need to be able to evaluate the distribution of the likelihood ratio under the null hypothesis. The chi-square approximation to the distribution is exactly that - an approximation - and its validity depends on not testing at the boundary and on having a large sample, in some sense of the sample size. If I were really interested in evaluating a p-value for the likelihood ratio I would probably try a parametric bootstrap to get a reference distribution.
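To make the parametric bootstrap concrete, here is a minimal sketch in Python with statsmodels (the data, group structure, and variable names are all invented for illustration): simulate responses from the fitted null model, refit both models, and use the simulated likelihood ratios as the reference distribution for the observed one.

    # Parametric bootstrap reference distribution for the likelihood ratio
    # comparing a random-intercept model to a plain regression (toy data).
    import warnings
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    warnings.simplefilter("ignore")  # boundary fits under the null trigger warnings
    rng = np.random.default_rng(0)

    # Toy grouped data; the true model has NO group-level variance
    n_groups, n_per = 12, 8
    df = pd.DataFrame({"g": np.repeat(np.arange(n_groups), n_per),
                       "x": rng.normal(size=n_groups * n_per)})
    df["y"] = 1.0 + 0.5 * df["x"] + rng.normal(0.0, 1.0, len(df))

    # Observed likelihood ratio statistic (both models fit by ML)
    null_fit = smf.ols("y ~ x", df).fit()
    alt_fit = smf.mixedlm("y ~ x", df, groups=df["g"]).fit(reml=False)
    lrt_obs = max(0.0, 2.0 * (alt_fit.llf - null_fit.llf))

    # Simulate from the *fitted null* model, refit both models each time
    n_boot = 200  # would want more in practice
    sigma = np.sqrt(null_fit.scale)
    lrt_boot = np.empty(n_boot)
    for b in range(n_boot):
        df["ysim"] = null_fit.fittedvalues + rng.normal(0.0, sigma, len(df))
        ll0 = smf.ols("ysim ~ x", df).fit().llf
        ll1 = smf.mixedlm("ysim ~ x", df, groups=df["g"]).fit(reml=False).llf
        lrt_boot[b] = max(0.0, 2.0 * (ll1 - ll0))  # clip numerical negatives

    p_boot = np.mean(lrt_boot >= lrt_obs)
    print(f"observed LRT = {lrt_obs:.2f}, bootstrap p = {p_boot:.3f}")

Because the variance is being tested at its boundary, the naive chi-square(1) p-value would be conservative here; the bootstrap sidesteps that at the cost of a few hundred refits.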
Even if we are not p-value obsessed, we would still presumably like to be able to make some kind of (even informal) inference from the difference in fits, perhaps at the level of "model 1 fits (much better|a little better|about the same|a little worse|much worse) than model 2", or "the range of plausible estimates for this parameter is (tiny|small|moderate|large|absurdly large)". To do that we need some kind of metric (if we have not yet fled to Bayesian or quasi-Bayesian methods) for the range of the deviance under some kind of null case -- for example, where should we set cutoff levels on the likelihood profile to determine confidence regions for parameters? A parametric bootstrap makes sense, although it is a little scary to think, e.g., of doing a power analysis for such a procedure ...
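On the cutoff question, the standard asymptotic recipe is to cut the profile deviance at a chi-square quantile (about 3.84, i.e. 1.92 log-likelihood units, for a 95% interval on a single parameter). The sketch below just computes those cutoffs; it embodies exactly the chi-square approximation questioned above, so near a boundary one might instead take the cutoff from a parametric-bootstrap reference distribution.

    # Standard asymptotic cutoffs for a one-parameter likelihood profile;
    # these are the values a bootstrap reference distribution would replace
    # in boundary or small-sample cases.
    from scipy.stats import chi2

    for alpha in (0.05, 0.01):
        cut = chi2.ppf(1.0 - alpha, df=1)
        print(f"{100 * (1 - alpha):.0f}% region: deviance cutoff {cut:.2f} "
              f"({cut / 2:.2f} log-likelihood units)")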
2.3) "Testing" random effects is considered inappropriate (but see 2.1
for methods?).
BMB: I don't think this is necessarily true. Admittedly it is a point null hypothesis (the variance will never be _exactly_ zero), but I can certainly see cases ("does variation among species contribute significantly to the overall variance observed?") where one would want to test this question. This is a bit murky, but I think the distinction is often between random effects that are part of an experimental design (no point in testing, not interesting) and random effects arising from observational data.
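One well-known middle ground for testing a single variance component at its boundary is the equal-mixture result of Self and Liang (1987) and Stram and Lee (1994): under the null, the likelihood ratio statistic behaves asymptotically like a 50:50 mixture of chi-square(0) and chi-square(1), so the naive chi-square(1) p-value is simply halved. A minimal sketch (Python/scipy):

    from scipy.stats import chi2

    def boundary_lrt_pvalue(lrt):
        """p-value for a positive LRT statistic testing one variance
        component at zero, using the 50:50 chi2(0)/chi2(1) mixture."""
        return 0.5 * chi2.sf(lrt, df=1)

    print(boundary_lrt_pvalue(3.2))  # half the naive chi2(1) p-value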
Actually, the MLE or the REML estimate of a variance component can indeed be zero. The residual variance (i.e. the variance of the "per observation" noise term) will be estimated as zero only for artificial data, but the estimates of other variance components can be exactly zero.
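A quick way to see this (a sketch, with made-up simulation settings): in a balanced one-way layout the REML estimate of the between-group variance is the ANOVA estimator truncated at zero, so the fraction of exact zeros can be counted directly.

    # How often is the between-group variance estimate exactly zero?
    # Balanced one-way layout; REML = ANOVA estimator truncated at zero.
    import numpy as np

    rng = np.random.default_rng(1)
    n_groups, n_per = 10, 5
    sigma_a, sigma_e = 0.2, 1.0  # small true between-group SD
    n_sims, zeros = 2000, 0
    for _ in range(n_sims):
        a = rng.normal(0.0, sigma_a, n_groups)
        y = a[:, None] + rng.normal(0.0, sigma_e, (n_groups, n_per))
        gm = y.mean(axis=1)
        msb = n_per * np.var(gm, ddof=1)                               # between
        msw = np.sum((y - gm[:, None]) ** 2) / (n_groups * (n_per - 1))  # within
        zeros += max(0.0, (msb - msw) / n_per) == 0.0
    print(f"fraction of exactly-zero variance estimates: {zeros / n_sims:.2f}")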
I agree that there is a non-zero probability that the _estimate_ will be exactly zero, but my point is that there is really no chance in reality that species, blocks, or other random effects will not vary at all ... (sorry for the convolution of that last sentence)

Ben Bolker