Minimum number of levels for mixed model - R-SIG-mixed-models

nrm2010 · 2013-02-08T20:39:48Z

Hello, Ben, Thank you for the response. I created some confusion by stating treatment (trt) instead of the treatment blocks, of which there are 3. The Murtaugh paper seems to take one position on the perhaps philosophical issue previously discussed on the forum concerning whether or not the model design has to be faithful to the experimental design. My larger question is how often it will be feasible to use mixed models with nested effects if we require a minimum of 5^n samples for n leve

Ben Bolker

Fri, Feb 8, 2013 5:23 PM

nrm2010 <nrm2010 at ...> writes:

It's not going to work very well to treat treatment (blocks)
as a random effect, for the various reasons enumerated in the
FAQ.  I would strongly advise modeling them as fixed effects.

It took me a minute, but I guess by "n" here you mean the number
of *hierarchical* levels?  (I initially took it as the number of
levels of each random factor ... one of the difficulties with
mixed models is the terminology ...)

Again, this is discussed at some length in the FAQ; my personal
philosophical point of view probably comes through there.  I can say from
experience and guesswork (very few rigorous proofs, sorry)
that if you try to fit multilevel models with fewer than 5 levels
of a random factor:

* sometimes the model will produce an error
* often you will get estimates of zero variance.
  * this _might_ represent bias in the estimator, or it might
    represent a weird distribution of the estimator, which might have
    the right mean but a big spike at zero and a long tail.
* I don't have strong evidence for this, but it seems much
  more likely that the optimization will fail *silently* and
  give you wonky answers
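
A minimal sketch of the zero-variance point above, assuming a balanced one-way design and using the truncated ANOVA (method-of-moments) estimator as a stand-in for what a mixed-model fit would return. This is not lme4 and it is not the code behind the sims linked at the end of this message; the function names, the effect sizes (group SD 0.5, residual SD 1) and the seeds are made up for illustration.

```python
import random
import statistics


def anova_variance_estimate(groups):
    """Among-group variance for a balanced one-way layout, truncated at zero.

    Hypothetical helper: a method-of-moments stand-in, not a REML/ML fit.
    """
    n = len(groups[0])  # observations per group (balanced design assumed)
    group_means = [statistics.fmean(g) for g in groups]
    # Between-group mean square: n times the sample variance of the group means.
    msb = n * statistics.variance(group_means)
    # Within-group mean square: average of the per-group sample variances.
    msw = statistics.fmean([statistics.variance(g) for g in groups])
    # (msb - msw) / n is unbiased before truncation but can be negative.
    return max(0.0, (msb - msw) / n)


def zero_estimate_fraction(k, n, sd_group, sd_resid, n_sims, seed):
    """Fraction of simulated data sets whose variance estimate is exactly 0."""
    rng = random.Random(seed)
    zeros = 0
    for _ in range(n_sims):
        groups = []
        for _ in range(k):
            effect = rng.gauss(0.0, sd_group)  # random group effect
            groups.append([effect + rng.gauss(0.0, sd_resid) for _ in range(n)])
        if anova_variance_estimate(groups) == 0.0:
            zeros += 1
    return zeros / n_sims


# Same true variances, 3 groups versus 30 groups (illustrative values only).
frac_few = zero_estimate_fraction(k=3, n=5, sd_group=0.5, sd_resid=1.0,
                                  n_sims=1000, seed=1)
frac_many = zero_estimate_fraction(k=30, n=5, sd_group=0.5, sd_resid=1.0,
                                   n_sims=1000, seed=2)
print(f"share of zero estimates with 3 groups:  {frac_few:.3f}")
print(f"share of zero estimates with 30 groups: {frac_many:.3f}")
```

The true among-group variance is positive in both runs, so every exact zero is an artifact of having few groups rather than a feature of the data.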

125 samples is a big number in some fields and a small number
in others.  Maybe mixed models _aren't_ useful in your field ...
The fundamental problem, which I think you're going to have trouble
getting around, is that it's very hard to estimate variances reliably
from so few samples.  An analogy would be complaining that you're
having a hard time estimating population means reliably from samples
of size 2 or 3 ...
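
To put a rough number on "hard to estimate variances reliably": assuming the units are independent normal draws (an assumption, not something stated above), the usual sample variance from m units has a coefficient of variation of sqrt(2 / (m - 1)). The sketch below just evaluates that formula; the function name is hypothetical.

```python
import math


def variance_estimate_cv(n_units):
    """Relative standard error of a sample variance from n_units normal draws.

    Uses Var(s^2) = 2 * sigma^4 / (n_units - 1), which holds for normal data.
    """
    if n_units < 2:
        raise ValueError("need at least 2 units to estimate a variance")
    return math.sqrt(2.0 / (n_units - 1))


for n_units in (3, 5, 25, 125):
    print(f"{n_units:>3} units: CV of variance estimate = "
          f"{variance_estimate_cv(n_units):.2f}")
```

With 3 units the standard error of the variance estimate is as large as the variance itself, which is the same kind of problem as estimating a mean from a sample of 2 or 3.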

Remember, also, that the problem is primarily with the top level.
As I hope I made clear previously, the number of 'samples' we
are referring to for nested models is the total number of exchangeable
levels -- for a three-level nested 5/5/5 model, we will have 5
top-level, 25 middle-level, and 125 bottom-level units.  Of course,
if you want to use crossed random effects, you tend to have more
"top-level" units (i.e. more variances to estimate from small
samples -- e.g. 5 plots x 5 years x 10 samples per plot-year
combination = 5 samples for among-plot variance, 5 for among-year
variance, 25 for the plot-year interaction, and 250 overall ...)
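
The bookkeeping in the paragraph above can be sketched as follows. The helper names are hypothetical, and the crossed example assumes the 10 samples are taken in every plot-year combination, which is what makes the total come to 250.

```python
def nested_unit_counts(levels_per_stage):
    """Number of distinct units at each stage of a fully nested design."""
    counts = []
    running = 1
    for n_levels in levels_per_stage:
        running *= n_levels  # each unit above contains n_levels units below
        counts.append(running)
    return counts


def crossed_unit_counts(n_plots, n_years, samples_per_combination):
    """Units available for each variance term in a plot x year crossed design."""
    return {
        "plot": n_plots,                  # among-plot variance
        "year": n_years,                  # among-year variance
        "plot:year": n_plots * n_years,   # interaction variance
        "observations": n_plots * n_years * samples_per_combination,
    }


print(nested_unit_counts([5, 5, 5]))
print(crossed_unit_counts(5, 5, 10))
```

In both layouts the variance terms estimated from only 5 units are the weak point, however large the total number of observations gets.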

I put together some little sims illustrating the issue: 
http://rpubs.com/bbolker/4187