
Question about misspecified multilevel model

Hi Chris,

Since you have a concrete example why not just calculate the models and see what happens? The intercept-only model that you proposed is quite simple and fast to compute:

Random effects:
 Groups           Name        Variance Std.Dev.
 classid:schoolid (Intercept)   99.23   9.961  
 schoolid         (Intercept)   77.49   8.803  
 Residual                     1028.23  32.066  
Number of obs: 1190, groups:  classid:schoolid, 312; schoolid, 107
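For reference, that nested intercept-only model corresponds to a call like this (a sketch; the formula is the same one labelled `correct.mod` in the ML comparison further down):

```r
library(lme4)

# Classrooms nested in schools: (1 | schoolid/classid) expands to a
# school-level intercept plus a classroom-within-school intercept.
correct.mod <- lmer(mathgain ~ (1 | schoolid/classid), data = classroom)
summary(correct.mod)
```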

Now, if you do a model with just classroom, then you want this model:

class.mod <- lmer(mathgain ~ (1 | schoolid:classid), data = classroom)

(I'm assuming that classid isn't globally unique, but rather only unique within schools, otherwise the / and : operators add a bit of unnecessary overhead, but that's a different issue)
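To make that concrete, here's a sketch of the equivalence (the `/` operator is just shorthand for writing out both terms):

```r
# These two calls specify the same model; `schoolid/classid` expands
# to the two terms written out explicitly in the second call.
m1 <- lmer(mathgain ~ (1 | schoolid/classid), data = classroom)
m2 <- lmer(mathgain ~ (1 | schoolid) + (1 | schoolid:classid), data = classroom)

# If classid *were* globally unique, (1 | classid) alone would already
# identify each classroom and the schoolid prefix would be redundant.
```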

which has the following random effects: 

Random effects:
 Groups           Name        Variance Std.Dev.
 schoolid:classid (Intercept)  180     13.42   
 Residual                     1025     32.02   
Number of obs: 1190, groups:  schoolid:classid, 312

As you can see, the classid term absorbed the school-level variance (180 ≈ 99 + 77). If you look at the fixed effects, you'll see slight differences in the error estimates there as well -- models with incomplete random-effects structures typically have error estimates that are too small and are thus anti-conservative. This can be attributed to the bias-variance tradeoff or, equivalently, overfitting (see e.g. the new book Statistical Rethinking or any of the other classics mentioned here, such as Pinheiro & Bates or Gelman & Hill). Roughly, leaving out random-effects structure implies that the data are independent in a way that they are not, which leads you to overestimate how much information you actually have and thus underestimate your error.
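One way to see this directly is to compare the standard errors of the fixed effects (a sketch, assuming both models from this email are in the workspace):

```r
# Standard errors of the fixed effects (here just the intercept),
# extracted from the estimated variance-covariance matrix of each fit.
sqrt(diag(vcov(correct.mod)))  # full nested model
sqrt(diag(vcov(class.mod)))    # classroom-only model
```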

Now, if you're curious you can also compare log likelihood if you refit the models via ML*:
refitting model(s) with ML (instead of REML)
Data: classroom
Models:
class.mod: mathgain ~ (1 | schoolid:classid)
correct.mod: mathgain ~ (1 | schoolid/classid)
            Df   AIC   BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)   
class.mod    3 11785 11800 -5889.5    11779                            
correct.mod  4 11779 11800 -5885.7    11771 7.5906      1   0.005867 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
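For reference, the table above comes from a plain `anova()` call on the two fitted models; `anova()` on lmer fits refits with ML automatically, which is where the "refitting model(s) with ML" message comes from:

```r
# anova() refits REML models with ML before comparing them, then
# reports AIC, BIC, log-likelihood and the likelihood-ratio test.
anova(class.mod, correct.mod)
```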

And you see that the model fits, while tantalisingly close, are nonetheless significantly different.

Now, I'm not sure what will happen with more complex fixed-effects or random-effects structures, but my guess is that, in general, the variance will be allocated to whatever grouping variable is "close enough" (here classroom is close enough to school, because a collection of classrooms makes up a school), and if none is close enough, the leftover variance will simply be absorbed into the residual variance (which didn't happen here).

* The fixed-effects model matrix is the same for both models, so you could in principle compare the REML-fitted models directly, but that gets you into all sorts of fun debates about the pros and cons of REML; it's just easier to avoid the issues around comparing REML fits altogether.

Best,
Phillip