
Question about misspecified multilevel model

Hi Chris,

Since you have a concrete example why not just calculate the models and see what happens? The intercept-only model that you proposed is quite simple and fast to compute:

Random effects:
 Groups           Name        Variance Std.Dev.
 classid:schoolid (Intercept)   99.23   9.961  
 schoolid         (Intercept)   77.49   8.803  
 Residual                     1028.23  32.066  
Number of obs: 1190, groups:  classid:schoolid, 312; schoolid, 107
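For reference, that nested intercept-only model corresponds to a call like this (a sketch; the formula is the same one labelled `correct.mod` in the ML comparison further down):

```r
library(lme4)

# Classrooms nested in schools: (1 | schoolid/classid) expands to a
# school-level intercept plus a classroom-within-school intercept.
correct.mod <- lmer(mathgain ~ (1 | schoolid/classid), data = classroom)
summary(correct.mod)
```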

Now, if you do a model with just classroom, then you want this model:

class.mod <- lmer(mathgain ~ (1 | schoolid:classid), data = classroom)

(I'm assuming that classid isn't globally unique, but rather only unique within schools, otherwise the / and : operators add a bit of unnecessary overhead, but that's a different issue)
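To make that concrete, here's a sketch of the equivalence (the `/` operator is just shorthand for writing out both terms):

```r
# These two calls specify the same model; `schoolid/classid` expands
# to the two terms written out explicitly in the second call.
m1 <- lmer(mathgain ~ (1 | schoolid/classid), data = classroom)
m2 <- lmer(mathgain ~ (1 | schoolid) + (1 | schoolid:classid), data = classroom)

# If classid *were* globally unique, (1 | classid) alone would already
# identify each classroom and the schoolid prefix would be redundant.
```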

which has the following random effects: 

Random effects:
 Groups           Name        Variance Std.Dev.
 schoolid:classid (Intercept)  180     13.42   
 Residual                     1025     32.02   
Number of obs: 1190, groups:  schoolid:classid, 312

As you can see, the classid term absorbed the school-level variance (180 ≈ 99 + 77). If you look at the fixed effects, you'll see slight differences in the error estimates there as well -- models with incomplete random-effects structures typically have error estimates that are too small and are thus anti-conservative. This can be attributed to the bias-variance tradeoff or, equivalently, overfitting (see e.g. the new book Statistical Rethinking or any of the other classics mentioned here, such as Pinheiro & Bates or Gelman & Hill). Roughly, leaving out random-effects structure implies that the data are independent in a way that they are not, which leads you to overestimate how much information you actually have and thus underestimate your error.
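One way to see this directly is to compare the standard errors of the fixed effects (a sketch, assuming both models from this email are in the workspace):

```r
# Standard errors of the fixed effects (here just the intercept),
# extracted from the estimated variance-covariance matrix of each fit.
sqrt(diag(vcov(correct.mod)))  # full nested model
sqrt(diag(vcov(class.mod)))    # classroom-only model
```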

Now, if you're curious you can also compare log likelihood if you refit the models via ML*:
refitting model(s) with ML (instead of REML)
Data: classroom
Models:
class.mod: mathgain ~ (1 | schoolid:classid)
correct.mod: mathgain ~ (1 | schoolid/classid)
            Df   AIC   BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)   
class.mod    3 11785 11800 -5889.5    11779                            
correct.mod  4 11779 11800 -5885.7    11771 7.5906      1   0.005867 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
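For reference, the table above comes from a plain `anova()` call on the two fitted models; `anova()` on lmer fits refits with ML automatically, which is where the "refitting model(s) with ML" message comes from:

```r
# anova() refits REML models with ML before comparing them, then
# reports AIC, BIC, log-likelihood and the likelihood-ratio test.
anova(class.mod, correct.mod)
```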

And you see that the model fits, while tantalisingly close, are nonetheless significantly different.

Now, I'm not sure what will happen with more complex fixed-effects or random-effects structures, but my guess is that, in general, the variance will be allocated to whatever grouping variable is "close enough" (here classroom is close enough to school, because a collection of classrooms makes up a school), and if none is close enough, the leftover variance will simply be absorbed into the residual variance (which didn't happen here).

* The fixed-effects model matrix is the same for both models, so you could in principle compare the REML-fitted models directly, but that gets you into all sorts of fun debates about the pros and cons of REML; it's just easier to avoid the issues around comparing REML fits altogether.

Best,
Phillip