Does the “non-independent” data structure assumed in mixed models correspond to “independence” as defined in probability theory?
In practice, correlation across groups seems to distort the distribution of the random effects, so that some groups end up clustered together. In extreme cases the random-effect distribution can become multimodal.
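To make that concrete, here is a small simulation sketch in Python. It is only an illustration under assumptions that are not in the original post: groups fall into two hypothetical "regions", every group in a region shares the same shift (fixed at ±3 to stand in for one realisation of a region-level effect), and the function names are made up for this example. If the region is ignored, the group means stop looking like draws from a single normal.

```python
import random
import statistics

def simulate_group_means(n_groups=200, n_per_group=30, region_shift=3.0, seed=0):
    """Simulate group means when groups in the same region share a shift."""
    rng = random.Random(seed)
    means = []
    for g in range(n_groups):
        # Assumption: first half of the groups is in region A, second half in B.
        # The shared shift is what makes intercepts correlated across groups.
        shared = -region_shift if g < n_groups // 2 else region_shift
        intercept = shared + rng.gauss(0.0, 1.0)   # group-level random effect
        ys = [intercept + rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        means.append(statistics.fmean(ys))
    return means

def bin_counts(values, lo=-6.0, hi=6.0, bins=12):
    """Count values per equal-width bin; out-of-range values go to the end bins."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        idx = int((v - lo) / width)
        idx = min(max(idx, 0), bins - 1)
        counts[idx] += 1
    return counts

means = simulate_group_means()
counts = bin_counts(means)

# Crude text histogram of the group means.
for i, c in enumerate(counts):
    left = -6.0 + i
    print(f"[{left:5.1f}, {left + 1:5.1f}) {'#' * c}")
```

The histogram should show two humps with few group means near zero, which is the kind of clustering a single normal random-effect distribution cannot describe.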
Further to this, there are nonparametric maximum likelihood (and Bayesian) mixed models that might be useful in this kind of situation: the distribution of the random effects is modelled as a finite mixture (e.g. of normals) whose components are estimated from the data. The npmlreg package for R implements Murray Aitkin's approach.
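As a rough illustration of the finite-mixture idea, the sketch below fits a two-component normal mixture by plain EM to simulated group-level effects. This is not the npmlreg interface and not Aitkin's algorithm, just a minimal stand-in; the function names, the two-component choice, the quartile-based starting values and the simulated data are all assumptions made for the example.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def fit_two_normal_mixture(xs, n_iter=200):
    """Plain EM for a two-component 1-D normal mixture (illustrative only)."""
    n = len(xs)
    xs_sorted = sorted(xs)
    # Crude starting values: quartiles as means, overall spread as both sds.
    mu = [xs_sorted[n // 4], xs_sorted[(3 * n) // 4]]
    mean_all = sum(xs) / n
    sd_all = math.sqrt(sum((x - mean_all) ** 2 for x in xs) / n)
    sigma = [sd_all, sd_all]
    weight = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: posterior probability that each point belongs to each component.
        resp = []
        for x in xs:
            p = [weight[k] * normal_pdf(x, mu[k], sigma[k]) for k in range(2)]
            total = p[0] + p[1]
            resp.append([p[0] / total, p[1] / total])
        # M-step: update weights, means and standard deviations.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weight[k] = nk / n
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sigma[k] = max(math.sqrt(var), 1e-6)  # guard against collapse
    return weight, mu, sigma

# Simulated bimodal "random effects": two clusters of groups (assumed values).
rng = random.Random(1)
effects = [rng.gauss(-2.0, 0.7) for _ in range(150)] + \
          [rng.gauss(2.0, 0.7) for _ in range(150)]

weight, mu, sigma = fit_two_normal_mixture(effects)
print("weights:", weight)
print("means:  ", mu)
print("sds:    ", sigma)
```

In a real mixed model the random effects are not observed directly, so the mixture has to be estimated jointly with the rest of the model, which is what the dedicated packages do; the sketch only shows the mixture-fitting step in isolation.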