Large mixed & crossed-effect model looking at educational spending on crime rates with error messages

On 01/10/2019 08:25, Ades, James wrote:
I see what you're saying
This is non-trivial in the general case. If you know something about the
latent structure, then things like structural equation models may help,
see e.g.

https://www.johnmyleswhite.com/notebook/2016/02/25/a-variant-on-statistically-controlling-for-confounding-constructs-is-harder-than-you-think/

which provides an alternative presentation of

Westfall, J. & Yarkoni, T. (2016). Statistically Controlling for
Confounding Constructs Is Harder than You Think. PLoS ONE, 11, 1-22.

Remember, linear regression -- fixed or mixed effect -- isn't sufficient
to draw causal conclusions without additional assumptions. The issue
with collinearity (as long as it's not perfect, i.e. doesn't lead to
rank deficiency) is not so much in the estimates as in the standard
errors, which get inflated by the covariance between the predictors.
There are several classical
approaches to dealing with this (such as residualization), but they all
have pros and cons. Oversimplifying a bit: residualization, for example,
attributes only the residual variance from the first predictor to the
second predictor -- i.e. all of the shared variance is attributed to the
first predictor. Regularized regression (e.g. LASSO, ridge, elastic net)
may help, especially with prediction. Equivalently, in a Bayesian
framework, appropriate choice of priors may help to pull the estimates
apart.
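
To make the two points above concrete (inflated standard errors, and what
residualization does to the shared variance), here is a rough sketch in
plain Python. Everything in it is an assumption made for illustration: the
simulated data, the true coefficients of 1 for both predictors, and the
helper names make_data, fit and residualize. It is ordinary least squares
with two predictors, not a mixed model, and not how lme4 does anything
internally.

```python
import math
import random


def make_data(rho, n=500, seed=1):
    """Simulate two predictors with correlation about rho and y = x1 + x2 + noise."""
    rng = random.Random(seed)
    x1, x2, y = [], [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        a = z1
        b = rho * z1 + math.sqrt(1 - rho ** 2) * z2
        x1.append(a)
        x2.append(b)
        y.append(a + b + rng.gauss(0, 1))  # assumed true slopes: 1 and 1
    return x1, x2, y


def fit(x1, x2, y):
    """Least-squares fit of y ~ 1 + x1 + x2; returns (b1, b2, se1, se2)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    # Centered sums of squares and cross-products.
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2  # shrinks toward 0 as x1 and x2 become collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    rss = sum((c - b0 - b1 * a - b2 * b) ** 2 for a, b, c in zip(x1, x2, y))
    sigma2 = rss / (n - 3)  # residual variance, 3 estimated coefficients
    return b1, b2, math.sqrt(sigma2 * s22 / det), math.sqrt(sigma2 * s11 / det)


def residualize(x2, x1):
    """Return the part of x2 that is not linearly explained by x1."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    slope = (sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
             / sum((a - m1) ** 2 for a in x1))
    return [(b - m2) - slope * (a - m1) for a, b in zip(x1, x2)]


if __name__ == "__main__":
    for rho in (0.0, 0.95):
        x1, x2, y = make_data(rho)
        print("rho =", rho, "-> b1, b2, se1, se2 =", fit(x1, x2, y))
    # Residualize x2 on x1: x1 now also absorbs the variance it shares with x2.
    x1, x2, y = make_data(0.95)
    print("residualized ->", fit(x1, residualize(x2, x1), y))
```

With strongly correlated predictors the standard errors should come out
noticeably larger than in the uncorrelated case. After residualizing x2 on
x1, the coefficient of x2 is unchanged, while the coefficient of x1 becomes
the slope you would get from regressing y on x1 alone, i.e. the shared
variance is credited to the first predictor.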

But all of these comments aren't specific to the mixed-model case, so
that opens up the set of resources you can turn to. ;)

If I understand you correctly, you're asking what happens when your
response variable (y) is missing for a given combination of predictors
(x's)? Depending on the exact structure of the missing data, multiple
imputation might help you there, but generally if a particular case
never occurs (say "12 hours of sunlight but with winter temperatures"
for a model predicting plant growth derived from observations taken
outside but which you want to use to predict in a greenhouse), it's hard
to make inferences about that complete interaction. lme4 by default
drops incomplete cases (i.e. any rows in the dataframe where there is an
NA *for variables used in the model*).
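
As a toy illustration of what "dropping incomplete cases" means, here is a
short plain-Python sketch. It only mimics the idea and is not lme4 code:
None stands in for NA, and the function name complete_cases and the column
names (crime, spending, county, notes) are made up for the example.

```python
def complete_cases(rows, model_vars):
    """Keep only rows with no missing value (None) in the variables the model uses."""
    return [
        row for row in rows
        if all(row.get(var) is not None for var in model_vars)
    ]


if __name__ == "__main__":
    # Hypothetical data; "notes" is not part of the model.
    rows = [
        {"crime": 3.1, "spending": 10.0, "county": "A", "notes": None},
        {"crime": None, "spending": 12.0, "county": "A", "notes": "x"},
        {"crime": 2.7, "spending": None, "county": "B", "notes": "y"},
        {"crime": 2.2, "spending": 15.0, "county": "B", "notes": None},
    ]
    kept = complete_cases(rows, ["crime", "spending", "county"])
    print(len(kept), "of", len(rows), "rows kept")
```

Rows with a missing value in a column the model does not use (here, notes)
are kept; only missingness in the model variables removes a row.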

Phillip