I have encountered an issue a couple times recently when fitting models in
lme4 that I have not seen addressed in commonly cited papers for dealing w/
boundary/singular fit issues.
Say I have a categorical variable representing a set of within-subject
contrasts, which is entered into the model as a set of effect coded
variables, e.g.
df$congruent.f <- factor(df$congruent,levels=c(1,0,-1),
labels=c("congruent","incongruent","none"))
contrasts(df$congruent.f) <- contr.sum(3)
which will produce two variables in the model (e.g. congruent.f1,
congruent.f2). When I fit the model w/ random intercepts and slopes for
these contrasts (along w/ other control variables), I get a boundary
(singular) fit warning.
Examining the correlations and variance components suggests that the
primary cause of the warning is in one of these contrast variables (e.g.
congruent.f2). So, would it ever be acceptable in this scenario to remove
the random effect term ONLY for congruent.f2, not the entire set of
congruent.f contrasts, where the goal is statistical inference and I do not
want the p-values/confidence intervals for congruent.f1 to be
anticonservative when it does in fact show variance across subjects?
I have to this point assumed that this would be a bad idea and tried to
simplify such models in other ways (i.e. setting correlations to zero or
removing other random effects), but this does not always work and seems a
roundabout method if you are not dealing with the primary problem.
Thanks,
Nathan
addressing singularity in lme4 fits caused by subsets of contrasts
3 messages · Nathan Tardiff, Ben Bolker, Phillip Alday
This is an interesting question ("interesting" means among other
things "I don't know").
If you get a variance estimate of zero for the second contrast then
removing that term from the model should (I think) give you **exactly
the same** model results (as an analogy: suppose you had the mean model
y = a +b*x+c*z and for some reason got an estimate of c=0, then you said
"can I drop z from the model?")
More generally, in order to know whether this is OK you have to
define what "OK" means. Trying to avoid philosophical or subjective
statements, you could ask whether following this process gives 'good'
results (unbiased and/or low-error estimates and good coverage of
whichever set of parameters you're interested in). In particular, if
you're interested in inference on fixed effects only, then I'd say you
can do anything to the random effects component of the model as long as
it doesn't mess up your estimation and inference on the fixed effects.
You could try some simulations to test your idea (note that your
conclusions can only be for the range of parameters you've actually
simulated: in particular Bates et al 2015 criticize the realism of the
simulations from Barr et al 2013 "keep it maximal":
"First, the simulations implement a factorial contrast that is
atypically large compared to what is found in natural data. Second, and
more importantly, the correlations in the random effects structure range
from?0.8 to +0.8. Such large correlation parameters are indicative of
overparameterization.They hardly ever represent true correlations in the
population. As a consequence, these simulations do not provide a
solid foundation for recommendations about how to fit
mixed-effects models to empirical data."
Bates, Douglas, Reinhold Kliegl, Shravan Vasishth, and Harald Baayen.
?Parsimonious Mixed Models.? ArXiv:1506.04967 [Stat], June 16, 2015.
http://arxiv.org/abs/1506.04967.
On 1/21/21 11:54 AM, Nathan Tardiff wrote:
I have encountered an issue a couple times recently when fitting models in
lme4 that I have not seen addressed in commonly cited papers for dealing w/
boundary/singular fit issues.
Say I have a categorical variable representing a set of within-subject
contrasts, which is entered into the model as a set of effect coded
variables, e.g.
df$congruent.f <- factor(df$congruent,levels=c(1,0,-1),
labels=c("congruent","incongruent","none"))
contrasts(df$congruent.f) <- contr.sum(3)
which will produce two variables in the model (e.g. congruent.f1,
congruent.f2). When I fit the model w/ random intercepts and slopes for
these contrasts (along w/ other control variables), I get a boundary
(singular) fit warning.
Examining the correlations and variance components suggests that the
primary cause of the warning is in one of these contrast variables (e.g.
congruent.f2). So, would it ever be acceptable in this scenario to remove
the random effect term ONLY for congruent.f2, not the entire set of
congruent.f contrasts, where the goal is statistical inference and I do not
want the p-values/confidence intervals for congruent.f1 to be
anticonservative when it does in fact show variance across subjects?
I have to this point assumed that this would be a bad idea and tried to
simplify such models in other ways (i.e. setting correlations to zero or
removing other random effects), but this does not always work and seems a
roundabout method if you are not dealing with the primary problem.
Thanks,
Nathan
[[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
I agree with Ben here. This is actually one of the things I'm working on in my rather limited free time. I also suspect that it will generally give an equivalent model BUT there are a few possible ways it might not: 1. You may have more power (and thus less variability in your estimates and errors) because you have fewer parameters to estimate. This is vaguely similar to some of the work in Matuscheck et al. 2017 on controlling Type-I and complexity in mixed models. (a sort of spin-off project of the Bates et al. Parsimonious Mixed Models) 2. Weird things happen at the boundary. One of the things that keeps getting rediscovered is that singular fits are a strange attractor. This is why the machinery for MCMC-based p-values in early lme4 ultimately didn't work and this is something we've seen in MixedModels.jl in our work on the parametric bootstrap -- bootstrapped estimates of the random effects often look like a point mass around 0 mixed with a normal distribution centered elsewhere. If you remove this one strange attractor, it may allow some of the other slopes to move a bit (especially if your correlation structure wasn't constrained to zero previously). I expect they won't move much, but they might. @Ben I've seen an example of this when using rePCA. Occasionally estimates of the effective dimensionality of the RE change dramatically just by adding or removing a single contrast and associated correlation parameters. I wish I had had the good sense to save that example ... Best, Phillip
On 21/1/21 11:01 pm, Ben Bolker wrote:
? This is an interesting question ("interesting" means among other
things "I don't know").
? If you get a variance estimate of zero for the second contrast then
removing that term from the model should (I think) give you **exactly
the same** model results (as an analogy: suppose you had the mean model
y = a +b*x+c*z and for some reason got an estimate of c=0, then you said
"can I drop z from the model?")
? More generally, in order to know whether this is OK you have to define
what "OK" means.? Trying to avoid philosophical or subjective
statements, you could ask whether following this process gives 'good'
results (unbiased and/or low-error estimates and good coverage of
whichever set of parameters you're interested in). In particular, if
you're interested in inference on fixed effects only, then I'd say you
can do anything to the random effects component of the model as long as
it doesn't mess up your estimation and inference on the fixed effects.
? You could try some simulations to test your idea (note that your
conclusions can only be for the range of parameters you've actually
simulated: in particular Bates et al 2015 criticize the realism of the
simulations from Barr et al 2013 "keep it maximal":
"First, the simulations implement a factorial contrast that is
atypically large compared to what is found in natural data.? Second, and
more importantly, the correlations in the random effects structure range
from?0.8 to +0.8.? Such large correlation parameters are indicative of
overparameterization.They hardly ever represent true correlations in the
population.? As a consequence, these simulations? do? not? provide? a
solid? foundation? for? recommendations? about? how? to? fit
mixed-effects models to empirical data."
Bates, Douglas, Reinhold Kliegl, Shravan Vasishth, and Harald Baayen.
?Parsimonious Mixed Models.? ArXiv:1506.04967 [Stat], June 16, 2015.
http://arxiv.org/abs/1506.04967.
On 1/21/21 11:54 AM, Nathan Tardiff wrote:
I have encountered an issue a couple times recently when fitting
models in
lme4 that I have not seen addressed in commonly cited papers for
dealing w/
boundary/singular fit issues.
Say I have a categorical variable representing a set of within-subject
contrasts, which is entered into the model as a set of effect coded
variables, e.g.
df$congruent.f <- factor(df$congruent,levels=c(1,0,-1),
???????????????????????????? labels=c("congruent","incongruent","none"))
contrasts(df$congruent.f) <- contr.sum(3)
which will produce two variables in the model (e.g. congruent.f1,
congruent.f2). When I fit the model w/ random intercepts and slopes for
these contrasts (along w/ other control variables), I get a boundary
(singular) fit warning.
Examining the correlations and variance components suggests that the
primary cause of the warning is in one of these contrast variables (e.g.
congruent.f2). So, would it ever be acceptable in this scenario to remove
the random effect term ONLY for congruent.f2, not the entire set of
congruent.f contrasts, where the goal is statistical inference and I
do not
want the p-values/confidence intervals for congruent.f1 to be
anticonservative when it does in fact show variance across subjects?
I have to this point assumed that this would be a bad idea and tried to
simplify such models in other ways (i.e. setting correlations to zero or
removing other random effects), but this does not always work and seems a
roundabout method if you are not dealing with the primary problem.
Thanks,
Nathan
????[[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models