What is the appropriate zero-correlation parameter model for factors in lmer?
I have to clarify that I was talking about fm5 in the way Rune Haubo
specified it:
fm5.2 = y ~ 1 + f + (1 | g) + (0 + f || g) which is, when using a
treatment coded factor with 3 levels, equivalent to
fm5.2 = y ~ 1 + f + (1 | g ) +
(0 + dummy(f, "1st level") | g) +
(0 + dummy(f, "2nd level") | g) +
(0 + dummy(f, "3rd level") | g)
In the way Reinhold Kliegl specified fm5, i.e. with (0 + f | g)
instead of (0 + f || g), it seems to me that fm5 is just an
over/re-parameterized version of fm6 with one additional parameter and
they both yield the same fit.
However, I think both fm5.2 and fm5 are difficult to understand
because they use (1 | g) and, at the same time, the 0 + f notation
within one formula. So my question remains the same: as Reinhold
Kliegl put it "if the factor levels are A, B, C, and the two contrasts
are c1 and c2, I thought I can specify either (1 + c1 + c2) or (0 + A
+ B + C)". But this doesn't seem to be the case for random effects -
or is it?
Maybe answering this question can also explain the difference between
fm3 and fm7.
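
A minimal base-R sketch of the design-matrix side of this question may help. It assumes treatment contrasts; the small balanced factor f and the variances 4, 1, 2 are made up for illustration and do not come from any model in this thread.

```r
# Hypothetical balanced factor with levels A, B, C (two observations each).
f <- factor(rep(c("A", "B", "C"), each = 2))

# Contrast coding (1 + c1 + c2) versus indicator coding (0 + A + B + C).
X_contr <- model.matrix(~ 1 + f, contrasts.arg = list(f = "contr.treatment"))
X_ind   <- model.matrix(~ 0 + f)

# Both codings span the same 3-dimensional column space ...
rank_shared <- qr(cbind(X_contr, X_ind))$rank
# ... and adding an intercept column to the indicators (as in fm5)
# gives 4 columns but still rank 3.
rank_fm5 <- qr(cbind(1, X_ind))$rank

# T_map turns contrast-coded random effects into per-level effects
# (rows: levels A, B, C; columns: intercept, c1, c2).
T_map <- rbind(A = c(1, 0, 0),
               B = c(1, 1, 0),
               C = c(1, 0, 1))

# A zero-correlation (diagonal) covariance in the contrast coding
# (illustration values only) ...
Sigma_contr <- diag(c(4, 1, 2))
# ... implies this covariance among the per-level effects.
Sigma_level <- T_map %*% Sigma_contr %*% t(T_map)
Sigma_level
```

Under these assumptions Sigma_level has non-zero off-diagonal entries, so a zero-correlation model in the contrast coding is not a zero-correlation model in the indicator coding. The two codings are interchangeable only when the full covariance matrix is estimated.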
Best regards,
Maarten
On Tue, May 22, 2018 at 1:17 PM, Reinhold Kliegl
<reinhold.kliegl at gmail.com> wrote:
There is an interpretable alternative to fm5 (actually there are many ...), called fm8 below, that avoids the redundancy between variance components. The change is to switch from (1 | g) + (0 + f | g) = (1 | g) + (0 + A + B + C | g) to (1 | g) + (0 + c1 + c2 | g), where c1 and c2 are the contrasts defined for f. (I have actually used such LMMs quite often.)

With this specification, the difference from the maxLMM (fm6) is that the correlation between intercept and contrasts is suppressed to zero. The correlation parameters now refer to the correlations between the effects of c1 and c2, not to the correlations between A, B, and C.

Actually, this is but one example of many LMMs one could slot into this position of the hierarchical model sequences. At this level of model complexity one can suppress various subsets of correlation parameters (as illustrated in Bates et al. (2015) [1] and various vignettes of the RePsychLing package).

fm1 = y ~ 1 + f + (1 | g)                       # minimal LMM version 1 (min1LMM)
fm2 = y ~ 1 + f + (1 | f:g)                     # minimal LMM version 2 (min2LMM)
fm3 = y ~ 1 + f + (0 + f || g)                  # zcpLMM with 0 in RE (zcpLMM_RE0)
fm4 = y ~ 1 + f + (1 | g) + (1 | f:g)           # LMM w/ f x g interaction (intLMM)
fm5 = y ~ 1 + f + (1 | g) + (0 + f | g)         # N/A
fm6 = y ~ 1 + f + (1 + f | g)                   # maximal LMM (maxLMM)
fm7 = y ~ 1 + f + (1 + f || g)                  # zcpLMM with 1 in RE (zcpLMM_RE1)
fm8 = y ~ 1 + f + (1 | g) + (0 + c1 + c2 | g)   # parsimonious LMM (prsmLMM)

Hierarchical model sequences

(1) maxLMM_RE1 -> prsmLMM -> intLMM -> min1LMM      # fm6 -> fm8 -> fm4 -> fm1
(2) maxLMM_RE1 -> prsmLMM -> intLMM -> min2LMM      # fm6 -> fm8 -> fm4 -> fm2
(3) maxLMM_RE0 -> prsmLMM -> zcpLMM_RE0 -> min2LMM  # fm6 -> fm8 -> fm3 -> fm2
(4) maxLMM_RE1 -> prsmLMM -> zcpLMM_RE1 -> min1LMM  # fm6 -> fm8 -> fm7 -> fm1 (new sequence)

I will update the RPub in the next few days.

[1] https://arxiv.org/pdf/1506.04967.pdf

Best regards,
Reinhold Kliegl

On Tue, May 22, 2018 at 11:00 AM, Maarten Jung
<Maarten.Jung at mailbox.tu-dresden.de> wrote:
I see that fm2 is nested within fm3 and fm4. But I have a hard time understanding fm3 and fm2 because, as Reinhold Kliegl said, they specify the f:g interaction but without the g main effect. Can someone provide an intuition for these models?

Also, it is not entirely clear to me what fm5 represents. It looks to me, and again I am with Reinhold Kliegl, as if there were over-parameterization going on.

Cheers,
Maarten

On Tue, May 22, 2018 at 9:45 AM, Reinhold Kliegl
<reinhold.kliegl at gmail.com> wrote:
Ok, I figured out the answer to the question about fm2. fm2 is indeed a very nice baseline for fm3 and fm4. So I distinguish between min1LMM and min2LMM.

fm1 = y ~ 1 + f + (1 | g)                 # minimal LMM version 1 (min1LMM)
fm2 = y ~ 1 + f + (1 | f:g)               # minimal LMM version 2 (min2LMM)
fm3 = y ~ 1 + f + (0 + f || g)            # zcpLMM with 0 in RE (zcpLMM_RE0)
fm4 = y ~ 1 + f + (1 | g) + (1 | f:g)     # LMM w/ f x g interaction (intLMM)
fm5 = y ~ 1 + f + (1 | g) + (0 + f | g)   # N/A
fm6 = y ~ 1 + f + (1 + f | g)             # maximal LMM (maxLMM)
fm7 = y ~ 1 + f + (1 + f || g)            # zcpLMM with 1 in RE (zcpLMM_RE1)

(1) maxLMM_RE1 -> intLMM -> min1LMM      # fm6 -> fm4 -> fm1
(2) maxLMM_RE1 -> intLMM -> min2LMM      # fm6 -> fm4 -> fm2
(3) maxLMM_RE0 -> zcpLMM_RE0 -> min2LMM  # fm6 -> fm3 -> fm2
(4) maxLMM_RE1 -> zcpLMM_RE1 -> min1LMM  # fm6 -> fm7 -> fm1 (new sequence)

On Tue, May 22, 2018 at 12:21 AM, Reinhold Kliegl
<reinhold.kliegl at gmail.com> wrote:
Sorry, I am somewhat late to this conversation. I am responding to this thread because it fits my comment very well, but my comment was initially triggered by a previous thread, especially Rune Haubo's post here [1]. So I hope it is ok to continue here.

I have a few comments and questions. For details I refer to an RPub I put up along with this post [2]. I start with a translation between Rune Haubo's fm's and the terminology I use in the RPub:

fm1 = y ~ 1 + f + (1 | g)               # minimal LMM (minLMM)
fm3 = y ~ 1 + f + (0 + f || g)          # zero-corr param LMM with 0 in RE (zcpLMM_RE0)
fm4 = y ~ 1 + f + (1 | g) + (1 | f:g)   # LMM w/ fixed x random factor interaction (intLMM)
fm6 = y ~ 1 + f + (1 + f | g)           # maximal LMM (maxLMM)
fm7 = y ~ 1 + f + (1 + f || g)          # zero-corr param LMM with 1 in RE (zcpLMM_RE1)

Notes: f is a fixed factor, g is a group (random) factor; fm1 to fm6 are in Rune Haubo's post; fm7 is new (added by me). I have not used fm2 and fm5 so far (see below).

(I) The post was triggered by the question whether intLMM is nested under zcpLMM. I had included this LRT in my older RPub cited in the thread, but I stand corrected and agree with Rune Haubo that intLMM is not nested under zcpLMM. For example, in the new RPub, I show that slightly modified Machines data exhibit smaller deviance for intLMM than zcpLMM despite an additional model parameter in the latter. Thanks for the critical reading.

(II) Here are Rune Haubo's sequences (left, re-sorted) augmented with my translation (right)

(1) fm6 -> fm5 -> fm4 -> fm1  # maxLMM_RE1 -> fm5 -> intLMM -> minLMM
(2) fm6 -> fm5 -> fm4 -> fm2  # maxLMM_RE1 -> fm5 -> intLMM -> fm2
(3) fm6 -> fm5 -> fm3 -> fm2  # maxLMM_RE1 -> fm5 -> zcpLMM_RE0 -> fm2

and here are the sequences I came up with (left) augmented with a translation into RH's fm's.

(1) maxLMM_RE1 -> intLMM -> minLMM      # fm6 -> fm4 -> fm1
(3) maxLMM_RE0 -> zcpLMM_RE0            # fm6 -> fm3
(4) maxLMM_RE1 -> zcpLMM_RE1 -> minLMM  # fm6 -> fm7 -> fm1 (new sequence)

(III) I have questions about fm2 and fm5.

fm2: fm2 redefines the levels of the group factor (e.g., in the cake data there are 45 groups in fm2 compared to 15 in the other models). Why is fm2 nested under fm3 and fm6? Somehow it looks to me that you include an f:g interaction without the g main effect (relative to fm4). This looks like an interesting model; I would appreciate a bit more conceptual support for its interpretation in the model hierarchy.

fm5: fm5 specifies 4 variance components (VCs), but the factor has only 3 levels. So to me this looks like there is redundancy built into the model. In support of this intuition, for the cake data, one of the VCs is estimated as 0. However, for the Machines data the model was not degenerate. So I am not sure. In other words, if the factor levels are A, B, C, and the two contrasts are c1 and c2, I thought I can specify either (1 + c1 + c2) or (0 + A + B + C). fm5 specifies (1 + A + B + C), which is rank deficient in the fixed-effect part, but not necessarily in the random-effect term. What am I missing here?

[1] https://stat.ethz.ch/pipermail/r-sig-mixed-models/2018q2/026775.html
[2] http://rpubs.com/Reinhold/391027

Best,
Reinhold Kliegl

On Thu, May 17, 2018 at 12:43 PM, Maarten Jung
<Maarten.Jung at mailbox.tu-dresden.de> wrote:
Dear list,

When one wants to specify an lmer model including variance components but no correlation parameters for categorical predictors (factors), afaik one has to convert the factors to numeric covariates or use lme4::dummy(). Until recently I thought m2a (or equivalently m2b using the double-bar syntax) would be the correct way to specify such a zero-correlation parameter model.

But in this thread [1] Rune Haubo Bojesen Christensen pointed out that this model does not make sense to him. Instead he suggests m3 as an appropriate model.

I think this is a *highly relevant difference* for everyone who uses factors in lmer and therefore I'm bringing up this issue again. But maybe I'm mistaken and just don't get what is quite obvious for more experienced mixed modelers.

Please note that the question is on CrossValidated [2] but some consider it off-topic and I don't think there will be an answer any time soon.

So here are my questions:
How should one specify an LMM without correlation parameters for factors and what are the differences between m2a and m3?
Is there a preferred model for model comparison with m4 (this model is also discussed here [3])?

library("lme4")
data("Machines", package = "MEMSS")
d <- Machines
contrasts(d$Machine) # default coding: contr.treatment
m1 <- lmer(score ~ Machine + (Machine | Worker), d)
c1 <- model.matrix(m1)[, 2]
c2 <- model.matrix(m1)[, 3]
m2a <- lmer(score ~ Machine + (1 | Worker) + (0 + c1 | Worker) +
              (0 + c2 | Worker), d)
m2b <- lmer(score ~ Machine + (c1 + c2 || Worker), d)
VarCorr(m2a)
 Groups   Name        Std.Dev.
 Worker   (Intercept) 5.24354
 Worker.1 c1          2.58446
 Worker.2 c2          3.71504
 Residual             0.96256

m3 <- lmer(score ~ Machine + (1 | Worker) +
             (0 + dummy(Machine, "A") | Worker) +
             (0 + dummy(Machine, "B") | Worker) +
             (0 + dummy(Machine, "C") | Worker), d)
VarCorr(m3)
 Groups   Name                Std.Dev.
 Worker   (Intercept)         3.78595
 Worker.1 dummy(Machine, "A") 1.94032
 Worker.2 dummy(Machine, "B") 5.87402
 Worker.3 dummy(Machine, "C") 2.84547
 Residual                     0.96158

m4 <- lmer(score ~ Machine + (1 | Worker) + (1 | Worker:Machine), d)
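
One way to look at the difference between m2a and m3 is to compute the covariance each model implies for a single Worker's effects at Machine levels A, B and C. The base-R sketch below reuses the standard deviations from the VarCorr() output above; it assumes treatment coding (so that c1 and c2 are the B and C dummies), and the helper name implied_cov is made up for illustration.

```r
# Standard deviations copied from the VarCorr() output above.
sd_m2a <- c(intercept = 5.24354, c1 = 2.58446, c2 = 3.71504)
sd_m3  <- c(intercept = 3.78595, A = 1.94032, B = 5.87402, C = 2.84547)

# Rows: Machine levels A, B, C. Columns: the uncorrelated random effects.
# m2a, assuming treatment coding: intercept, B dummy (c1), C dummy (c2).
Z_m2a <- rbind(A = c(1, 0, 0),
               B = c(1, 1, 0),
               C = c(1, 0, 1))
# m3: intercept plus one indicator per level.
Z_m3 <- cbind(1, diag(3))
rownames(Z_m3) <- c("A", "B", "C")

# Hypothetical helper: covariance of the per-level effects when the
# random effects are independent with the given standard deviations.
implied_cov <- function(Z, sds) Z %*% diag(sds^2) %*% t(Z)

V_m2a <- implied_cov(Z_m2a, sd_m2a)
V_m3  <- implied_cov(Z_m3, sd_m3)

round(V_m2a, 2)
round(V_m3, 2)
```

Under these assumptions, both matrices have a constant off-diagonal (the intercept variance). In V_m2a the reference level A receives no variance beyond that, so the model changes when the reference level changes, whereas V_m3 gives every level its own extra variance. In the same notation, m4 would correspond to m3 with the three level-specific variances constrained to be equal.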

[1] https://stat.ethz.ch/pipermail/r-sig-mixed-models/2018q2/026775.html
[2] https://stats.stackexchange.com/q/345842/136579
[3] https://stats.stackexchange.com/q/304374/136579
Best regards,
Maarten
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models