Another question about model specification
On Tue, Mar 4, 2008 at 11:46 AM, Martin Eklund
<martin.eklund at farmbio.uu.se> wrote:
Hello again,
Thank you very much for the quick reply on my earlier question! Unfortunately, I still have problems to fully understand how to specify models in lme4. Specifically, how do you specify interactions between factors?
Consider the example data below (also attached for convenience). 'treatment' and 'family' are factors, 'treatment' is fixed and 'family' is random. Fitting the models
(i) lmer(weight ~ treatment + (1|family)) (ii) lmer(weight ~ treatment * (1|family)) (iii) lmer(weight ~ (treatment + (1|family))^2) (iv) lmer(weight ~ treatment : (1|family))
produce exactly the same results. I assume that this is the expected behavior, but to me it is a bit confusing.
I started writing a kind of a grumpy response to this inquiry because the documentation for lmer specifically states that terms in the formula are separated by '+' symbols, so exploring the possible effects of other types of operators is not likely to be productive. Nevertheless I will concede that specifying an interaction between a fixed-effects factor and a random factor is not a transparent operation. There are two ways that such an interaction can be specified. First, an interaction between a fixed factor and a random factor is a random effect. That is, randomness propagates through interactions. To specify an interaction between a fixed factor and a random factor you must consider not only the form of the model matrix that results but also the form of the variance-covariance matrix for the resulting random effects. You can specify the interaction as lmer(weight ~ treatment + (1| family) + (1|family:treatment)) or as lmer(weight ~ treatment + (treatment | family)) In the first specification you will estimate two variance components, one for family and one for the family:treatment interaction. This assumes that the random effects for each level of the treatment within a family are independent and have the same variance. In the second specification you estimate a vector of random effects for each family where the length of the vector is the number of treatments. Let's say there are m treatments. Then you will estimate a full m by m variance-covariance matrix for the random effects. This becomes problematic when m is large. That is, the first model is simpler than the second - in fact, it is a submodel of the second model. Here is an example using the (probably fictitious) Machines data from the MEMSS package.
data(Machines, package = "MEMSS") str(Machines)
'data.frame': 54 obs. of 3 variables: $ Worker : Factor w/ 6 levels "1","2","3","4",..: 1 1 1 2 2 2 3 3 3 4 ... $ Machine: Factor w/ 3 levels "A","B","C": 1 1 1 1 1 1 1 1 1 1 ... $ score : num 52 52.8 53.1 51.8 52.8 53.1 60 60.2 58.4 51.1 ...
(Mm1 <- lmer(score ~ Machine + (1|Worker) + (1|Worker:Machine), Machines))
Linear mixed model fit by REML
Formula: score ~ Machine + (1 | Worker) + (1 | Worker:Machine)
Data: Machines
AIC BIC logLik deviance REMLdev
227.7 239.6 -107.8 225.5 215.7
Random effects:
Groups Name Variance Std.Dev.
Worker (Intercept) 22.85892 4.78110
Worker:Machine (Intercept) 13.90845 3.72940
Residual 0.92465 0.96159
Number of obs: 54, groups: Worker, 6; Worker:Machine, 18
Fixed effects:
Estimate Std. Error t value
(Intercept) 52.356 2.486 21.062
MachineB 7.967 2.177 3.660
MachineC 13.917 2.177 6.393
Correlation of Fixed Effects:
(Intr) MachnB
MachineB -0.438
MachineC -0.438 0.500
(Mm2 <- lmer(score ~ Machine + (Machine|Worker), Machines))
Linear mixed model fit by REML
Formula: score ~ Machine + (Machine | Worker)
Data: Machines
AIC BIC logLik deviance REMLdev
228.3 248.2 -104.2 216.6 208.3
Random effects:
Groups Name Variance Std.Dev. Corr
Worker (Intercept) 16.64051 4.07928
MachineB 34.54670 5.87764 0.484
MachineC 13.61398 3.68971 -0.365 0.297
Residual 0.92463 0.96158
Number of obs: 54, groups: Worker, 6
Fixed effects:
Estimate Std. Error t value
(Intercept) 52.356 1.681 31.151
MachineB 7.967 2.421 3.291
MachineC 13.917 1.540 9.037
Correlation of Fixed Effects:
(Intr) MachnB
MachineB 0.463
MachineC -0.374 0.301
anova(Mm2, Mm1)
Data: Machines
Models:
Mm1: score ~ Machine + (1 | Worker) + (1 | Worker:Machine)
Mm2: score ~ Machine + (Machine | Worker)
Df AIC BIC logLik Chisq Chi Df Pr(>Chisq)
Mm1.p 6 237.47 249.40 -112.73
Mm2.p 10 236.61 256.50 -108.31 8.8515 4 0.06492
If we pretend that 'family' is a fixed factor and use lm to fit the models above we get different results (or, to be precise, (ii) = (iii) ? (i) ? (iv)), which is the expected behavior to me. So, my question really boils down to: if I want to fit the model weight ~ treatment + (1|family) + (an interaction term between treatment and (1|family)) how would I specify that? And in general, how are interactions specified? Thank you! Best regards, Martin. ========== R commands ========== data <- read.delim(file="/path/to/test.txt", sep="\t", header=TRUE) data$treatment <- as.factor(data$treatment) data$family <- as.factor(data$family) attach(data) lmer(weight ~ treatment + (1|family)) lmer(weight ~ treatment * (1|family)) lmer(weight ~ (treatment + (1|family))^2) lmer(weight ~ treatment : (1|family)) =========== Example data =========== treatment family weight 2 1 0.099 3 1 0.099 1 1 0.105 1 1 0.112 1 1 0.113 1 1 0.114 1 1 0.119 3 1 0.121 1 1 0.123 1 1 0.124 3 1 0.130 3 1 0.130 2 1 0.142 2 1 0.143 2 1 0.144 2 1 0.164 2 2 0.086 1 2 0.097 1 2 0.097 1 2 0.105 1 2 0.106 2 2 0.107 1 2 0.109 2 2 0.113 2 2 0.114 1 2 0.114 3 2 0.115 2 2 0.132 3 2 0.141 2 2 0.131 ? ======================================== Martin Eklund PhD Student Department of Pharmaceutical Biosciences Uppsala University, Sweden Ph: +46-18-4714281 ========================================
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models