Skip to content

Understanding variance components

8 messages · Henrik Singmann, Phillip Alday, Gang Chen +1 more

#
Suppose that I have the following dataset in R:

library(lme4)
data(Machines,package="nlme")
mydata <- Machines[Machines$Machine!='C?,]

With the following model:

print(lmer(score ~ 1 + (1|Worker/Machine), data=mydata), ranef.comp="Var")

I have the variance components as shown below:

Random effects:
 Groups         Name        Variance
 Machine:Worker (Intercept) 46.00   
 Worker         (Intercept) 13.84   
 Residual                    1.16   

I have trouble understanding exactly what the first two components are: Machine:Worker and Worker? Specifically,

1) What is the variance for Worker: corresponding to the base (or reference) level of the factor Machine? If so, what is the base level: the first level in the dataset or alphabetically the first level (it happens to be the same in this particular dataset)?

2) What is the variance for Machine:Worker? Is it the variance for the second level of the factor Machine, or the extra variance relative to the variance for Worker?

Furthermore, for the model:

print(lmer(score ~ 1 + (1|Worker/Machine), data=Machines), ranef.comp="Var")

what is the variance for Machine:Worker in the following result since there are 3 levels involved in the factor Machine?

Random effects:
 Groups         Name        Variance
 Machine:Worker (Intercept) 60.2972 
 Worker         (Intercept)  7.3959 
 Residual                    0.9246 

Thanks,
Gang
8 days later
#
Hi Gang,

Sorry that I so am late to the party, but in case you are still 
interested I will reply (and, of course, for the archive).

The answer is basically given in the old faq:
http://glmm.wikidot.com/faq#toc27

(1|site/block) = (1|site)+(1|site:block)

Which is exactly what is given in your output. A random intercept for 
Worker and a random intercept for each worker:Machine interaction.

To answer your questions. The random intercepts do not have base or 
reference levels. They are increments or decrements to the overall 
intercept for each level of Worker or the Machine:Worker combination. 
The reported variance is the estimated variance of these increments, 
which is most likely unequal to the actual variance you would obtain by 
calculating it from the estimated increments, which are sometimes called 
BLUPs (I wonder if a better term for those exist).

Hope that helps,
Henrik

PS: Belated Happy New Year to everyone.


Am 05.01.2017 um 17:28 schrieb Chen, Gang (NIH/NIMH) [C]:
#
On Fri, 2017-01-13 at 18:06 +0100, Henrik Singmann wrote:
:snip:
I've seen them called "conditional modes" (in a vaguely Bayesian sense)
at some point by Doug Bates because of course they're not linear for
the GLMM case.

Best,
Phillip
4 days later
#
Happy New Year, Henrik! Thanks for explaining the details. A couple of days after I posted the question, I realized that my question was silly! Once I laid out the LME model equation, my original confusion was resolved.

Actually I meant to ask a slightly different question. Let me use the dataset embedded in your ?afex? package as an example:

data(obk.long, package = "afex?)

Suppose that my base model is

lmer(value ~ gender*phase+(1|id), data=obk.long)

Is there a way to specify a different variance for each gender in one model?

Thanks,
Gang
#
Hi Gang,

I have an idea which is based on the last example given on ?lmer:
## Fit sex-specific variances by constructing numeric dummy variables
...

I am not sure if this is entirely correct, but it looks good to me. If 
not, hopefully someone more knowledgeable will jump in.

## Original model without gender specific variance:
data(obk.long, package = "afex")

m1 <- lmer(value ~ gender*phase+(1|id), data=obk.long)
summary(m1)$varcor
## Groups   Name        Std.Dev.
## id       (Intercept) 1.6018
## Residual             1.4820
REMLcrit(m1)
## [1] 911.1599

## to get gender specific vari8ances, we construct two dummy variables:
obk.long$gender_F <- as.numeric(obk.long$gender == "F")
obk.long$gender_M <- as.numeric(obk.long$gender == "M")
m2 <- lmer(value ~ gender*phase+(0+gender_F|id)+(0+gender_M|id), 
data=obk.long)
summary(m2)$varcor
## Groups   Name     Std.Dev.
## id       gender_F 0.99723
## id.1     gender_M 2.03404
## Residual          1.48196
REMLcrit(m2)
## [1] 908.297

So far, looks reasonably close. Same for the conditional modes (thx 
Phillip). Left two columns is separate, right is joint variance.
cbind(ranef(m2)$id, rep(NA, 16), ranef(m1)$id)
##      gender_F   gender_M rep(NA, 16) (Intercept)
## 1   0.0000000 -3.0986753          NA  -3.0351426
## 2   0.0000000 -2.1328544          NA  -2.0891242
## 3   0.0000000  0.1207276          NA   0.1182523
## 4  -0.9806229  0.0000000          NA  -1.0642708
## 5  -0.3995130  0.0000000          NA  -0.4335918
## 6   0.0000000  2.6962499          NA   2.6409683
## 7   0.0000000  1.4084888          NA   1.3796103
## 8  -0.3995130  0.0000000          NA  -0.4335918
## 9  -0.6900680  0.0000000          NA  -0.7489313
## 10  0.0000000  0.4426679          NA   0.4335918
## 11  0.0000000 -1.1670336          NA  -1.1431057
## 12  0.0000000  1.7304291          NA   1.6949498
## 13  1.3438166  0.0000000          NA   1.4584452
## 14 -0.6900680  0.0000000          NA  -0.7489313
## 15  0.4721518  0.0000000          NA   0.5124267
## 16  1.3438166  0.0000000          NA   1.4584452

And, finally, the same for the fixed effects:
fixef(m1)
##       (Intercept)           genderM         phasepost
##      6.000000e+00      7.500000e-01     -6.250000e-01
##          phasepre genderM:phasepost  genderM:phasepre
##     -2.000000e+00     -1.243450e-15     -1.687539e-15
fixef(m2)
##      (Intercept)           genderM         phasepost
##     6.000000e+00      7.500000e-01     -6.250000e-01
##         phasepre genderM:phasepost  genderM:phasepre
##    -2.000000e+00      1.829648e-14      1.820766e-14

Hope that helps,
Henrik


Am 18.01.2017 um 23:18 schrieb Chen, Gang (NIH/NIMH) [C]:
#
Nice, Henrik!

One thing we need to resolve, though, is this.

====================
The variances for model m1:

(1) intercept

summary(m1)$varcor$id[1]
[1] 2.565887

(2) residuals

attr(summary(m1)$varcor, "sc")^2
[1] 2.196212

====================
And the variances for model m2:

(1) female

summary(m2)$varcor$id[1]
[1] 0.9944589

(2) male

summary(m2)$varcor$id.1[1]
[1] 4.137316

(3) residuals

attr(summary(m2)$varcor, "sc")^2
[1] 2.196212
====================

It?s great to see that the residual variance matches between the two models m1 and m2. However, the intercept variance in m1, 2.565887, is not equal to the sum of female and male variances, 0.9944589 + 4.137316 = 5.131775. However, if we divide the total variance of female and male by 2, we have (0.9944589 + 4.137316)/2 = 2.565887. Why is that?

If we code the two groups as

obk.long$gender_F <- sqrt(2)*as.numeric(obk.long$gender == "F")
obk.long$gender_M <- sqrt(2)*as.numeric(obk.long$gender == "M?)

then we have the desired result,

m2 <- lmer(value ~ gender*phase+(0+gender_F|id)+(0+gender_M|id), data=obk.long)
summary(m2)$varcor$id[1]+summary(m2)$varcor$id.1[1]
[1] 2.565888

Even though the variance part is reconciled, I cannot come up with a good explanation as to why this coding strategy is required. Any thought?

Thanks,
Gang
On Jan 19, 2017, at 6:13 AM, Henrik Singmann <singmann at psychologie.uzh.ch<mailto:singmann at psychologie.uzh.ch>> wrote:
Hi Gang,

I have an idea which is based on the last example given on ?lmer:
## Fit sex-specific variances by constructing numeric dummy variables
...

I am not sure if this is entirely correct, but it looks good to me. If not, hopefully someone more knowledgeable will jump in.

## Original model without gender specific variance:
data(obk.long, package = "afex")

m1 <- lmer(value ~ gender*phase+(1|id), data=obk.long)
summary(m1)$varcor
## Groups   Name        Std.Dev.
## id       (Intercept) 1.6018
## Residual             1.4820
REMLcrit(m1)
## [1] 911.1599

## to get gender specific vari8ances, we construct two dummy variables:
obk.long$gender_F <- as.numeric(obk.long$gender == "F")
obk.long$gender_M <- as.numeric(obk.long$gender == "M")
m2 <- lmer(value ~ gender*phase+(0+gender_F|id)+(0+gender_M|id), data=obk.long)
summary(m2)$varcor
## Groups   Name     Std.Dev.
## id       gender_F 0.99723
## id.1     gender_M 2.03404
## Residual          1.48196
REMLcrit(m2)
## [1] 908.297

So far, looks reasonably close. Same for the conditional modes (thx Phillip). Left two columns is separate, right is joint variance.
cbind(ranef(m2)$id, rep(NA, 16), ranef(m1)$id)
##      gender_F   gender_M rep(NA, 16) (Intercept)
## 1   0.0000000 -3.0986753          NA  -3.0351426
## 2   0.0000000 -2.1328544          NA  -2.0891242
## 3   0.0000000  0.1207276          NA   0.1182523
## 4  -0.9806229  0.0000000          NA  -1.0642708
## 5  -0.3995130  0.0000000          NA  -0.4335918
## 6   0.0000000  2.6962499          NA   2.6409683
## 7   0.0000000  1.4084888          NA   1.3796103
## 8  -0.3995130  0.0000000          NA  -0.4335918
## 9  -0.6900680  0.0000000          NA  -0.7489313
## 10  0.0000000  0.4426679          NA   0.4335918
## 11  0.0000000 -1.1670336          NA  -1.1431057
## 12  0.0000000  1.7304291          NA   1.6949498
## 13  1.3438166  0.0000000          NA   1.4584452
## 14 -0.6900680  0.0000000          NA  -0.7489313
## 15  0.4721518  0.0000000          NA   0.5124267
## 16  1.3438166  0.0000000          NA   1.4584452

And, finally, the same for the fixed effects:
fixef(m1)
##       (Intercept)           genderM         phasepost
##      6.000000e+00      7.500000e-01     -6.250000e-01
##          phasepre genderM:phasepost  genderM:phasepre
##     -2.000000e+00     -1.243450e-15     -1.687539e-15
fixef(m2)
##      (Intercept)           genderM         phasepost
##     6.000000e+00      7.500000e-01     -6.250000e-01
##         phasepre genderM:phasepost  genderM:phasepre
##    -2.000000e+00      1.829648e-14      1.820766e-14

Hope that helps,
Henrik
On Jan 18, 2017, at 5:18 PM, Chen, Gang (NIH/NIMH) [C] <gangchen at mail.nih.gov<mailto:gangchen at mail.nih.gov>> wrote:
Happy New Year, Henrik! Thanks for explaining the details. A couple of days after I posted the question, I realized that my question was silly! Once I laid out the LME model equation, my original confusion was resolved.

Actually I meant to ask a slightly different question. Let me use the dataset embedded in your ?afex? package as an example:

data(obk.long, package = "afex?)

Suppose that my base model is

lmer(value ~ gender*phase+(1|id), data=obk.long)

Is there a way to specify a different variance for each gender in one model?

Thanks,
Gang
On Jan 13, 2017, at 12:06 PM, Henrik Singmann <singmann at psychologie.uzh.ch<mailto:singmann at psychologie.uzh.ch>> wrote:
Hi Gang,

Sorry that I so am late to the party, but in case you are still interested I will reply (and, of course, for the archive).

The answer is basically given in the old faq:
http://glmm.wikidot.com/faq#toc27

(1|site/block) = (1|site)+(1|site:block)

Which is exactly what is given in your output. A random intercept for Worker and a random intercept for each worker:Machine interaction.

To answer your questions. The random intercepts do not have base or reference levels. They are increments or decrements to the overall intercept for each level of Worker or the Machine:Worker combination. The reported variance is the estimated variance of these increments, which is most likely unequal to the actual variance you would obtain by calculating it from the estimated increments, which are sometimes called BLUPs (I wonder if a better term for those exist).

Hope that helps,
Henrik

PS: Belated Happy New Year to everyone.


Am 05.01.2017 um 17:28 schrieb Chen, Gang (NIH/NIMH) [C]:
Suppose that I have the following dataset in R:

library(lme4)
data(Machines,package="nlme")
mydata <- Machines[Machines$Machine!='C?,]

With the following model:

print(lmer(score ~ 1 + (1|Worker/Machine), data=mydata), ranef.comp="Var")

I have the variance components as shown below:

Random effects:
Groups         Name        Variance
Machine:Worker (Intercept) 46.00
Worker         (Intercept) 13.84
Residual                    1.16

I have trouble understanding exactly what the first two components are: Machine:Worker and Worker? Specifically,

1) What is the variance for Worker: corresponding to the base (or reference) level of the factor Machine? If so, what is the base level: the first level in the dataset or alphabetically the first level (it happens to be the same in this particular dataset)?

2) What is the variance for Machine:Worker? Is it the variance for the second level of the factor Machine, or the extra variance relative to the variance for Worker?

Furthermore, for the model:

print(lmer(score ~ 1 + (1|Worker/Machine), data=Machines), ranef.comp="Var")

what is the variance for Machine:Worker in the following result since there are 3 levels involved in the factor Machine?

Random effects:
Groups         Name        Variance
Machine:Worker (Intercept) 60.2972
Worker         (Intercept)  7.3959
Residual                    0.9246

Thanks,
Gang
_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models


_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

_______________________________________________
R-sig-mixed-models at r-project.org<mailto:R-sig-mixed-models at r-project.org> mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
#
Hey Gang, hey Henrik (!) :)

very insightful query, hope it is okay that I briefly join it, on the last
question.
Variance is a result of division of squared deviations by sample size. In
the complete sample, this means divided by 240; and in the gender samples
by 120 each.
Have a look on the reversed equations for obtaining the overall variance by
combining gender groups, weighted by sample size:

(120 * 0.9944589   +   4.137316 * 120)/240 = overall variance (should be
equal to)  = (0.9944589 + 4.137316)/2   acording to your equation gang
left hand can be written as:
120 * (0.9944589   +   4.137316) / 240
which is:
(0.9944589   +   4.137316)/2

so nothing to worry, I think :)

Best wishes,
Ren?



2017-01-19 16:14 GMT+01:00 Chen, Gang (NIH/NIMH) [C] <gangchen at mail.nih.gov>
:

  
  
3 days later
#
Thanks Ren? for the comment. I'm still puzzled by the fact that the variance decomposition cannot seem to be directly reconciled through the two models themselves, and I hope that someone can offer a better way to interpret this.

Gang