Prev 5266 / 20628 Next

very basic HLM question

Sebastián Daza

Mon, Feb 7, 2011 10:04 AM

Thank you Douglas.
Do you know if there is a way to get the between-component variance (in 
my example equal to 59.004) through random effects Anova using R? I just 
want to get the ICC using the simple formula: ICC = variance between 
macro units / total variance.

Regards.

On 2/7/2011 11:44 AM, Douglas Bates wrote:

2011/2/7 Sebasti?n Daza<sebastian.daza at gmail.com>:

Hi everyone,
I need to get a between-component variance (e.g. random effects Anova) in
order to compute by hand the intraclass correlation. However, using lmer I
don't get the same results (variance component) than using random effects
Anova (with SPSS). I am using a database of students, clustered on schools
(there is not the same number of students by school).

I think the issue is that the estimates of the variance components are
different, which would then lead to different estimates of the ICC.

You mentioned in your original posting that the data are unbalanced
(i.e. there are different numbers of students in different schools).
The estimates returned from lmer are the REML estimates in this case.
Do you know how the other estimates are being calculated?  I'm not
sure that the naive calculation of within- and between- sums of
squares and equating expected mean squares with observed mean squares
works with unbalanced data.

First, I get the intraclass using ICC1 in R:

As Paul indicated, it is more informative to say, "using ICC1 from the
multilevel package for R".  ICC1 is not part of the base R nor the
required packages.

summary(anova1<- aov(math ~ as.factor(schoolid), data=nels88))
ICC1(anova1)

[1] 0.4414491

That is using the raw mean squares, not the estimates of the variance
components.  I have, thankfully, forgotten most of what I know about
expected and observed mean squares but I am pretty sure that those
don't correspond to the REML estimates, even in the balanced case.

Consider the results from the Dyestuff data, which is part of the lme4
package and is balanced

print(fm1<- lmer(Yield ~ 1|Batch, Dyestuff))

Linear mixed model fit by REML ['merMod']
Formula: Yield ~ 1 | Batch
    Data: Dyestuff
REML criterion at convergence: 319.6543

Random effects:
  Groups   Name        Variance Std.Dev.
  Batch    (Intercept) 1764     42.00
  Residual             2451     49.51
Number of obs: 30, groups: Batch, 6

Fixed effects:
             Estimate Std. Error t value
(Intercept)  1527.50      19.38    78.8

VarCorr(fm1)

$Batch
             (Intercept)
(Intercept)     1764.05
attr(,"stddev")
(Intercept)
     42.0006
attr(,"correlation")
             (Intercept)
(Intercept)           1

attr(,"sc")
[1] 49.5101

vc<- VarCorr(fm1)
var1hat<- as.vector(vc$Batch)
var1hat

[1] 1764.05

varhat<- attr(vc, "sc")^2
varhat

[1] 2451.25

var1hat/(var1hat + varhat)

[1] 0.4184874

anova1<- aov(Yield ~ 1|Batch, Dyestuff)

Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
   0 (non-NA) cases
In addition: Warning message:
In Ops.factor(1, Batch) : | not meaningful for factors

anova1<- aov(Yield ~ Batch, Dyestuff)
summary(anova1)

             Df Sum Sq Mean Sq F value   Pr(>F)
Batch        5  56358 11271.5  4.5983 0.004398
Residuals   24  58830  2451.2

You can see that the estimates of the residual variance are the same
for the two model fits but the mean square for batch is not the
estimate of the between-batch variance.

ICC1 comes from the multilevel package. Using lmer I get:

mod1<- lmer(math ~ 1 + (1|schoolid), data=nels88)

Linear mixed model fit by REML
Formula: math ~ 1 + (1 | schoolid)
   Data: nels88
  AIC  BIC logLik deviance REMLdev
  1878 1888 -935.8     1875    1872
Random effects:
  Groups   Name        Variance Std.Dev.
  schoolid (Intercept) 34.011   5.8319
  Residual                72.256   8.5003
Number of obs: 260, groups: schoolid, 10

Fixed effects:
            Estimate Std. Error t value
(Intercept)   48.861      1.927   25.36

The random intercept effect is 34.011. If I compute the ICC manually
(according to some articles I have read) I get:

34.011/(34.011+72.256)

[1] 0.3200523

According to my Anova analysis, the between-component variance should be
59.004. If I use 59.004 the formula works relatively well:

  59.004/(59.004+72.256)
[1] 0.449520037

Not equal, but at least similar in comparison with ICC from ICC1 command.
.
Does anyone know why I have got those differences? How can I get the 59.004
figure using R? I should get practically an identical result than using ICC1
but I don't. I have used the database of that article and I have got an
identical result between ICCs. But with the dataset I am using (nels88)
doesn't work.

This is the example of the article mentioned above:

library(multilevel)
base(bh1996)

summary(lmer(WBEING ~ 1 + (1|GRP), data=bh1996))

0.035801/(0.035801+0.789497)
0.043379482

summary(test<- aov(WBEING~as.factor(GRP),data=bh1996))
ICC1(test)
[1] 0.04336905

Any suggestions?

--
Sebasti?n Daza
sebastian.daza at gmail.com

_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

Sebasti?n Daza
sebastian.daza at gmail.com

Thread (4 messages)

Sebastián Daza very basic HLM question Feb 7 Douglas Bates very basic HLM question Feb 7 Sebastián Daza very basic HLM question Feb 7 Douglas Bates very basic HLM question Feb 7