-----Original Message-----
From: Felix Schönbrodt [mailto:nicebread at gmx.net]
Sent: Tuesday, November 18, 2008 2:30 PM
To: Doran, Harold
Cc: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] lme4 vs. HLM (program) - differences
in estimation?
Thanks for your fast reply,
you're right, I missed that in my explanation: Raudenbush et
al. enter the socioeconomic status (ses) group-centered into
their model.
Therefore I calculated that group-centered variable and
called it ses.gmc (gmc = group mean centered; I just
realized that this abbreviation is ambiguous with "grand mean centered" ...).
# group-centering of ses (as R&B do ...)
# ses.gm  = group mean
# ses.gmc = ses, group mean centered
gMeans <- aggregate(HSB$ses, list(HSB$id), mean)
names(gMeans) <- c("id", "ses.gm")
HSB <- merge(HSB, gMeans, by="id")
HSB$ses.gmc <- HSB$ses - HSB$ses.gm
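(As a side note: the aggregate/merge steps above can also be done in one pass with base R's ave(), which returns the group mean for every row. A minimal sketch on a small toy data frame 'd' - the name and values are hypothetical, only the column names match the HSB example:)

```r
# Equivalent one-step group-mean centering with ave().
# 'd' is a toy data frame for illustration only.
d <- data.frame(id  = factor(c(1, 1, 2, 2)),
                ses = c(-1.5, -0.5, 0.2, 0.8))
d$ses.gm  <- ave(d$ses, d$id)      # within-group mean of ses, one value per row
d$ses.gmc <- d$ses - d$ses.gm      # group-mean-centered ses
# sanity check: each group's centered values sum to zero
tapply(d$ses.gmc, d$id, sum)
```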
HSB.ML <- lmer(mathach ~ meanses + sector + ses.gmc + meanses*ses.gmc +
               sector*ses.gmc + (ses.gmc | id), data=HSB)
Apart from that, I think we have the same data set.
I used lme4_0.999375-27.
Felix
On 18.11.2008, at 19:54, Doran, Harold wrote:
Felix
I think I can help a bit. I need to know what ses.gmc is. I have the
data and have run the following model:
lmer(mathach ~ meanses * ses + sector * ses + (ses | id), hsb)
My output below shows the data (yours and mine) have the same number of
obs. But the original data distributed with HLM does not have a
variable called ses.gmc. The only variables in the data are:
'data.frame': 7185 obs. of 11 variables:
 $ id      : Factor w/ 160 levels "1224","1288",..: 1 1 1 1 1 ...
 $ minority: num 0 0 0 0 0 0 0 0 0 0 ...
 $ female  : num 1 1 0 0 0 0 1 0 1 0 ...
 $ ses     : num -1.528 -0.588 -0.528 -0.668 -0.158 ...
 $ mathach : num 5.88 19.71 20.35 8.78 17.90 ...
 $ size    : num 842 842 842 842 842 842 842 842 842 842 ...
 $ sector  : num 0 0 0 0 0 0 0 0 0 0 ...
 $ pracad  : num 0.35 0.35 0.35 0.35 0.35 0.35 0.35 0.35 ...
 $ disclim : num 1.60 1.60 1.60 1.60 1.60 ...
 $ himinty : num 0 0 0 0 0 0 0 0 0 0 ...
 $ meanses : num -0.428 -0.428 -0.428 -0.428 -0.428 -0.428 -0.428 -0.428 -0.428 -0.428 ...
Also you should report what version of lme4 you are using. Do a
sessionInfo() and this will appear.
lmer(mathach ~ meanses * ses + sector * ses + (ses | id), hsb)
Linear mixed model fit by REML
Formula: mathach ~ meanses * ses + sector * ses + (ses | id)
Data: hsb
AIC BIC logLik deviance REMLdev
46525 46594 -23253 46498 46505
Random effects:
Groups Name Variance Std.Dev. Corr
id (Intercept) 2.40744 1.55159
ses 0.01462 0.12091 1.000
Residual 36.75838 6.06287
Number of obs: 7185, groups: id, 160
Fixed effects:
Estimate Std. Error t value
(Intercept) 12.0948 0.2028 59.65
meanses 3.3202 0.3885 8.55
ses 2.9052 0.1483 19.59
sector 1.1949 0.3079 3.88
meanses:ses 0.8462 0.2718 3.11
ses:sector -1.5781 0.2245 -7.03
Correlation of Fixed Effects:
            (Intr) meanss ses    sector mnss:s
meanses      0.213
ses          0.076 -0.146
sector      -0.676 -0.344 -0.064
meanses:ses -0.143  0.178  0.278 -0.081
ses:sector  -0.062 -0.080 -0.679  0.093 -0.356
-----Original Message-----
From: r-sig-mixed-models-bounces at r-project.org
[mailto:r-sig-mixed-models-bounces at r-project.org] On Behalf Of Felix
Schönbrodt
Sent: Tuesday, November 18, 2008 12:02 PM
To: r-sig-mixed-models at r-project.org
Subject: [R-sig-ME] lme4 vs. HLM (program) - differences in
estimation?
Hi all,
in my workgroup the HLM program by Raudenbush et al. is the
standard - however, I'd like to do my mixed models in R using the
lme4 package (as I do all other computations in R as well ...).
Both as practice and as an argument for my colleagues I tried to
replicate (amongst others) the omnipresent
"math-achievement-in-catholic-vs.-public-schools" example (from the
original HLM data set) in lme4.
My first question is: is my formula in lmer the right one?
--> HLM-style model ( Y = MATHACH; ID = grouping factor)
Level-1 Model
Y = B0 + B1*(SES) + R
Level-2 Model
B0 = G00 + G01*(SECTOR) + G02*(MEANSES) + U0
B1 = G10 + G11*(SECTOR) + G12*(MEANSES) + U1
Combined Model:
Y = G00 + G01*(SECTOR) + G02*(MEANSES) + G11*(SECTOR)*(SES) +
G12*(MEANSES)*(SES) + U0 + U1*(SES) + R
--> lmer-style:
HSB.ML <- lmer(mathach ~ sector + meanses + ses + meanses*ses +
sector*ses + (ses.gmc | id), data=HSB)
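(Assuming ses.gmc is meant throughout the fixed part, the expansion of the interaction terms can be checked without fitting anything, since a*b in an R formula expands to a + b + a:b. A small base-R sketch; the helper norm() is just for this illustration:)

```r
# Two ways of writing the same fixed-effects specification;
# terms() lets us compare them without any data or model fit.
f1 <- mathach ~ sector + meanses + ses.gmc + meanses*ses.gmc + sector*ses.gmc
f2 <- mathach ~ ses.gmc * (sector + meanses)

# normalize interaction labels, since the variable order inside "a:b"
# can differ between the two formulas
norm <- function(f) {
  labs <- attr(terms(f), "term.labels")
  sort(sapply(strsplit(labs, ":"),
              function(v) paste(sort(v), collapse = ":")))
}

identical(norm(f1), norm(f2))   # both formulas generate the same terms
```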
All models yield the same results concerning fixed effects; models
including random slopes however show some minor divergence in the
random effects variance (not very much, but it is there ...; see
sample outputs below). In the presented example, the variance
component is 0.10 (lme4) vs. 0.15 (HLM).
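(To put that divergence in perspective, here is the arithmetic with the slope-variance estimates copied from the two outputs below; the variable names are ad hoc:)

```r
# Slope-variance estimates for the ses slope from the two programs,
# relative to the residual variance (values from the outputs below).
var.hlm   <- 0.14902    # HLM:  SES slope, U1
var.lmer  <- 0.101      # lmer: ses.gmc
var.resid <- 36.721     # lmer: residual variance

(var.hlm - var.lmer) / var.resid   # about 0.13% of the residual variance
```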
I am aware that these differences are really small - in this case the
difference was only about 0.1% of the residual variance.
Nonetheless I would really be happy if someone could shed some light
on this issue, so that I can go to my colleagues and say: "We get
slightly different results _because_ [...], BUT this is not a
problem, _because_ ..."
From my web and literature searches I have two guesses about these
differences in both programs:
- a statement by Douglas Bates that lme4 uses a different estimation
algorithm:
"[...] but may be of interest to some who are familiar with a
generalized least squares (GLS) representation of [...]"
But these are only guesses on my side ... I'm not a statistician, but
would like to understand it (a bit).
All best,
Felix
#-----------------------------------
# OUTPUT FROM HLM
#-----------------------------------
Final estimation of fixed effects:
----------------------------------------------------------------------------
                                    Standard              Approx.
   Fixed Effect       Coefficient   Error       T-ratio   d.f.      P-value
----------------------------------------------------------------------------
For INTRCPT1, B0
   INTRCPT2, G00        12.096006   0.198734     60.865   157       0.000
   SECTOR, G01           1.226384   0.306271      4.004   157       0.000
   MEANSES, G02          5.333056   0.369161     14.446   157       0.000
For SES slope, B1
   INTRCPT2, G10         2.937980   0.157140     18.697   157       0.000
   SECTOR, G11          -1.640951   0.242912     -6.755   157       0.000
   MEANSES, G12          1.034418   0.302574      3.419   157       0.001
----------------------------------------------------------------------------
Final estimation of variance components:
-----------------------------------------------------------------------------
Random Effect        Standard    Variance     df    Chi-square    P-value
                     Deviation   Component
-----------------------------------------------------------------------------
INTRCPT1,    U0       1.54271     2.37996    157    605.29563     0.000
SES slope,   U1       0.38603     0.14902    157    162.30883     0.369
level-1,     R        6.05831    36.70309
-----------------------------------------------------------------------------
#-----------------------------------
# OUTPUT FROM LMER
#-----------------------------------
Linear mixed model fit by REML
Formula: mathach ~ meanses + sector + ses.gmc + meanses * ses.gmc +
sector * ses.gmc + (ses.gmc | id)
Data: HSB
AIC BIC logLik deviance REMLdev
46524 46592 -23252 46496 46504
Random effects:
Groups Name Variance Std.Dev. Corr
id (Intercept) 2.379 1.543
ses.gmc 0.101 0.318 0.391
Residual 36.721 6.060
Number of obs: 7185, groups: id, 160
Fixed effects:
Estimate Std. Error t value
(Intercept) 12.09600 0.19873 60.866
meanses 5.33291 0.36916 14.446
sector 1.22645 0.30627 4.005
ses.gmc 2.93877 0.15509 18.949
meanses:ses.gmc 1.03890 0.29889 3.476
sector:ses.gmc -1.64261 0.23978 -6.850