lme4 vs. HLM (program) - differences in estimation?

10 messages · Felix Schönbrodt, Douglas Bates, Doran, Harold

#
Hi all,

In my workgroup, the HLM program by Raudenbush et al. is the de facto standard; however, I'd like to do my mixed models in R using the lme4 package (as I do all my other computations in R as well...).

Both as practice and as an argument for my colleagues, I tried to replicate (amongst others) the omnipresent "math achievement in Catholic vs. public schools" example (using the original HLM data set) in lme4.

My first question is: is my formula in lmer the right one?

--> HLM-style model ( Y = MATHACH; ID = grouping factor)

Level-1 Model

	Y = B0 + B1*(SES) + R

Level-2 Model
	B0 = G00 + G01*(SECTOR) + G02*(MEANSES) + U0
	B1 = G10 + G11*(SECTOR) + G12*(MEANSES) + U1

Combined Model:
	
	Y = G00 + G01*(SECTOR) + G02*(MEANSES) + G11*(SECTOR)*(SES) +  
G12*(MEANSES)*(SES) + U0 + U1*(SES) + R

--> lmer-style:

	HSB.ML <- lmer(mathach ~ sector + meanses + ses.gmc + meanses*ses.gmc + sector*ses.gmc + (ses.gmc | id), data=HSB)



All models yield the same results for the fixed effects; models including random slopes, however, show some minor divergence in the random-effects variances (not very much, but it is there ...; see sample outputs below). In the presented example, the slope variance component is 0.10 (lme4) vs. 0.15 (HLM).
I am aware that these differences are really small - in this case the difference was only about 0.1% of the residual variance.

Nonetheless I would really be happy if someone could shed some light  
on this issue, so that I can go to my colleagues and say: "We get  
slightly different results _because_ [...], BUT this is not a problem,  
_because_ ..."


From my web and literature searches I have two guesses about the differences between the two programs:

- a statement by Douglas Bates, that lme4 uses another estimation  
algorithm:
"[...] but may be of interest to some who are familiar with a  
generalized least squares (GLS) representation of mixed-models (such  
as used in MLWin and HLM). The lme4 package uses a penalized least  
squares (PLS) representation instead."
(http://markmail.org/search/?q=lme4+PLS+GLS#query:lme4%20PLS%20GLS 
+page:1+mid:hb6p6mk6sztkvii2+state:results)

- maybe HLM does some Bayesian estimation, which lme4 does not do (?)

But these are only guesses on my part ... I'm not a statistician, but I would like to understand it (a bit).

All best,
Felix


#-----------------------------------
# OUTPUT FROM HLM
# ------------------------------------

  Final estimation of fixed effects:
  ----------------------------------------------------------------------------
                                        Standard             Approx.
     Fixed Effect         Coefficient   Error      T-ratio   d.f.     P-value
  ----------------------------------------------------------------------------
  For       INTRCPT1, B0
     INTRCPT2, G00          12.096006   0.198734    60.865   157      0.000
       SECTOR, G01           1.226384   0.306271     4.004   157      0.000
      MEANSES, G02           5.333056   0.369161    14.446   157      0.000
  For      SES slope, B1
     INTRCPT2, G10           2.937980   0.157140    18.697   157      0.000
       SECTOR, G11          -1.640951   0.242912    -6.755   157      0.000
      MEANSES, G12           1.034418   0.302574     3.419   157      0.001
  ----------------------------------------------------------------------------


  Final estimation of variance components:
  -----------------------------------------------------------------------------
  Random Effect           Standard      Variance     df    Chi-square   P-value
                          Deviation     Component
  -----------------------------------------------------------------------------
  INTRCPT1,       U0        1.54271       2.37996   157    605.29563    0.000
       SES slope, U1        0.38603       0.14902   157    162.30883    0.369
   level-1,       R         6.05831      36.70309
  -----------------------------------------------------------------------------



#-----------------------------------
# OUTPUT FROM LMER
#---------------------------------------

Linear mixed model fit by REML
Formula: mathach ~ meanses + sector + ses.gmc + meanses * ses.gmc +  
sector *      ses.gmc + (ses.gmc | id)
    Data: HSB
    AIC   BIC logLik deviance REMLdev
  46524 46592 -23252    46496   46504

Random effects:
  Groups   Name        Variance Std.Dev. Corr
  id       (Intercept)  2.379   1.543
           ses.gmc      0.101   0.318    0.391
  Residual             36.721   6.060
Number of obs: 7185, groups: id, 160

Fixed effects:
                 Estimate Std. Error t value
(Intercept)     12.09600    0.19873  60.866
meanses          5.33291    0.36916  14.446
sector           1.22645    0.30627   4.005
ses.gmc          2.93877    0.15509  18.949
meanses:ses.gmc  1.03890    0.29889   3.476
sector:ses.gmc  -1.64261    0.23978  -6.850


___________________________________
Dipl.-Psych. Felix Schönbrodt
Department of Psychology
Ludwig-Maximilians-Universität (LMU) Munich
Leopoldstr. 13
D-80802 München
#
Felix

I think I can help a bit. I need to know what ses.gmc is. I have those data and have run the following model:

lmer(mathach ~ meanses * ses + sector * ses + (ses | id), hsb)

My output below shows the data (yours and mine) have the same number of obs. But, the original data distributed with HLM do not have a variable called ses.gmc. The only variables in the data are:
'data.frame':   7185 obs. of  11 variables:
 $ id      : Factor w/ 160 levels "1224","1288",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ minority: num  0 0 0 0 0 0 0 0 0 0 ...
 $ female  : num  1 1 0 0 0 0 1 0 1 0 ...
 $ ses     : num  -1.528 -0.588 -0.528 -0.668 -0.158 ...
 $ mathach : num   5.88 19.71 20.35  8.78 17.90 ...
 $ size    : num  842 842 842 842 842 842 842 842 842 842 ...
 $ sector  : num  0 0 0 0 0 0 0 0 0 0 ...
 $ pracad  : num  0.35 0.35 0.35 0.35 0.35 0.35 0.35 0.35 0.35 0.35 ...
 $ disclim : num  1.60 1.60 1.60 1.60 1.60 ...
 $ himinty : num  0 0 0 0 0 0 0 0 0 0 ...
 $ meanses : num  -0.428 -0.428 -0.428 -0.428 -0.428 -0.428 -0.428 -0.428 -0.428 -0.428 ...

Also you should report what version of lme4 you are using. Do a sessionInfo() and this will appear.
Linear mixed model fit by REML 
Formula: mathach ~ meanses * ses + sector * ses + (ses | id) 
   Data: hsb 
   AIC   BIC logLik deviance REMLdev
 46525 46594 -23253    46498   46505
Random effects:
 Groups   Name        Variance Std.Dev. Corr  
 id       (Intercept)  2.40744 1.55159        
          ses          0.01462 0.12091  1.000 
 Residual             36.75838 6.06287        
Number of obs: 7185, groups: id, 160

Fixed effects:
            Estimate Std. Error t value
(Intercept)  12.0948     0.2028   59.65
meanses       3.3202     0.3885    8.55
ses           2.9052     0.1483   19.59
sector        1.1949     0.3079    3.88
meanses:ses   0.8462     0.2718    3.11
ses:sector   -1.5781     0.2245   -7.03

Correlation of Fixed Effects:
            (Intr) meanss ses    sector mnss:s
meanses      0.213                            
ses          0.076 -0.146                     
sector      -0.676 -0.344 -0.064              
meanses:ses -0.143  0.178  0.278 -0.081       
ses:sector  -0.062 -0.080 -0.679  0.093 -0.356
#
Thanks for your fast reply,

you're right, I missed that in my explanation: Raudenbush et al. enter the socioeconomic status (ses) group centered into their model. Therefore I calculated that group-centered variable and called it ses.gmc (gmc = group mean centered; [I just realized that this notation is ambiguous with "grand mean centered" ...])

# group-centering of ses (as R&B do ...)
# ses.gm = group mean
# ses.gmc = ses [group mean centered]

gMeans <- aggregate(HSB$ses, list(HSB$id), mean)
names(gMeans) <- c("id", "ses.gm")
HSB <- merge(HSB, gMeans, by="id")
HSB$ses.gmc <- HSB$ses - HSB$ses.gm

HSB.ML <- lmer(mathach ~ meanses + sector + ses.gmc + meanses*ses.gmc  
+ sector*ses.gmc + (ses.gmc|id), data=HSB)

I think despite that, we have the same dataset.
I used lme4_0.999375-27.

Felix




Am 18.11.2008 um 19:54 schrieb Doran, Harold:
#
Felix:

First off, you haven't group mean centered. With that said, I don't know why people do this anyhow, but I can see that you are not using a group mean centered variable. 

Group mean centering means the SES variable for person i in school j would be the same for all i. But, in your variable, ses.gmc varies by student just as it does in the original variable ses. Your variable ses.gm is the group mean variable

In fact, what you would need for a group mean centered variable is:

hsb$ses.gmc <- ave(hsb$ses, hsb$id)

Which requires less work than your code, but gives the same result.

Now, the variable meanses in the HSB data is supposed to be the group mean centered variable. But, when you compare to my values using ave() you can see the meanses variable in the HSB data are wrong.

I suspect you are not using the same data between HLM and R and that may be the problem. That is, in R you create a variable called ses.gmc thinking this is a group mean centered variable. But, HLM "automagically" does group mean centering for you if you ask it to.

When you work in HLM are you using the exact same data file that you created for your use in R? Or, are you relying on HLM to group mean center for you? If so, I suspect that is the issue. In fact, we can see that the difference in your estimates lies in this variable, and this is the variable you manipulate in R, so I suspect the error may not be a function of estimation differences, but of data differences.

With that said, I have found GLMM results between HLM and lmer to match exactly even though HLM uses a 6th-order Taylor series expansion and R uses a Laplace approximation with a 2nd-order Taylor series. I have also found estimates from linear models to match exactly even though HLM uses a different method of estimation than lmer as well.
#
On Tue, Nov 18, 2008 at 11:02 AM, Felix Schönbrodt <nicebread at gmx.net> wrote:
That formula is not incorrect but it is a bit redundant.  In the R
formula notation the ':' operator creates an interaction and the '*'
operator is used to cross effects.  Thus meanses*ses expands to
meanses + ses + meanses:ses
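One can check in R that the two spellings generate the same set of model terms (a small illustrative sketch; the variable names are taken from the model discussed above):

```r
## '*' crosses effects: a*b expands to a + b + a:b, so the explicit
## main-effect terms in the longer formula are redundant, not wrong.
f1 <- mathach ~ sector + meanses + ses + meanses*ses + sector*ses
f2 <- mathach ~ meanses*ses + sector*ses
## terms() expands each formula; both yield the same set of model terms
setequal(attr(terms(f1), "term.labels"),
         attr(terms(f2), "term.labels"))  # TRUE
```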
Does the HLM specification allow for correlation of the random effects
within ID group?  The lmer specification does.
The criterion, either the log-likelihood for ML estimates or the REML
criterion for REML estimates, is defined as a property of the data and
the model.  The issue I was discussing there is how to evaluate the
log-likelihood or the REML criterion given the data, the model and
values of the parameters.  Solving a penalized least squares problem
or a generalized least squares problem is just a step in the
evaluation.  Both paths should give the same answer.  The reasons I
prefer the penalized least squares approach have to do with accuracy,
reliability and speed as well as generality of the approach.

I think it will be more important to check that the model
specifications are the same and that you are using the same criterion.
 The default criterion for lmer is REML.  I don't know what the
default criterion for HLM is.
I'm not sure what that would mean.  To a statistician "Bayesian
Estimation" would be associated with Bayesian representations of
models in which the parameters are regarded as random variables.  The
estimation criterion would change from the likelihood (or a related
criterion like the REML criterion) to maximizing the "posterior
density" of the parameters.  Because the posterior density depends on
the likelihood and on the prior density you must specify both the
probability model and prior densities to be able to define the
estimates.

At least that is the statistician's view of Bayesian estimation.  In
some fields, like machine learning, the adjective Bayesian is applied
to any algorithm that seems remotely related to a probability model.
For example, if you check how movie ratings are calculated at imdb.com
they use what they term a Bayesian algorithm which, in their case,
means that they use a weighted average of the actual votes for the
movie and a typical vote so that a movie that is highly rated by very
few people doesn't suddenly become their number 1 rated movie of all
time.  Just saying it is Bayesian doesn't define the answer.  You need
to specify the probability model.

I would check two things: does HLM estimate two variances and a
covariance for the random effects and is it using the REML criterion
or the ML criterion.
#
On Tue, Nov 18, 2008 at 2:05 PM, Doran, Harold <HDoran at air.org> wrote:
I'm not sure about that.  I believe that Felix is correct that the
"centered" part refers to the difference between the student's ses
score and some standard value and the "group mean centered" would
refer to using the observed mean ses for each group as the centering
value.  The variable ses.gmc would vary with person i in school j but
it would have the property that the values of ses.gmc summed across
all the students in school j would be zero.
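That zero-sum property is easy to verify in R; here is a sketch using a toy data frame standing in for the real HSB data (two schools with three students each; the variable names follow Felix's code):

```r
## toy stand-in for the HSB data
HSB <- data.frame(id  = rep(c("a", "b"), each = 3),
                  ses = c(-1.5, -0.6, 0.3, 0.8, 0.2, -0.4))
## group-mean center: ave() returns each group's mean ses,
## replicated for every student in that group
HSB$ses.gmc <- HSB$ses - ave(HSB$ses, HSB$id)
## the centered values sum to (numerically) zero within each school
tapply(HSB$ses.gmc, HSB$id, sum)
all(abs(tapply(HSB$ses.gmc, HSB$id, sum)) < 1e-12)  # TRUE
```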

Having said that, I would agree with you that there is more emphasis
on centering of variables in the HLM literature than I feel is
warranted.  Centering is related to parameterization and most of the
time changing parameters does not change the model.  One set of
parameters may be more convenient than another or provide a better
conditioned estimation situation but usually the model is not changed.
I recall getting a result like that when I first looked at those data.
 I spent some time trying to reproduce the HLM results but eventually
gave up because of anomalies like this. The code for lmer is available
for anyone who wants to check exactly what is going on and the
evaluation of the log-likelihood and the REML criterion is described
in some detail - probably more detail than most people would want to
know - in one of the vignettes.
#
You're right. I do this so rarely I had forgotten. I referred to my HLM
book and they note that a group mean centered variable is

x_{ij} - \bar{X_{.j}}

Where \bar{X_{.j}} is the mean of the variable x for individuals in
group j. Now, I still suspect a data difference. My suspicion is that
HLM is doing the group mean centering using the variable meanses, which
is not the same group mean variable as Felix computes in R, which is
done properly.
#
Felix:
 
Just a thought. There is an implicit assumption in this thread that the lmer function is useful or correct because it lines up with HLM results. Now, I don't think looking at such comparisons is a bad idea, per se, especially when one is doing software development and maybe some unit testing is in order. Or, if one is very familiar with HLM and wants to learn to use lmer by comparing HLM output with the new results obtained under lmer.

However, I would caution that lmer isn't good merely because it aligns with HLM. In fact, the computational methods implemented for lmer far exceed the computational methods of almost all other programs designed for linear mixed-effects (or generalized linear mixed) models.

lmer also lives inside a very nice programming environment (R) that allows you to manipulate data and run the model all in one place, so there are significant ease-of-use advantages.

I don't think you are making this assumption necessarily. But, it may be useful for you to outline for your colleagues criteria for software evaluation and line HLM and lmer up next to each other based on these criteria. 

HLM is a useful program, but I would argue that the lmer function is much more capable of handling some complex issues.

Take care,
Harold


________________________________

	From: Felix Schönbrodt [mailto:nicebread at gmx.net]
	Sent: Tuesday, November 18, 2008 4:52 PM
	To: Douglas Bates
	Cc: Doran, Harold; r-sig-mixed-models at r-project.org
	Subject: Re: [R-sig-ME] lme4 vs. HLM (program) - differences in estimation?
	
	
	Thanks very much for your thorough remarks! 



		I suspect you are not using the same data between HLM and R and that may be the problem. That is, in R you create a variable called ses.gmc thinking this is a group mean centered variable. But, HLM "automagically" does group mean centering for you if you ask it to.

		When you work in HLM are you using the exact same data file that you created for your use in R? Or, are you relying on HLM to group mean center for you? If so, I suspect that is the issue.

	I followed your suggestion about the possibly different centering approach in HLM and did a run with only the raw variables (no centering). The small differences remain.


		I would check two things: does HLM estimate two variances and a
		covariance for the random effects and is it using the REML criterion
		or the ML criterion.


	HLM is using the REML criterion; concerning the variances for the random effects, I haven't yet been able to dig deep enough to answer this question ...

	From the many comparisons I have run by now, I can conclude the following:
	- fixed effects usually are equal up to 3 digits after the decimal point
	- random variance components can show small deviations if they are very close to zero (and are not significant anyway, judging by the p-value reported by HLM, which is usually > 0.50 in these cases). However, I am aware of the discussion concerning the appropriateness of the p-values reported by HLM (see also http://markmail.org/search/?q=lme4+p-value#query:lme4%20p-value+page:1+mid:3t4hxxrlh3uvb7kh+state:results).

	Maybe I was too nitpicky about these small differences - they only seem to appear in coefficients/variance components that are insignificant anyway. But the discussion was definitely productive for me, and I think I can now convince my colleagues to use lme4 instead ;-)
	Maybe someone wants to dig deeper into this issue; the data set from the HLM program (HS&B data) can be downloaded here, for example: http://www.hlm-online.com/datasets/HSBDataset/HSBdata.zip.


	Thanks,
	Felix
#
On Tue, Nov 18, 2008 at 3:51 PM, Felix Schönbrodt <nicebread at gmx.net> wrote:
That's an interesting point about the biggest apparent differences
being in the small variance estimates.  It has been a while since I
looked at descriptions of the computational methods in HLM but I
believe that they work in terms of the precision of the random effects
(the inverse of the variance) instead of the variance.  In the way
that the model was typically expressed when that software was written
it is easier to work with the precision matrix than with the variance
matrix of the random effects.

When you look at the situation in more detail, however, you find that
it is possible to have estimates of zero for the variances of the
random effects whereas you cannot have an estimate of zero for the
precision, corresponding to infinite variance.  Optimizing the
log-likelihood with respect to the variance parameters (or, better,
the standard deviations) is more stable than optimizing with respect
to the precision.  The instability of optimizing with respect to the
precision parameters is most noticeable for small variance components
(large precision estimates).  It is particularly dramatic when the
variance goes to zero (precision to infinity).  Some of the many,
incompatible changes to the lme4 package - changes that have
inconvenienced many of its users whom I thank for their patience -
have been specifically for the purpose of stabilizing the estimation
situation for very small variance components.

The place where the choice of estimation algorithm matters the most is
in the estimation of small variance components so it doesn't surprise
me that differences crop up there.