understanding I() in lmer formula

9 messages · Emmanuel Curis, Ben Bolker, Don Cohen

#
          (    V(A)     cov(A, B) )   ( f   g )
  Sigma = (                       ) = (       )
          ( cov(A, B)     V(B)    )   ( g   h )


  so the aim of lme4 (or any other software) is to find the << best >>
  values for all of these five parameters.

So when I run lmer I should be able to recover these 5 values, right?
How do I do that?

When you say it finds the best values, that means to me that the
objective function (whatever we're trying to minimize) includes 
something that depends on those parameters.
What is that function? 
I guess you have to pick some particular model in order to show its
objective function, and I suggest the simplest possible model, in
this case something like
 reaction ~ 1+days + (1+days|subject)
and similarly the || version, which I gather differs only in that
some g terms are left out.

  This is done by trying to
  find the values that allow one to say << I obtained my data because
  they were the most likely to occur >>. This leads to a complex function
  that often reduces to least squares, but not always; Gaussian
  mixed-effects models are not linear least-squares problems because of
  the f, g and h parameters.

  Traditionally, you fix uA = uB = 0 because, should they have other
  values, they could not be distinguished from the fixed part of the
  model (but you could also say that there is no distinct fixed part and
  that you try to find their values; it comes to the same thing).

  When you use (1 + days | subject), you allow lmer to fit a model in
  which f, g and h all vary.

  When you use (1 + days || subject), you constrain lmer to fit a model
  with g = 0, so only f and h are fitted.

All of this makes sense to me, but in order to really understand what's
going on I want to know the objective function.

  Note that with lmer, f, h (and g if allowed) are not obtained by first
  computing slope and intercept for all subjects, then doing usual
  descriptive statistics ...

I understand, but however the software actually works, I should be able
to see that the objective function is minimized by simply computing it
on the output and then also on various perturbations of the output.
That would tell me WHAT the software is doing.  I could worry later about
HOW it does that.

  Last note: do not confuse the correlation between A and B, the random
  effects in the population, given by g, and the correlation between the
  estimators of the (mean) slope [uA] and the estimator of the (mean)
  intercept [uB], M_A and M_B, which may be what you had in mind when
  saying that A and B values were correlated << before >> (it exists in
  usual linear regression). They correspond to different ideas:

  g means that there is in the population a kind of constraint
   between A and B ;

  the correlation between the _estimators_ means that any error on
   the estimation of the (mean) slope will lead to an error in the
   estimation of the (mean) intercept and it is a property of the
   method used to find them, not of the data/underlying world.

I had not even thought about the second of those.  
But I think that is similar to one of the outputs of summary(lmer(...))
where it says correlation of fixed effects.

  Hope this clarifies,

So far I don't think I've learned anything new, but I may be getting
close to describing to you what I'm missing.
#
Replies in the text. Please read about maximum likelihood methods: the
objective function is the likelihood of the data, and writing it down
will give you what you want. In practice it might be slightly different
if you used restricted maximum likelihood (REML) instead of maximum
likelihood (ML), but that does not change the interpretation of the
different models and outputs, just the objective function.
On Thu, Jun 15, 2017 at 02:49:50PM +0000, Don Cohen wrote:
> 
>           (    V(A)     cov(A, B) )   ( f   g )
>   Sigma = (                       ) = (       )
>           ( cov(A, B)     V(B)    )   ( g   h )
> 
> 
>   so the aim of lme4 (or any other software) is to find the << best >>
>   values for all of these five parameters.
> 
> So when I run lmer I should be able to recover these 5 values, right?
> How do I do that?

All of them are in summary( lmer( formula, data = ... ) )

For the uA and uB estimates, you can get them with fixef()
For f, g and h, you can get them with VarCorr()

> When you say it finds the best values, that means to me that the
> objective function (whatever we're trying to minimize) includes
> something that depends on those parameters.
> What is that function?

The opposite of the log-likelihood, -log L, of the data. More
precisely, assuming independent data points,

L = \prod_{i=1}^n \prod_{j=1}^{n_i} F'(Yij = y_ij | Ai = ai, Bi = bi) G'(Ai = ai, Bi = bi)

where F' is the density of a Gaussian of mean ai + bi * days_{i,j} and
variance sigma^2 (the variance of epsilon, the residual), G' is the
joint density of a bivariate Gaussian vector of expectation (uA, uB)
and covariance matrix Sigma = ( (f, g), (g, h) ), and you minimize
-log L. Here i is the subject index and j the measure index within the
subject.
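A sketch of this computation in R (assumptions: lme4 and mvtnorm are
installed, and the sleepstudy data shipped with lme4 stand in for the
model discussed here). One subtlety worth making explicit: the
likelihood lmer actually maximizes is the *marginal* one, with the
random effects integrated out; for a Gaussian model that integral has a
closed form, Y_i ~ N(X_i beta, Z_i Sigma Z_i' + sigma^2 I):

```r
library(lme4)
library(mvtnorm)

# Fit the correlated intercept/slope model by ML (REML = FALSE)
m <- lmer(Reaction ~ 1 + Days + (1 + Days | Subject), sleepstudy, REML = FALSE)

beta  <- fixef(m)            # (uA, uB)
Sigma <- VarCorr(m)$Subject  # 2x2 matrix ( f g ; g h )
s2    <- sigma(m)^2          # residual variance

# Marginal log-likelihood: one multivariate-normal term per subject
ll <- sum(sapply(split(sleepstudy, sleepstudy$Subject), function(d) {
  X <- cbind(1, d$Days)      # here Z = X (random intercept and slope)
  V <- X %*% Sigma %*% t(X) + s2 * diag(nrow(d))
  dmvnorm(d$Reaction, mean = as.vector(X %*% beta), sigma = V, log = TRUE)
}))

ll          # should agree with the next line
logLik(m)
```

Perturbing beta, Sigma or s2 away from the fitted values should only
ever decrease ll, which is the check proposed later in this thread.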

> I guess you have to pick some particular model in order to show its
> objective function, and I suggest the simplest possible model, in
> this case something like
>  reaction ~ 1+days + (1+days|subject)
> and similarly the || version, which I gather differs only in that
> some g terms are left out.
> 
>   This is done by trying to
>   find the values that allow one to say << I obtained my data because
>   they were the most likely to occur >>. This leads to a complex function
>   that often reduces to least squares, but not always; Gaussian
>   mixed-effects models are not linear least-squares problems because of
>   the f, g and h parameters.
> 
>   Traditionally, you fix uA = uB = 0 because, should they have other
>   values, they could not be distinguished from the fixed part of the
>   model (but you could also say that there is no distinct fixed part and
>   that you try to find their values; it comes to the same thing).
> 
>   When you use (1 + days | subject), you allow lmer to fit a model in
>   which f, g and h all vary.
> 
>   When you use (1 + days || subject), you constrain lmer to fit a model
>   with g = 0, so only f and h are fitted.
> 
> All of this makes sense to me, but in order to really understand what's
> going on I want to know the objective function.

Just write it down using the hints above... Or use the formulas in
Douglas Bates' book (or other books). They _are_ the objective
function. By e-mail, it would be almost impossible to write them out in
a readable way.

>   Note that with lmer, f, h (and g if allowed) are not obtained by first
>   computing slope and intercept for all subjects, then doing usual
>   descriptive statistics ...
> 
> I understand, but however the software actually works, I should be able
> to see that the objective function is minimized by simply computing it
> on the output and then also on various perturbations of the output.

Once you have written it down, you can use dnorm to compute it, and use
the logLik function to check the results.

> That would tell me WHAT the software is doing.  I could worry later about
> HOW it does that.

It maximises the likelihood L (or, equivalently, minimises -log L)
as a function of (uA, uB, f, g, h).

>   Last note: do not confuse the correlation between A and B, the random
>   effects in the population, given by g, and the correlation between the
>   estimators of the (mean) slope [uA] and the estimator of the (mean)
>   intercept [uB], M_A and M_B, which may be what you had in mind when
>   saying that A and B values were correlated << before >> (it exists in
>   usual linear regression). They correspond to different ideas:
> 
>   g means that there is in the population a kind of constraint
>    between A and B ;
> 
>   the correlation between the _estimators_ means that any error on
>    the estimation of the (mean) slope will lead to an error in the
>    estimation of the (mean) intercept and it is a property of the
>    method used to find them, not of the data/underlying world.
> 
> I had not even thought about the second of those.  
> But I think that is similar to one of the outputs of summary(lmer(...))
> where it says correlation of fixed effects.
> 
>   Hope this clarifies,
> 
> So far I don't think I've learned anything new, but I may be getting
> close to describing to you what I'm missing.
#
This is all good.  vignette("lmer",package="lme4") gives all of the
technical details for lmer (glmer is a bit more complicated, and the
paper describing it is still in development ...)
On 17-06-15 12:02 PM, Emmanuel Curis wrote:
#
Emmanuel Curis writes:
 >  So when I run lmer I should be able to recover these 5 values, right?
 >  How do I do that?
 > 
 > All of them are in summary( lmer( formula, data = ... ) )
 > For the uA and uB estimates, you can get them with fixef()
 > For f, g and h, you can get them with VarCorr()

Ah, now that I compare fixef and VarCorr with summary
I see which ones you mean.

 > The opposite of the log-likelihood, -log L, of the data. More
 > precisely, assuming independent data points,
 > 
 > L = \prod_{i=1}^n \prod_{j=1}^{n_i} F'(Yij = y_ij | Ai = ai, Bi = bi) G'(Ai = ai, Bi = bi)
 > 
 > where F' is the density of a Gaussian of mean ai + bi * days_{i,j} and
 > variance sigma^2 (the variance of epsilon, the residual), G' is the
 > joint density of a bivariate Gaussian vector of expectation (uA, uB)
 > and covariance matrix Sigma = ( (f, g), (g, h) ), and you minimize
 > -log L. Here i is the subject index and j the measure index within the
 > subject.

I'm having a little trouble with your character set (notice I've been
replacing your mu's with "u") and a little more with the \math
notation, but I think what I'm seeing above is that the likelihood
measure is computed purely from measures for the data points and the
measure for a single data point is the product of a likelihood of the
a,b pair for the group, which is G', and the likelihood of the point
given the group, which is F'.

Now this already surprises me in the sense that the cost of the
group seems to be paid once for every data point in the group
instead of only once per group.
It seems to me that if I were trying to specify the groups and
data points I'd only have to specify each group once and, within
that, each data point for the group once.
But maybe this is just a matter of missing parens?
That would be 
product over all groups of
   prob for (a,b) for this group
   x 
   product over all points in the group of
      prob of point given group (related to square of residual)
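The parenthesised version of the formula, with each group's G' factor
paid once per group rather than once per observation, would be (same
symbols as above):

```latex
L = \prod_{i=1}^{n} \left[ G'(A_i = a_i, B_i = b_i)
      \prod_{j=1}^{n_i} F'(Y_{ij} = y_{ij} \mid A_i = a_i, B_i = b_i) \right]
```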

A few other details:
The residual is still only one number for each point, right, so sigma,
its variance, is only one number, right?
And is this estimated by just computing the variance of the residuals?
I'm guessing that's the same thing as finding the value that maximizes
the product of the F' values.
Is there a function to compute G' ?
(Is this it?
 library(mvtnorm)
 dmvnorm(c(5.280231, 9.719769), c(5,10),matrix(c(1,-.9,-.9, 1),nrow=2,ncol=2))
)

 > Just write it down using the hints above... Or use the formulas in
 > Douglas Bates' book (or other books). They _are_ the objective
 > function. By e-mail, that would be almost impossible to write them in
 > a readable way.

Which book is this?  URL ?
lme4: Mixed-effects modeling with R, June 25, 2010 ?
http://lme4.r-forge.r-project.org/lMMwR/lrgprt.pdf ?
I'll look further at that.

Also, bbolker writes:
 vignette("lmer",package="lme4") gives all of the technical details
It gives me:
 Warning message:
 vignette 'lmer' not found 

Is this the same as (or similar to)
https://cran.r-project.org/web/packages/lme4/vignettes/Theory.pdf
Or
https://cran.r-project.org/web/packages/lme4/vignettes/lmer.pdf
which seems to be the same as the one I had trouble with before.
But maybe I can make more progress in these now that I have some
ability to ask questions.

 >  So far I don't think I've learned anything new, but I may be getting
 >  close to describing to you what I'm missing.

This is progress.  It's similar to what I was expecting, but not
what I was able to explain from my simple examples.  It's possible
that the problem was in my examples, but that remains to be seen.
1 day later
#
(Probably I should change the subject line and admit that this is no
longer about I().)

Emmanuel Curis writes:
 > L = \prod_{i=1}^n \prod_{j=1}^{n_i} F'(Yij = y_ij | Ai = ai, Bi = bi) G'(Ai = ai, Bi = bi)
 > 
 > where F' is the density of a Gaussian of mean ai + bi * days_{i,j} and
 > variance sigma^2 (the variance of epsilon, the residual), G' is the
 > joint density of a bivariate Gaussian vector of expectation (uA, uB)
 > and covariance matrix Sigma = ( (f, g), (g, h) ), and you minimize
 > -log L. Here i is the subject index and j the measure index within the
 > subject.
 ...
 > Once you have written it down, you can use dnorm to compute it, and use
 > the logLik function to check the results.

I'm trying to do this and things are not working out quite as I hoped.
Starting with the result of lmer on my sample data:
  > summary(xm) 
  Linear mixed model fit by maximum likelihood  ['lmerMod'] 
  Formula: xout ~ xin + (xin | xid) 
     Data: xd 

       AIC      BIC   logLik deviance df.resid  
     -16.2    -15.7     14.1    -28.2        2  

  Scaled residuals:  
       Min       1Q   Median       3Q      Max  
  -0.89383 -0.29515  0.05005  0.33286  0.80360  

  Random effects: 
   Groups   Name        Variance  Std.Dev. Corr  
   xid      (Intercept) 2.814e-03 0.053051       
            xin         6.583e-03 0.081134 -0.03 
   Residual             4.917e-05 0.007012       
  Number of obs: 8, groups:  xid, 3 

  Fixed effects: 
              Estimate Std. Error t value 
  (Intercept)  0.06061    0.03172   1.911 
  xin          1.00397    0.04713  21.301 

  Correlation of Fixed Effects: 
      (Intr) 
  xin -0.058 

  > coef(xm) 
  $xid 
    (Intercept)      xin 
  a  0.09870105 1.100632 
  b -0.01183068 1.008098 
  c  0.09496031 0.903172 

The good news is that I seem to be correctly computing the predicted
values, e.g., 0.09870105 + 1.100632 * xin for a data point in
group "a" gives a value which differs from its xout value by the same
amount as the corresponding value of xm@resp$wtres (which I surmise
is a vector of the residuals).
So I expect that F' will be
 dnorm(xm@resp$wtres ...)
The mean should be zero, by design ...
  > mean(xm@resp$wtres)
  [1] -2.775558e-17
as expected, essentially zero

The sd to pass to dnorm should be what?  Is that the 0.007012 from
this line above?
   Residual             4.917e-05 0.007012
That's not very close to what I see in the actual residuals:
  > var(xm@resp$wtres)
  [1] 1.566477e-05
  > sd(xm@resp$wtres)
  [1] 0.003957874
And I'd expect the value to be at least close to the actual SD of
the residuals.  If the correct value is neither of the two above,
please tell me what it is (or how to find it).
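One possible check here (a sketch; sleepstudy stands in for xm, whose
data are not in the thread): the Std.Dev. printed under Residual is the
model's estimate sigma(m), not the sample SD of the fitted residuals.
The conditional residuals are shrunk toward zero by the random effects,
so their SD is typically smaller than sigma(m), which would explain the
gap between 0.007012 and 0.003957874:

```r
library(lme4)

# Hypothetical stand-in for xm: same model structure on the sleepstudy data
m <- lmer(Reaction ~ 1 + Days + (1 + Days | Subject), sleepstudy, REML = FALSE)

sigma(m)          # the unrounded value printed under Residual Std.Dev.
sd(residuals(m))  # SD of the conditional residuals; typically smaller
```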

Then for G, I'm even less certain.  I suspect I'm supposed to use
dmvnorm(...)
The first argument would be one of the lines from the coef output above,
such as c(0.09870105, 1.100632), right?
The second (mean) would be the estimates of the fixed effects,
 c(0.06061,1.00397), right?
And the third, is supposed to be the variance-covariance matrix.
That would be your 
 f g
 g h
where f and h are the two variances shown under:
  Random effects: 
   Groups   Name        Variance  Std.Dev. Corr  
   xid      (Intercept) 2.814e-03 0.053051       
            xin         6.583e-03 0.081134 -0.03 
   Residual             4.917e-05 0.007012       
so f = 2.814e-03, h = 6.583e-03
and g, I *think* would be the product of the two SD's and the Corr,
or 0.053051 * 0.081134 * -0.03
Is that all correct?

I hope you can see some errors above, since this doesn't look like
it's going to end up computing anything like the logLik of 14.1 !

Also, I have the feeling that all of this output is rounded, and that
I'd get better results (or at least results that agree with the other
results computed in the model) if I could find these data in the model
rather than copying them from the output.  I've found some of it, like
the residuals (at least that's what I think wtres is!), but other data
above, such as f, g and h, I have not yet found.  If you can tell me how
to find (or compute) them, that would be helpful.
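A sketch of how to pull the unrounded values straight from the fitted
object (assuming the fitted model xm from above; the accessors are
standard lme4):

```r
# Unrounded variance parameters, no copying from the printed summary
vc    <- VarCorr(xm)
Sigma <- vc$xid             # 2x2 matrix ( f g ; g h ) for the xid grouping
f <- Sigma[1, 1]            # Var(intercept)
g <- Sigma[1, 2]            # Cov(intercept, slope)
h <- Sigma[2, 2]            # Var(slope)
sigma(xm)                   # unrounded residual SD
as.data.frame(vc)           # every variance/covariance/correlation as a table
```

This also confirms the guess above: the printed Corr is
g / (sqrt(f) * sqrt(h)), so g is indeed the product of the two SDs and
the correlation.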
#
For the level of detail you're getting into, it would be a really good
idea to read the paper that accompanies the lme4 package:
vignette("lmer",package="lme4") .  This goes into a lot of detail
about the theory and data structures ...
On Fri, Jun 16, 2017 at 11:48 PM, Don Cohen <don-r-help at isis.cs3-inc.com> wrote:
#
Ben Bolker writes:
 > For the level of detail you're getting into, it would be a really good
 > idea to read the paper that accompanies the lme4 package:
 > vignette("lmer",package="lme4") .  This goes into a lot of detail
 > about the theory and data structures ...

 vignette("lmer",package="lme4") 
gives me
 Warning message: 
 vignette 'lmer' not found  

Is this the same as 
https://cran.r-project.org/web/packages/lme4/vignettes/lmer.pdf ?

I think that's the same one that I was having trouble with before
and gave up around eqn 15.
Although I had the impression that it (eqn 14, it looks like) was
describing what I expected and asked about in a previous message,
namely paying only once for each group and then once for each data
point within the group.

Are you saying that the vignette link above actually answers the
questions in this last message about how to compute the loglik of
the model?  It doesn't look to me like it will.  
I view my current line of questions (and I have many more, but am
trying to resist bombarding you with all at once) as a way to get
the background I'll need to get through that paper.

In fact, one of my questions when I read that paper was what 
correlated vs uncorrelated intercept and slope meant - I didn't 
see any explanation.  I think that Emmanuel Curis has now explained
that, but I'm still trying to check my understanding.

Since I'm writing anyway, let me indulge in one more question about
the formulas.  Since (x|g) means correlated intercept and slope for
x, does (x+y|g) include a separate correlation between x slope and
y slope?  That is, the cost of specifying a group would involve a
3 dimensional normal distribution over intercept,x,y ?
#
On Sat, Jun 17, 2017 at 12:29 PM, Don Cohen <don-r-help at isis.cs3-inc.com> wrote:
That's surprising ... what's packageVersion("lme4") ?
Yes.
I would also recommend checking out Pinheiro and Bates (2000),
which is a full (book-length) treatment of the same topic, so is a
tiny bit more discursive/friendlier ...
Yes.    (So in particular this model has 3*(3+1)/2 = 6 covariance
parameters, equivalent to s^2(1), s^2(x), s^2(y), cov(1,x), cov(1,y),
cov(x,y).)
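A sketch of that structure (illustration only: sleepstudy has a single
covariate, so Days2 below is just Days squared, added purely to get a
trivariate random effect; the fit may emit scaling or convergence
warnings):

```r
library(lme4)

# Add a second within-subject covariate so (1 + x + y | g) has 3 terms
d  <- transform(sleepstudy, Days2 = Days^2)
m3 <- lmer(Reaction ~ Days + Days2 + (1 + Days + Days2 | Subject), data = d)

VarCorr(m3)$Subject   # 3x3 covariance matrix: 3 variances on the diagonal,
                      # 3 covariances off it = 3*(3+1)/2 = 6 free parameters;
                      # the || form would zero the off-diagonal covariances
```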
#
 > > gives me
 > >  Warning message:
 > >  vignette 'lmer' not found
 > 
 > That's surprising ... what's packageVersion("lme4") ?
[1] '1.1.14'
[actually I'm replacing left and right non-ascii quotes above]

 > Pinheiro and Bates (2000),
I'll look, but it seems that for something that old, even if it does
contain these details, they are likely to be out of date.
I noticed http://lme4.r-forge.r-project.org/lMMwR/lrgprt.pdf
which is from 2010 says on p.15
 The vector u is available in fm01ML@re.  The vector Beta and the model matrix
 X are in fm01ML@fe.
and this seems no longer true - I get
  Error: no slot of name "fe" for this object of class "lmerMod"

 >   Yes.    (So in particular this model has 3*(3+1)/2 = 6 covariance
 > parameters, equivalent to s^2(1), s^2(x), s^2(y), cov(1,x), cov(1,y),
 > cov(x,y).)

That might explain why changing (x+y+z+u+v+w||g) to a single | seems to
run a lot longer (I interrupted it, so I don't know how much longer).
Do you have an estimate of how the number of terms affects run time?

If a paper describes various things being used as random effects
would you assume | or || ?
Perhaps it depends on the number of effects?  The example I sent
contained quite a few, and the original from which that was extracted
contained over a dozen.