Skip to content

How is the covariance factor computed?

8 messages · Vincent Dorie, Ben Bolker, steve.walker at utoronto.ca +1 more

#
Hello,

I am performing a priori power simulations for mixed-effect models based
on previous experiments. This works out quite nicely. I extract parts of
my parameters from a previous model I fitted:

prev_mod <- lmer(Y ~ A
                   + (B | Context)
                   + (B | Subjects),
                   data =3D data)

"A" :=3D 2 level factor
"Context" :=3D 40 level factor
"Subjects" :=3D 70 level factor

create design matrices for the fixed- and each random effect use functions from
the apply-family and bind them to a previously set-up data frame and so on.

In order to simulate data I draw from two multivariate distributions. One
for Subjects and one for Context. I previously constructed the
variance-covariance matrix by using the estimations I get by issuing
"as.data.frame(prev_mod)". After having read that Doug implemented a new
method for the "getME()" extractor "Tlist" that gives the covariance
factors from which the block matrices in "Lambda" are created I figured I
could get the variance-covariance matrix way easier by doing (code here
only for the "Context" variance-covariance matrix):

cov_fac <- getME(prev_mod, "Tlist")
cov_fac_context <- cov_fac$Context

sigma(prev_mod)^2*cov_fac_context%*%t(cov_fac_context)

But the question that has been haunting me for weeks now is how the individual
covariance factors (better: "matrices" in this case) that I can extract via
"getME(prev_mod, "Tlist") are computed. Is there some literature on that?  I
couldn't find any apart from the paper "Fitting linear mixed-effects models
using lme4" published 23.06.2014. I would be interested in reading up/get an
explanation how the covariance factor can be computed (mathematically and in
lme4) and if I need the variance-covariance matrix beforehand or vica versa.

Thank for any help!

Best,
Christian Brauner
Eberhard Karls Universit=C3=A4t T=C3=BCbingen
Mathematisch-Naturwissenschaftliche Fakult=C3=A4t - Faculty of Science
Evolutionary Cognition - Cognitive Science
Schleichstra=C3=9Fe 4 =C2=B7 72076 T=C3=BCbingen =C2=B7 Germany
Telefon +49 7071 29-75643 =C2=B7 Telefax +49 7071 29-4721
christianvanbrauner at gmail.com
#
I'm not sure if this answers your question, but the parameters of the variance/covariance matrix are stored in the theta slot of a merMod in a form such that they correspond to a Cholesky factorization of the individual components of the var/cov matrix of the random effects, with the diagonal in the first 'd' parts of the vector and the off-diagonal stored in column-major format in the next d * (d - 1) / 2 elements. Given a theta vector, to get to a representation such as that which Tlist gives requires knowing how to map the parameters to matrices. This is currently done by hand, using the knowledge that the cnms slot of a merMod contains the dimension of each grouping "factor"/"level" and the aforementioned Cholesky decomposition storage concept. In the future, however, if lme4 supports different forms of variance/covariances matrices for factors other than "full" (e.g. independent, or correlation only), then that knowledge will need to be referenced instead. I believe there is effort on that front in the "flexLambda" branch on github.

On the other hand, if you were asking where those numbers come from, it turns out that (at least for linear models) those parameters are sufficient to define a likelihood wherein the fixed effects and conditional error term (sigma) are analytically optimized. Since the goal is a maximum likelihood, or REML, the sigma parameters are then simply numerically optimized. You can then easily evaluate the mixed model likelihood at any value of the var/cov matrix of the random effects that you like, provided you are willing to accept maximal values for the fixed effects and sigma. If you wanted to plug those values in as well, it's a bit of a pain but it can be done.

Vince
On Jul 17, 2014, at 11:41 AM, Christian Brauner <christianvanbrauner at gmail.com> wrote:

            
#
Vincent Dorie <vdorie at ...> writes:

 [snip]
... specifically, for this last bit, see the devfun2() function in
https://github.com/lme4/lme4/blob/master/R/profile.R ; there is a
brief description of how this works in the lme4 preprint at
http://arxiv.org/abs/1406.5823 , in the 'profiling' section.  (I
think "... the sigma parameters are then simply numerically optimized"
should be "... the theta parameters ...") [defined in previous para.
as the elements of the Cholesky factorization(s) of the random effects
variance-covariance matri[xc](es) ...]

  Ben Bolker
#
Whoops. Ben is right, as usual.
On Jul 17, 2014, at 4:18 PM, Ben Bolker <bbolker at gmail.com> wrote:

            
#
Hi Christian,

Although Vince pretty much answered your question, the preprint that  
you refer to provides another description that some might find helpful,

http://arxiv.org/pdf/1406.5823v1.pdf

See "Constructing the relative covariance factor" starting at the  
bottom of p. 10 and "PLS step I: update relative covariance factor" at  
the bottom of p. 19.

The (profile) likelihood for theta that Vince discussed is equation 34  
on page 16.  Equation 30 is the likelihood for all three types of  
parameters -- theta, beta, and sigma.

Small point:  I think there is a typo in the answer below in that,
should refer to theta, not sigma.

Cheers,
Steve


Quoting Vincent Dorie <vdorie at cs.stanford.edu>:
#
Dear Vince, Ben and Steve,

thank you! I'm reading the paper and the suggested sections. It's really
good and really helpful! Just to make sure I understand you correctly: The
individual variance-covariance matrices for the random effects can be
easily computed from the data by "estimating" the variances and
covariances for the random effects. A simple example of two
variance-covariance matrices for two (non-scalar) random effects:

R1=
var_11 cov_21
cov_12 var_22


R2=
var_11 cov_21
cov_12 var_22

Vince stated that that the "theta" slot which in your paper you call
"variance-component parameter" (p. 4 right above equation (4)) contains
the Cholesky factorization of the individual components of the
variance-covariance matrices of the random effects. Hence, I assume that
you can just compute them in R calling "chol(R1)" and "chol(R2)". This
however, gives the wrong resuls when I compare the output of "getME(mod8,
"theta") and "chol(R1)" and "chol(R2)".  How do I compute Cholesky
factorization in R correctly?

Best,
Christian
On Thu, Jul 17, 2014 at 09:38:06PM -0400, steve.walker at utoronto.ca wrote:
#
On 14-07-18 08:39 AM, Christian Brauner wrote:
You probably forgot to scale by the residual standard error:

chol(VarCorr(fm1)[[1]])/sigma(fm1)
getME(fm1,"theta")
#
Thanks Ben!
Figured it out myself pretty quickly. Seems you also have to transpose:

t(chol(VarCorr(fm1)[[1]]/sigma(fm1))

if you want the representation to be identical to

"getME(fm1, "Tlist")[[1]]


Thanks again Vince, Steve and Ben!
(I really enjoyed reading the paper you guys wrote!)
On Thu, Jul 17, 2014 at 03:24:07PM -0400, Vincent Dorie wrote: