Skip to content

coef se in lme

4 messages · Douglas Bates, Doran, Harold, dave fournier

#
On 10/15/07, Doran, Harold <HDoran at air.org> wrote:
The vcov method returns the estimated variance-covariance matrix of
the fixed-effects only.  I think Irene's question is about the
combination of the fixed-effects parameters and the BLUPs of the
random effects that is returned by the coef method applied to an lmer
object.  (You may recall that you were the person who requested such a
method in lme4 like the coef method in nlme :-)

On the face of it this quantity should be easy to define and evaluate
but in fact it is not easy to do so because these are combinations of
model parameters (the fixed effects) and unobserved random variables
(the random effects).  It gets a bit tricky trying to decide what the
variance of this combination would be.  I think there is a sensible
definition, or at least a computationally reasonable definition, but
there are still a few slippery points in the argument.

Lately I have taken to referring to the "estimates" of the random
effects, what are sometimes called the BLUPs or Best Linear Unbiased
Predictors, as the "conditional modes" of the random effects.  That
is, they are the values that maximize the density of the random
effects given the observed data and the values of the model
parameters.  For a linear mixed model the conditional distribution of
the random effects is multivariate normal so the conditional modes are
also the conditional means.  Also, we can evaluate the conditional
variance-covariance matrix of the random effects up to a scale factor.

The next part is where things get a bit hazy for me but I think it
makes sense to consider the joint distribution of the estimator of the
fixed-effects parameters and the random effects conditional on the
data and, possibly, on the variance components.  Conditional on the
relative variance-covariance of the random effects (i.e. the matrix
that occurs as the penalty term in the penalized least squares
representation of the model) the joint distribution of the
fixed-effects estimators and the random effects is multivariate normal
with mean and variance-covariance matrix determined from the
mixed-model equations.

This big (p+q by p+q, where p is the dimension of the fixed effects
and q is the dimension of the random effects) variance-covariance
matrix could be evaluated and, from that, the variance of any linear
combination of components.  However, I have my doubts about whether it
is the most sensible answer to evaluate.  Conditioning on the relative
variance-covariance matrix of the random effects is cheating, in a
way.  It would be like saying we have a known variance, $\sigma^2$
when, in fact, we are using an estimate.  The fact that we don't know
$\sigma^2$ is what gives rise to the t distributions and F
distributions in linear models and we are all trained to pay careful
attention to the number of degrees of freedom in that estimate and how
it affects our ideas of the precision of the estimates of other model
parameters.  For mixed models, though, many practioners are quite
comfortable conditioning on the value of some of the variance
components but not others.  It could turn out that conditioning on the
relative variance-covariance of the random effects is not a big deal
but I don't know.  I haven't examined it in detail and I don't know of
others who have.

Another approach entirely is to use Markov chain Monte Carlo to
examine the joint distribution of the parameters (in the Bayesian
sense) and the random effects.  If you save the fixed effects and the
random effects from the MCMC chain then you can evaluate the linear
combination of interest throughout the chain and get an empirical
distribution of the quantities returned by coef.

This is probably an unsatisfactory answer for Irene who may have
wanted something quick and simple.  Unfortunately, I don't think there
is a quick, simple answer here.

I suggest we move this discussion to the R-SIG-Mixed-Models list which
I am cc:ing on this reply.
#
I'm particularly interested in this, so I'll bite. Let me start with
something very basic and build up later as I think more about this.

Assume we have a model such as

Y_ij = \mu + a_i + e_ij \quad a_i \sim N(0, s^2), e_ij \sim N(0, r^2)

In this basic model, we also assume the variance components are
orthogonal, denoted e_ij \bot a_i

Now, from ranef() I get the estimate of a for all i (what Doug calls the
conditional modes) and, also using the postVar argument in this
extractor function I get variances of these conditional modes. 

Now, vcov() would return the variance for the fixed parameter \mu.

Given that we can find the variances for each term in the model, and
that they are assumed orthogonal, why is it that we can't simply take
the variance of a linear combination? 

For example, if I want to find:

\hat{y}_ij = \mu + a_i

Then the variance of this would be simply

Var(\hat{y}_ij) = var(\mu) + var(a_i)

And in this case there is no covariance given e_ij \bot a_i, right? 

Even if this scenario is too simplistic, I assume I can use a taylor
series expansion to derive something that gives me an estimate of the
variance of \hat{y}_ij in closed form that is pretty close, no?

OK, I'm ready for my lashing.

Harold
#
On 10/18/07, Doran, Harold <HDoran at air.org> wrote:
Because the conditional distribution of the random effects (given the
data) depends on the fixed effects.  If you change \mu you will get
different conditonal modes for the a_i and you must take this into
account when considering the variation in the combination you show
below.
That would apply if \hat{\mu} and a_i were independent, but they aren't.

For this example (and thank you for suggesting a simple example for
discussion), the calculation I suggested would be conditional on the
ratio s/r or, equivalently, the ratio s^2/r^2.  I refer to this as the
"relative variance" of the random effects or, in more general cases,
the relative variance-covariance of the random effects.

If we condition on both variance components, s^2 and r^2, the joint
distribution of the random effects and the estimators of the fixed
effects is multivariate normal.  The mean of this distribution is the
solution to a penalized linear least squares problem.  Exactly the
same calculations as we would use for a linear least squares problem
will give us the joint distribution of the estimators of the fixed
effects and the random effects marginalized over r^2 but conditional
on the ratio s^2/r^2.

The "marginalized over r^2" of that last sentence is what leads us to
t distributions instead of z distributions and, after suitable
normalization, F distributions instead of chi-squared distributions.
What I haven't quite got a handle on yet is whether it is as important
to consider the effects of uncertainty in our estimate of s^2/r^2 as
it is to consider the effects of uncertainty in our estimate of s^2.
For fixed effects models we know that in some circumstances it is very
important to consider the uncertainty in our estimate of r^2.  This is
what Gosset did in formulating the T distribution and what Fisher
generalized into the analysis of variance.

Can we reasonably assume that variability in our estimate of r^2 is
important but variability in our estimate of s^2/r^2 is not?  If so
then we can apply the usual results from least squares to the
penalized least squares problem (conditional on the ratio r^2/s^2) and
we're done.  We need the model matrices for the fixed and the random
effects and the penalty matrix and we do need to be careful of how the
computation is done, especially in cases such as you consider where
the number of random effects can get into the millions, but the theory
is straightforward.  However, I don't think it is that cut and dried
in all cases.

It is likely that, in your world, conditioning on an estimate of
s^2/r^2 would not be a stretch.  In fact, you could condition on an
estimate of r^2 while you were at it because you have a huge number of
degrees of freedom in your estimate of the residual variance.  The
differences between a T distribution with 10 million degrees of
freedom and a standard normal distribution are not worth worrying
about.

So I think that in your world it would be reasonable to use the
penalized least squares representation and calculate the results from
that.  You mentioned above the "posterior variances" (which I would
call conditional variances if I were to do it over again) of the
random effects.  These come from the penalized least squares
representation but conditional on the fixed effects.  The get the
joint distribution of the random effects and the estimators of the
fixed effects it is only necessary to extend that calculation a bit.
combination you want to consider.

For smaller data sets it may be worthwhile going the Markov chain
Monte Carlo route because that takes into account the variation in all
the parameter estimates and in the random effects.  However, Steven
Novick recently sent me mail with a reproducible example about
difficulties with the Markov chain Monte Carlo samples drawn from some
models.  I'll send a slightly expanded version of that report to the
list for comment later today.
#
The AD Model Builder method I posted earlier takes into account the
uncertainty in the mean and both std deviations in Harold's simple model.

        Y_ij - mu + u_i +eps_ij

To illustrate this
I built a little model and simulated a data set with 1<=i<=10, 1<=j<=5 
observations.  Below are the parameter estiamtes tigether with their
estimated std devs.  The true values were mu=3.0 sigma_u=2.0 and sigma=3.0
then I fixed log_sigma_u and log_sigma at their
estimated values and obtained.
and finally I almost fixed mu
(can't fix it completley because then there would be
no "fixed" effects and the model thinks there is
nothing to do.) and obtained
So for example u(1) the first random effect has estimated std devs

  6.4561e-01, 6.4407e-01, and 5.9145e-01


under the three models. It appears that most of the "extra" uncertainty 
in u(1) comes from the uncertainty in mu.