coef se in lme

4 messages · dave fournier, Douglas Bates, David Duffy

#
Here is one approach to this problem.
In the AD Model Builder Random Effects package we provide estimated
standard deviations for any function of the fixed and random effects, 
(here I include the parameters which determine the covariance matrices if 
present) and the random effects. This is for general nonlinear random 
effects models, but the calculations can be used for linear models as 
well. We calculate these estimates as follows. Let L(x,u)
be the log-likelihood function for the parameters x and u given the 
observed data,
where u is the vector of random effects and x is the vector of the other 
parameters. Let F(x) be the log-likelihood for x after the u have been 
integrated out. This integration might be exact or more commonly via the 
Laplace approximation or something else.
For any x let uhat(x) be the value of u which maximizes L(x,u),
and let xhat be the value of x which maximizes
F(x). The estimate for the covariance matrix for the x is then
S_xx = inv(F_xx) and the estimated full covariance matrix Sigma for the 
x and u is given by


Sigma = [ S_xx                S_xx * uhat_x                      ]
        [ (S_xx * uhat_x)'    uhat' * S_xx * uhat_x + inv(L_uu)  ]

where ' denotes transpose, _x denotes the first derivative wrt x (note 
that uhat is a function of x, so uhat_x makes sense), and _xx, _uu denote 
the second derivatives wrt x and u. We then use Sigma and the delta 
method to estimate the standard deviation of any (differentiable) 
function of x and u.
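The construction above can be sketched numerically. The following is a hypothetical illustration with made-up, well-conditioned matrices, not ADMB code; the names F_xx, L_uu, uhat_x, and S_xx follow the notation in the text, and the lower-right block uses uhat_x' rather than uhat' (the correction that comes up later in the thread).

```python
# Hypothetical sketch of the Sigma construction described above.
# All matrices are made up; dimensions match the later example
# (3 parameters x, 10 random effects u).
import numpy as np

rng = np.random.default_rng(0)
nx, nu = 3, 10

# Stand-ins for the quantities defined in the text:
A = rng.standard_normal((nx, nx))
F_xx = A @ A.T + nx * np.eye(nx)        # Hessian of -F(x) at xhat (positive definite)
B = rng.standard_normal((nu, nu))
L_uu = B @ B.T + nu * np.eye(nu)        # Hessian of -L wrt u at (xhat, uhat)
uhat_x = rng.standard_normal((nx, nu))  # derivative of uhat(x), 3 x 10 here

S_xx = np.linalg.inv(F_xx)              # estimated covariance of x
off = S_xx @ uhat_x                     # 3 x 10 off-diagonal block
S_uu = uhat_x.T @ S_xx @ uhat_x + np.linalg.inv(L_uu)

# Full 13 x 13 covariance matrix for (x, u):
Sigma = np.block([[S_xx, off],
                  [off.T, S_uu]])

# Delta method: sd of a differentiable function f(x, u) with gradient grad_f
grad_f = rng.standard_normal(nx + nu)
sd = np.sqrt(grad_f @ Sigma @ grad_f)
```

The Schur complement of S_xx in Sigma reduces to inv(L_uu), so Sigma is positive definite whenever F_xx and L_uu are, and the delta-method standard deviation is always well defined.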
#
On 10/18/07, dave fournier <otter at otter-rsch.com> wrote:

            
I know it may sound pedantic but I don't know what a log-likelihood
L(x,u) would be because you are treating parameters and the random
effects as if they are the same type of object and they're not.  If
you want to use a Bayesian approach you can kind of level the playing
field and say that everything is a parameter except for the observed
data values.  However, Bayesians also need to come up with a prior and
that isn't trivial in this case, as I tried to indicate in my message
about the mcmcsamp chain taking excursions.

I find I can easily confuse myself in the theory of the maximum
likelihood or REML estimates if I am not careful about the terminology
and the roles of the different coefficients in the linear predictor.
I think I would call that function L(x,u) the conditional density of
the random effects given the data.  The parameters determine the joint
density for the random effects and responses so plugging the observed
values of the responses into this expression yields the conditional
density of the random effects.
I think that is what I would call the conditional modes of the random
effects.  These depend on the observed responses and the model
parameters.
I'm getting a little bit lost here.  In the example you sent based on
Harold's discussion, the dimension of x is 3 and the dimension of u is
10 so Sigma is a 13 by 13 matrix, right?  S_xx is 3 by 3 and L_uu is
10 by 10.  To form the product S_xx*uhat_x I think that uhat_x needs
to be 3 by 10.  Is that right?  (I'm used to writing the Jacobian of a
vector-valued function of a vector the other way around.)

It looks like you are missing a _x in the first term in "uhat' * S_xx * uhat_x".

To evaluate L_uu you need a value of x.  I assume you will use the
parameter estimates. Correct?

Will the parameterization of x affect the result?  If I write the
model in terms of the logarithms of the variances instead of the
variances I will definitely get a different Sigma but will the result
for a linear combination of mu and some of the u's stay invariant?  If
it isn't invariant, how do I choose the parameterization of the
variance components?

Can you give a bit more detail on how you justify mixing derivatives
of the marginal log-likelihood (F) with derivatives of the conditional
density (L).  Do you know that these are on the same scale?  I'm
willing to believe that they are - it is just that I can't see right
off why they should be.
#
On Thu, 18 Oct 2007, Douglas Bates wrote:

            
[Snip]
I find all of this is a bit above my head, but I do have a paper by Matt 
Wand _Fisher information for generalised linear mixed models_
Journal of Multivariate Analysis, (2007), 98, 1412-1416.

This looks at the canonical-link GLMM where the random effects are
Gaussian for a simple one-level/random intercepts model, and
ends rather abruptly ;) with

"Remark 2. Approximate standard errors for the maximum likelihood
estimates beta-hat and sigma-squared-hat can be obtained from the
diagonal entries of I(bhat, shat2)^{-1}. However, as pointed out in Remark 1,
implementation is often hindered by intractable multivariate integrals.
Additionally, dependence among the entries of y induced by u means that
central limit theorems of the type: I(bhat, shat2)^{1/2} {(bhat, shat2) - (b, s2)}
converges in distribution to a N(0, I) random vector, have not been
established in general and, hence, interpretation of standard errors
is cloudy. Nevertheless, there are many special cases, such as
m-dependence when the data are from a longitudinal study, for which
central limit theorems can be established."

You'll have to read the paper which gives the derivation and formulae.


David Duffy.
#
Douglas Bates wrote:
> On 10/18/07, dave fournier <otter at otter-rsch.com> wrote:
>
 >
 >> Here is one approach to this problem.
 >>
 >
 >
 >> In the AD Model Builder Random Effects package we provide estimated
 >> standard deviations for any function of the fixed and random effects,
 >> (here I include the parameters which determine the covariance matrices if
 >> present) and the random effects. This is for general nonlinear random
 >> effects models, but the calculations can be used for linear models as
 >> well. We calculate these estimates as follows. Let L(x,u)
 >> be the log-likelihood function for the parameters x and u given the
 >> observed data,
 >> where u is the vector of random effects and x is the vector of the other
 >> parameters.
 >>
 >
 > I know it may sound pedantic but I don't know what a log-likelihood
 > L(x,u) would be because you are treating parameters and the random
 > effects as if they are the same type of object and they're not.  If
 > you want to use a Bayesian approach you can kind of level the playing
 > field and say that everything is a parameter except for the observed
 > data values.  However, Bayesians also need to come up with a prior and
 > that isn't trivial in this case, as I tried to indicate in my message
 > about the mcmcsamp chain taking excursions.
 >
 > I find I can easily confuse myself in the theory of the maximum
 > likelihood or REML estimates if I am not careful about the terminology
 > and the roles of the different coefficients in the linear predictor.
 > I think I would call that function L(x,u) the conditional density of
 > the random effects given the data.  The parameters determine the joint
 > density for the random effects and responses so plugging the observed
 > values of the responses into this expression yields the conditional
 > density of the random effects
 >
OK, we will call them that.

 >> Let F(x) be the log-likelihood for x after the u have been
 >> integrated out. This integration might be exact or more commonly via the
 >> Laplace approximation or something else.
 >> For any x let uhat(x) be the value of u which maximizes L(x,u),
 >>
 >
 > I think that is what I would call the conditional modes of the random
 > effects.  These depend on the observed responses and the model
 > parameters.
 >

   That's right.
 >
 >> and let xhat be the value of x which maximizes F(x).
 >>
 >
 >
 >> The estimate for the covariance matrix for the x is then
 >> S_xx = inv(F_xx) and the estimated full covariance matrix Sigma for the
 >> x and u is given by
 >>
 >
 >
 >> S_xx                 S_xx * uhat_x
 >> (S_xx * uhat_x)' uhat' * S_xx * uhat_x + inv(L_uu)
 >>
 >
 >
 >> where ' denotes transpose _x denotes first derivative wrt x (note that
 >> uhat is a function of x so that uhat_x makes sense) and _xx _uu denote
 >> the second derivatives wrt x and u. we then use Sigma and the delta
 >> method to estimate the standard deviation of any (differentiable)
 >> function of x and u.
 >>
 >
 > I'm getting a little bit lost here.  In the example you sent based on
 > Harold's discussion, the dimension of x is 3 and the dimension of u is
 > 10 so Sigma is a 13 by 13 matrix,
That is right.

 > right?  S_xx is 3 by 3 and L_uu is
 > 10 by 10.  To form the product S_xx*uhat_x I think that uhat_x needs
 > to be 3 by 10.  Is that right?  (I'm used to writing the Jacobian of a
 > vector-valued function of a vector the other way around.)
 >
Yes, it is either 3x10 or 10x3 depending on convention.  Since S_xx is 
3 by 3, for the product S_xx * uhat_x to make sense uhat_x must be 3 by 10 
here, i.e. 3 rows and 10 columns.

 > It looks like you are missing a _x in the first term in "uhat' * S_xx * uhat_x".
 >
 >
That is right; it should be uhat_x' * S_xx * uhat_x.
 > To evaluate L_uu you need a value of x.  I assume you will use the
 > parameter estimates. Correct?
 >
 >
Yes, the derivatives are evaluated at (xhat, uhat).

 > Will the parameterization of x affect the result?
No.  Suppose that we have a reparameterization of the x's given by

   x=g(y)

  then wrt y the function is   F(g(y)) so the Hessian wrt y is

                    g_y' * F_xx * g_y

This is due to the fact that F_x=0.  Taking the inverse gives

                inv(g_y) * inv(F_xx) * inv(g_y')

so that when you do the delta method the g_y matrices cancel out.
 > If I write the
 > model in terms of the logarithms of the variances instead of the
 > variances I will definitely get a different Sigma
Yes, but if you then use the delta method to construct the covariance 
matrix for the dependent variables
exp(log_sigmas) you will recover the original. This is just the chain rule.
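The cancellation can be checked numerically in a one-parameter toy example. The numbers below (sigma2_hat, F_ss) are made up for illustration; the point is only that the g_y factors from the reparameterization cancel under the delta method.

```python
# Hypothetical check that the delta method undoes a reparameterization.
import numpy as np

# Toy marginal Hessian at the optimum, on the variance scale:
sigma2_hat = 2.0
F_ss = 5.0                  # Hessian of -F wrt sigma2 at the optimum
var_sigma2 = 1.0 / F_ss     # inv(F_xx) on the original scale

# Reparameterize: sigma2 = g(y) = exp(y), so y = log(sigma2).
# At an optimum F_x = 0, so the Hessian wrt y is g_y' * F_xx * g_y:
g_y = sigma2_hat            # d exp(y)/dy evaluated at y = log(sigma2_hat)
F_yy = g_y * F_ss * g_y
var_y = 1.0 / F_yy          # covariance on the log scale -- a different number

# Delta method back to exp(y): the variance is g_y * var_y * g_y,
# which recovers the original value (the g_y factors cancel).
recovered = g_y * var_y * g_y
```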

 > but will the result
 > for a linear combination of mu and some of the u's stay invariant?  If
 > it isn't invariant, how do I choose the parameterization of the
 > variance components?
 >
Same as above: the whole thing is independent of how one parameterizes 
the x's.
 > Can you give a bit more detail on how you justify mixing derivatives
 > of the marginal log-likelihood (F) with derivatives of the conditional
 > density (L).  Do you know that these are on the same scale?  I'm
 > willing to believe that they are - it is just that I can't see right
 > off why they should be.
 >
 >
Think of it this way.  You can use the marginal likelihood Hessian F_xx 
to construct the estimated covariance matrix
for the independent variables x. Then via the delta method you 
calculate the covariance matrix for the dependent variables
uhat(x). So far this is just pure frequentist stuff. Now if the true 
value of x were known, inv(L_uu) is an estimate for the
variance of the u's.  So a simple estimate of the overall variance of 
the u's is to add the two together:
uhat_x' * S_xx * uhat_x is the extra variance in the u's introduced by the 
uncertainty in the true value of the uhat
due to uncertainty in the true value of the x's.
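This decomposition has a simple consequence worth making explicit: the propagated term uhat_x' * S_xx * uhat_x is positive semidefinite, so the total variance for each random effect is never smaller than its conditional variance. A hypothetical sketch (made-up matrices, same notation as above):

```python
# Hypothetical sketch of the variance decomposition for the random effects:
#   total variance = inv(L_uu)              (u given x known)
#                  + uhat_x' * S_xx * uhat_x (uncertainty propagated from x)
import numpy as np

rng = np.random.default_rng(1)
nx, nu = 3, 10
A = rng.standard_normal((nx, nx))
F_xx = A @ A.T + nx * np.eye(nx)        # made-up positive-definite Hessian
B = rng.standard_normal((nu, nu))
L_uu = B @ B.T + nu * np.eye(nu)
uhat_x = rng.standard_normal((nx, nu))

S_xx = np.linalg.inv(F_xx)
cond_var = np.linalg.inv(L_uu)          # variance of u if x were known exactly
prop_var = uhat_x.T @ S_xx @ uhat_x     # extra variance from uncertainty in x
total_var = cond_var + prop_var
```

Because prop_var is of the form J' S J with S positive definite, its diagonal is nonnegative, so each reported standard deviation is at least the conditional one.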


-- 
David A. Fournier
P.O. Box 2040, Sidney, B.C. V8l 3S3
Canada
Phone/FAX 250-655-3364
http://otter-rsch.com