lmer - BLUP prediction intervals
Hi Harold, I think that we're on the same page. My opinion is that the estimates and the predictions are both random variables. They have a joint distribution under the model, and it is possible to estimate the parameters of that joint distribution, under the model. I tend to think that if I can't write down their joint distribution then I'm not sure what are the grounds for taking any linear combination - including the very useful adding of them - either. If obtaining the covariance term is reasonable within epsilon time and effort, then I would advocate doing that and following your prescription. Andrew.
On Thu, Feb 07, 2013 at 09:45:38AM -0500, Doran, Harold wrote:
Andrew: I agree with you and think that is reasonable too, but would offer the following thoughts. Doug and I have talked about this for a while. As usual, he makes excellent points and I've struggled with this specific issue. In the end, I have found myself taking the position that you argued. It remains uncontroversial to add the estimate of the fixed effects with the BLUPs. Doug noted one is _estimated_ as a parameter (i.e., the fixed effect) and the BLUP is not estimated in the same way the fixed effect is as a model parameter. But, it is _kind of_ an estimate. It is predicted conditional on the variance components and the fixed effects themselves and it has a sampling distribution. So, I've taken the position that these are two statistical "things" both with model-based sampling distributions and a covariance term. In my world, there are often stakes associated with the use of the data and if I were to ignore the covariance term between the BLUPs and the fixed effects I could risk underestimating the true sampling variance of that linear combination. So, in my world, I view them as a linear combination with a covariance term and represent the variance of them as such.
-----Original Message----- From: Andrew Robinson [mailto:A.Robinson at ms.unimelb.edu.au] Sent: Wednesday, February 06, 2013 6:29 PM To: Doran, Harold Cc: dcarov at gmail.com; r-sig-mixed-models at r-project.org; Ben Bolker Subject: Re: [R-sig-ME] [R] lmer - BLUP prediction intervals Hi Harold, just pragmatism. If the model says that there's a quantity but it's hard to estimate, eg the software doesn't provide it, then I wouild contend that - so long as one is honest - it's reasonable to ignore it. Intuitively I would expect that the correlation between the fixed estimates and random predictions would be negative. This is because, in a sense, they're trying to rerepsent the same variation. I know that you have a lot of experience and a lot of time thinking about these models. How does that conjecture sound to you? If that were the case then ignoring the correlation between them when computing a prediction interval would be conservative. Best wishes Andrew On Wed, Feb 06, 2013 at 10:36:40AM -0500, Doran, Harold wrote:
Andrew Ignoring the important theoretical question for just a moment on whether
it is sensible to do this, there is a covariance term between the BLUPs and the fixed effects.
If that term exists under the model, but it is ignored for purposes of
variance estimation, what would be the reason for doing so?
-----Original Message----- From: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed- models-bounces at r-project.org] On Behalf Of Andrew Robinson Sent: Tuesday, February 05, 2013 9:15 PM To: dcarov at gmail.com Cc: r-sig-mixed-models at r-project.org; Ben Bolker Subject: Re: [R-sig-ME] [R] lmer - BLUP prediction intervals I think that it is a reasonable way to proceed just so long as you interpret the intervals guardedly and document your assumptions
carefully.
Cheers Andrew On Wednesday, February 6, 2013, Daniel Caro wrote:
Dear all I have not been able to follow the discussion. But I would like to know if it makes sense to calculate prediction intervals like this: var(fixed effect+random effect)= var(fixed effect) + var(random effect) + 0 (i.e., the cov is zero) and based on this create the prediction intervals. Does this make
sense?
All the best, Daniel On Tue, Feb 5, 2013 at 8:54 PM, Douglas Bates <bates at stat.wisc.edu>
wrote:
On Tue, Feb 5, 2013 at 2:14 PM, Andrew Robinson < A.Robinson at ms.unimelb.edu.au> wrote:
I'd have thought that the joint correlation matrix would be of the estimates of the fixed effects and the random effects, rather than the things themselves.
Well, it may be because I have turned into a grumpy old man but I get
picky
about terminology and the random effects are not parameters - they are unobserved random variables. They don't have "estimates" in the sense of parameter estimates. The quantities returned by the ranef function are
the
conditional means (in the case of a linear mixed model, conditional modes in general) of the random effects given the observed data evaluated with the parameters at their estimated values. In the Bayesian point of view none of this is problematic because they're all random variables but otherwise I struggle with the interpretation of how these can be
considered
jointly. If you want to consider the distribution of the random effects you need to have known values of the parameters.
The estimates are statistical quantities, with specified distributions, under the model. The model posits these different roles (parameter,
random
variable) for the quantities that are the targets of the estimates, but
the
estimates are just estimates, and as such, they have a correlation structure under the model, and that correlation structure can be
estimated.
An imperfect analogy from least-squares regression is the correlation structure of residual estimates, induced by the model. We say that the errors are independent, but the model creates a (modest) correlation structure than can be measured, again, conditional
on the model.
Well the residuals are random variables and we can show that at the least squares estimates of the parameters they will have a known Gaussian distribution which, it turns out, doesn't depend on the values of the coefficients. But those are the easy cases. In the linear mixed model
we
still have a Gaussian distribution and a linear predictor but that is for the conditional distribution of the response given the random
effects.
For
the complete model things get much messier. I'm not making these points just to be difficult. I have spent a lot of time thinking about these models and trying to come up with a coherent
way
of describing them. Along the way I have come to the conclusion that the way these models are often described is, well, wrong. And those descriptions include some that I have written. For example, you often
see
the model described as the linear predictor for an observation plus a "noise" term, epsilon, and the statement that the distribution of the random effects is independent of the distribution of the noise term. I
now
view the linear predictor as a part of the conditional distribution of
the
response given the random effects so it wouldn't make sense to talk about these distributions being independent. The biggest pitfall in
transferring
your thinking from a linear model to any other kind (GLM, LMM, GLMM) is
the
fact that we can make sense of a Gaussian distribution minus its mean so
we
write the linear model in the "signal plus noise" form as Y = X\beta + \epsilon where Y is an n-dimensional random variable, X is the + n by p
model
matrix, \beta is the p-dimensional vector of coefficients and \epsilon is an n-dimensional Gaussian with mean zero. That doesn't work in the other cases, despite the heroic attempts of many people to write things in that way. Here endeth the sermon.
On Wed, Feb 6, 2013 at 6:34 AM, Douglas Bates <bates at stat.wisc.edu>
wrote:
It is possible to create the correlation matrix of the fixed effects
and
the random effects jointly using the results from lmer but I have difficulty deciding what this would represent statistically. If you
adopt
a B
-- Andrew Robinson Director (A/g), ACERA Senior Lecturer in Applied Statistics Tel: +61-3-8344-6410 Department of Mathematics and Statistics Fax: +61-3-8344 4599 University of Melbourne, VIC 3010 Australia Email: a.robinson at ms.unimelb.edu.au Website: http://www.ms.unimelb.edu.au FAwR: http://www.ms.unimelb.edu.au/~andrewpr/FAwR/ SPuR: http://www.ms.unimelb.edu.au/spuRs/ [[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
-- Andrew Robinson Director (A/g), ACERA Department of Mathematics and Statistics Tel: +61-3-8344-6410 University of Melbourne, VIC 3010 Australia (prefer email) http://www.ms.unimelb.edu.au/~andrewpr Fax: +61-3-8344-4599 http://www.acera.unimelb.edu.au/ Forest Analytics with R (Springer, 2011) http://www.ms.unimelb.edu.au/FAwR/ Introduction to Scientific Programming and Simulation using R (CRC, 2009): http://www.ms.unimelb.edu.au/spuRs/
Andrew Robinson Director (A/g), ACERA Department of Mathematics and Statistics Tel: +61-3-8344-6410 University of Melbourne, VIC 3010 Australia (prefer email) http://www.ms.unimelb.edu.au/~andrewpr Fax: +61-3-8344-4599 http://www.acera.unimelb.edu.au/ Forest Analytics with R (Springer, 2011) http://www.ms.unimelb.edu.au/FAwR/ Introduction to Scientific Programming and Simulation using R (CRC, 2009): http://www.ms.unimelb.edu.au/spuRs/