I'm using lmer with a generalized linear (binomial) mixed model with nested random effects, like: y ~ (1 | a / b / c) There are no fixed effects. After fitting the model, I would like to make predictions for a new set of y values: specifically, I want to predict BLUPs for the random effects, and I would like to compute likelihoods for sets of y values under the fitted model. I don't see a completely straightforward way of doing this since it isn't the usual sort of prediction problem. Is this even a sensible thing to do? -- David Hinds
A glmm prediction problem
3 messages · Douglas Bates, David Hinds
On Jan 30, 2008 2:02 PM, David Hinds <David_Hinds at perlegen.com> wrote:
I'm using lmer with a generalized linear (binomial) mixed model with nested random effects, like:
y ~ (1 | a / b / c)
There are no fixed effects.
I would check that. The model will include a constant term in the fixed effects. I don't think it is possible in the current formulation to fit a model without any fixed effects. There is nothing in the theory or computational methods that would preclude that but I be hard pressed to think of a situation where it would be sensible to do so. The distribution of the random effects assumes a mean of zero for all the random effects terms. If you don't have any fixed effects terms then you are assuming that the population mean probability of success is exactly 0.5, which seems like a pretty strong assumption.
After fitting the model, I would like to make predictions for a new set of y values: specifically, I want to predict BLUPs for the random effects, and I would like to compute likelihoods for sets of y values under the fitted model.
Are the new y values to be at some set of observed levels for the a, b and c factors? For example, suppose that factor a is the field, factor b is the plant selected from the field, factor c is the seed pod selected from the plant and the observational unit is the seed within the seed pod. Do you want to predict for another seed from that particular seed pod or for a seed in another, as yet unobserved, seed pod from that plant or for a seed from a seed pod from another, as yet unobserved, plant from one of the fields you observed or ... Essentially what will happen is that you will need to use the marginal variance for units that are as yet unobserved and the conditional variance for the units that have been observed when determining the variance of the linear predictor. Because you have a generalized linear mixed model you need to convert the variance of the linear predictor to an associated interval on the mean response through the inverse link function. Somewhere along the line what would be the "residual variance" in a model for a Gaussian family would need to be incorporated for the binomial family. I'm not sure exactly how that would be done. Suffice it to say that formulating the prediction interval would not be trivial.
I don't see a completely straightforward way of doing this since it isn't the usual sort of prediction problem. Is this even a sensible thing to do? -- David Hinds
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
A couple clarifications: - It is a little bit of a digression, but for the data I'm looking at, there really are no fixed effects and the population mean probability of success is just 0.5: any deviation from that belongs in a random effect. I can't fit that model directly, so you're right, I do end up with an estimate for a fixed intercept term, which is always very close to zero. For many purposes it can be ignored, because it accounts for so little variance compared to the fitted random effects. (the data consists of experimental estimates of the proportion of each allele at genetic markers where an individual is known to have an "AB" genotype, so we know with certainty that the underlying ratio is 1:1. I'm using random effects to model sources of variance in the experiment that cause overdispersion in the observed ratios.) - Using your terminology, I want to make predictions for y values for a previously unseen field (new value of 'a'). I will essentially always have data for two seed pods (two values of 'c'), and may have several values of 'b' available, but often just one. I was thinking to get a likelihood for a new set of y/a/b/c values by numerically integrating p(random effects)*p(y|random effects) over the fitted multivariate normal distribution of random effects. I could estimate marginal distributions for the individual random effects at the same time. This seems manageable for the important case of one new field, one plant, two pods (integrating over 4 random effects) but gets out of hand when there are multiple plants. Alternatively it seems that 'lmer' should be able to do the work for me if I could prespecify a model. I found a hint on how to lock down the random effects from an archived posting on this list (via the start= and control= parameters) but the intercept would be a problem. -- Dave -----Original Message----- From: dmbates at gmail.com [mailto:dmbates at gmail.com] On Behalf Of Douglas Bates Sent: Wednesday, January 30, 2008 12:36 PM To: David Hinds Cc: r-sig-mixed-models at r-project.org Subject: Re: [R-sig-ME] A glmm prediction problem
On Jan 30, 2008 2:02 PM, David Hinds <David_Hinds at perlegen.com> wrote:
I'm using lmer with a generalized linear (binomial) mixed model with nested random effects, like:
y ~ (1 | a / b / c)
There are no fixed effects.
I would check that. The model will include a constant term in the fixed effects. I don't think it is possible in the current formulation to fit a model without any fixed effects. There is nothing in the theory or computational methods that would preclude that but I be hard pressed to think of a situation where it would be sensible to do so. The distribution of the random effects assumes a mean of zero for all the random effects terms. If you don't have any fixed effects terms then you are assuming that the population mean probability of success is exactly 0.5, which seems like a pretty strong assumption.
After fitting the model, I would like to make predictions for a new set of y values: specifically, I want to predict BLUPs for the random effects, and I would like to compute likelihoods for
sets
of y values under the fitted model.
Are the new y values to be at some set of observed levels for the a, b and c factors? For example, suppose that factor a is the field, factor b is the plant selected from the field, factor c is the seed pod selected from the plant and the observational unit is the seed within the seed pod. Do you want to predict for another seed from that particular seed pod or for a seed in another, as yet unobserved, seed pod from that plant or for a seed from a seed pod from another, as yet unobserved, plant from one of the fields you observed or ... Essentially what will happen is that you will need to use the marginal variance for units that are as yet unobserved and the conditional variance for the units that have been observed when determining the variance of the linear predictor. Because you have a generalized linear mixed model you need to convert the variance of the linear predictor to an associated interval on the mean response through the inverse link function. Somewhere along the line what would be the "residual variance" in a model for a Gaussian family would need to be incorporated for the binomial family. I'm not sure exactly how that would be done. Suffice it to say that formulating the prediction interval would not be trivial.
I don't see a completely straightforward way of doing this since it isn't the usual sort of prediction problem. Is this even a sensible thing
to
do? -- David Hinds
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models