Negative response values when simulating glmer with log link
(I never answered the question on the lme4 issues list: I will
answer here, and copy the information to the issues list.)
There are two ways one might define a "log normal GLMM": (1) with a
transformation
eta = a + b*x + ... (linear predictor)
log(y) ~ Normal(eta, sigma^2)
or (2) with a link function:
eta = a + b*x + ... (same as above)
y ~ Normal(exp(eta), sigma^2)
These look almost identical, but are quite different.
The first case is equivalent to
Y ~ log-Normal(meanlog = eta, sdlog = sigma)
[using R's parameterization based on the mean and standard deviation *on
the log scale*]. In this case:
* simulated values of log(y) can be any real number, but y =
exp(log(y)) will always be positive (possibly zero due to floating point
underflow in extreme cases)
* the standard deviation of Y is proportional to its mean (== the
coefficient of variation is constant)
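A minimal base-R sketch of case 1, to make the two bullet points above concrete. The parameter values (sigma, the two values of eta, the sample size) are made up for illustration and are not taken from the original question.

```r
# Case 1 sketch: log(y) ~ Normal(eta, sigma^2), i.e. Y is log-Normal.
# All numbers below are illustrative assumptions.
set.seed(101)
n      <- 1e5
sigma  <- 0.5
eta_lo <- -2   # small mean on the log scale
eta_hi <- 3    # large mean on the log scale

y_lo <- rlnorm(n, meanlog = eta_lo, sdlog = sigma)
y_hi <- rlnorm(n, meanlog = eta_hi, sdlog = sigma)

# every simulated value is positive, whatever the value of eta
all(y_lo > 0)
all(y_hi > 0)

# the coefficient of variation is about the same in both groups,
# and close to the theoretical log-Normal value sqrt(exp(sigma^2) - 1)
cv <- function(x) sd(x) / mean(x)
c(cv_lo = cv(y_lo), cv_hi = cv(y_hi), theory = sqrt(exp(sigma^2) - 1))
```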
In the second case,
* simulated values of y can be any real number: they could easily be
negative, for example, if exp(eta) is close to zero and sigma is not too
small
* the standard deviation of Y is constant
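The same kind of base-R sketch for case 2, again with made-up parameter values. It should show negative values turning up when exp(eta) is small relative to sigma, and a standard deviation that does not change with the mean.

```r
# Case 2 sketch: y ~ Normal(exp(eta), sigma^2), i.e. Gaussian with a log link.
# All numbers below are illustrative assumptions.
set.seed(101)
n      <- 1e5
sigma  <- 0.5
eta_lo <- -2   # exp(eta_lo) is close to zero
eta_hi <- 3    # exp(eta_hi) is far from zero

y_lo <- rnorm(n, mean = exp(eta_lo), sd = sigma)
y_hi <- rnorm(n, mean = exp(eta_hi), sd = sigma)

# proportion of negative simulated values, with the theoretical value
# pnorm(0, mean, sd) alongside for comparison
c(sim_lo = mean(y_lo < 0), theory_lo = pnorm(0, exp(eta_lo), sigma))
c(sim_hi = mean(y_hi < 0), theory_hi = pnorm(0, exp(eta_hi), sigma))

# the standard deviation is the same in both groups (about sigma)
c(sd_lo = sd(y_lo), sd_hi = sd(y_hi))
```

The log link only guarantees that the *mean* exp(eta) is positive; the Normal noise added around that mean is unconstrained.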
Although there are use cases for both models, I would say that case 1
(transformation) is generally a more natural way to model positive,
continuous data.
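For concreteness, here is a rough sketch of how the two cases might be specified and simulated in lme4. The data frame `d`, its columns `y`, `x`, `g`, and all parameter values are hypothetical; the example assumes lme4 is installed, and the log-link Gaussian fit may produce convergence warnings on other data.

```r
# Sketch only: hypothetical data with made-up parameters.
if (requireNamespace("lme4", quietly = TRUE)) {
  library(lme4)
  set.seed(101)
  ng   <- 20   # number of groups (assumption)
  nper <- 25   # observations per group (assumption)
  d <- data.frame(g = factor(rep(seq_len(ng), each = nper)),
                  x = runif(ng * nper))
  b   <- rnorm(ng, sd = 0.3)                  # random intercepts
  eta <- -1 + 1 * d$x + b[as.integer(d$g)]    # linear predictor
  d$y <- rlnorm(nrow(d), meanlog = eta, sdlog = 0.5)

  # case 1: transformed response, identity link
  fit1 <- lmer(log(y) ~ x + (1 | g), data = d)
  # simulate() returns values on the log scale; back-transform with exp()
  sim1 <- exp(simulate(fit1, nsim = 1, seed = 1)[[1]])

  # case 2: untransformed response, Gaussian family with a log link
  fit2 <- glmer(y ~ x + (1 | g), family = gaussian(link = "log"), data = d)
  sim2 <- simulate(fit2, nsim = 1, seed = 1)[[1]]

  print(all(sim1 > 0))    # back-transformed values are positive
  print(mean(sim2 < 0))   # proportion of negative simulated values
}
```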
Does that help?
On 2025-04-29 2:50 a.m., Fiona Scarff wrote:
I have some data in which the response variable can only be a non-negative number. I fitted a log normal glmm using the lme4 package, and simulated from the model using simulate.merMod. A very small proportion of the simulated values are slightly negative, and I would like to understand how that is possible with a log link.

I found a post in which Ben Bolker observed that: "Note that if you did simulate data with a log link and a Gaussian family, you could still get negative values if the standard deviation were large enough ..." https://github.com/lme4/lme4/issues/530

I thought that the log link would force all the responses to be non-negative. It is not especially important in this particular case, but I feel I have misunderstood something, either about the way that simulate() works for mixed effects models, or perhaps something more fundamental about how random effects work in a model with a non-identity link.

Apologies therefore if this question is misdirected and ought instead to go to crossvalidated.

Many thanks for your help,
Fiona

Dr Fiona Scarff
Harry Butler Institute
Murdoch University
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Dr. Benjamin Bolker
Professor, Mathematics & Statistics and Biology, McMaster University
Director, School of Computational Science and Engineering
> E-mail is sent at my convenience; I don't expect replies outside of working hours.