
recover simulated individual-level random effects with glmer?

3 messages · Ian Carroll, Ben Bolker, Thierry Onkelinx

#
I raised this question yesterday on Cross Validated (
stats.stackexchange.com/q/158043/43122), but now realize this might be the
more appropriate forum for help. Please excuse the cross-post if you track
both.

I simulated count data with an observation level random effect, then fit a
Poisson-family GLMM using glmer (lme4 version 1.1-7). The fitted random
intercepts show a strange pattern when compared to the simulated data, and
produce a kinked QQ plot. If this is expected, why is it okay? If not, what
am I doing wrong?

Simulated data and fitting: (code omitted here; see the linked CV post.
Ben's reply below reproduces a close variant.)

Questioned results:
My results (posted on CV) are two overlapping clusters of points: in one,
r and r.est show the expected correlation; in the other, there is no
correlation. The normal QQ plot makes it look as if different normal
distributions are obtained for positive and negative intercepts.

Thanks,
Ian
#
On 15-06-22 11:36 AM, Ian Carroll wrote:
I think this is inevitable.  If you think about it, the distribution
of the *conditional* modes of the observations with y=0 must be
different from that of the conditional modes of the obs. with y>0.
This is much easier to see with an even more trivial model that
doesn't include a covariate (see below ...)

  I don't think this actually violates any of the assumptions we're
using. If you're worried about its effects on estimation and inference,
you could run some simulations to see whether it affects whatever you're
interested in (type I error rate, bias/variance of parameter estimates,
etc.); a minimal sketch follows.
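
For concreteness, here is a minimal sketch of the kind of simulation
study this suggests (the sample size, replicate count, and function
name are illustrative choices, not from the original post); it checks
bias and variability of the fixed-effect slope:

## Sketch: refit the model to many simulated datasets and summarize
## the slope estimates (this is slow and may emit occasional
## convergence warnings; it is only an illustration)
library(lme4)
sim_fit <- function(n = 200, beta = 2, sd_olre = 1) {
    x <- rnorm(n)
    r <- rnorm(n, sd = sd_olre)
    d <- data.frame(obs = factor(seq_len(n)), x = x,
                    y = rpois(n, lambda = exp(beta * x + r)))
    fit <- glmer(y ~ x + (1 | obs), family = poisson, data = d)
    unname(fixef(fit)["x"])
}
set.seed(101)
est <- replicate(50, sim_fit())
mean(est) - 2  ## approximate bias of the slope estimate
sd(est)        ## sampling variability across simulations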

library(lme4)
library(plyr)  ## for mutate
set.seed(101)
n <- 1000
df <- data.frame(obs=factor(1:n), x=rnorm(n))
df <- mutate(df,
             r=rnorm(n, sd=3),                   ## true OLRE values
             y=rpois(n=n, lambda=exp(2*x + r)),  ## response with covariate
             y2=rpois(n=n, lambda=exp(r)))       ## response for the trivial model


glmer.fit <- glmer(y ~ x + (1|obs), family='poisson', data=df)
glmer.fit2 <- update(glmer.fit, nAGQ=20)           ## more accurate integration
glmer.fit3 <- update(glmer.fit, y2 ~ 1 + (1|obs))  ## trivial, covariate-free model

## df$r.est <- ranef(glmer.fit2)$obs[ , '(Intercept)']
df$r.est <- ranef(glmer.fit3)$obs[ , '(Intercept)']  ## conditional modes
cvec <- c("#000000","#FF000080")
cvec2 <- cvec[as.numeric(df$y2==0)+1]  ## red = observations with y2 == 0
plot(df$r, df$r.est, col=cvec2)        ## true vs. estimated OLRE

qqnorm(df$r.est, col=cvec2)

df$y2cat <- cut(df$y2, right=FALSE, include.lowest=TRUE,
                breaks=c(0:6, Inf))
library("ggplot2"); theme_set(theme_bw())
ggplot(df, aes(r.est, fill=y2cat)) +
    geom_histogram(alpha=0.5, position="identity",
                   aes(y = ..density..))
ggplot(df, aes(r.est, fill=y2cat)) +
    geom_density(alpha=0.5, position="identity")
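
One more check, not in the original code, to see why the kink is
inevitable: in the covariate-free fit, every observation with y2 == 0
makes the same likelihood contribution exp(-exp(b0 + r)), so all the
zeros should share a single conditional mode (up to numerical
tolerance); the lower arm of the QQ plot is that one value repeated.

## all y2 == 0 observations collapse to one conditional mode
unique(round(df$r.est[df$y2 == 0], 8))
sum(df$y2 == 0)  ## number of observations sitting at that value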



#
Dear Ian,

Just a little remark on the variance of the observation-level random
effect (OLRE): sd = 3 implies a huge effect.

- We assume the OLRE ~ N(0, sigma^2), i.e. with standard deviation sigma.
- Look at the 2.5% quantile for low values and the 97.5% quantile for
high values.
- Hence the quantiles are low = -1.96 * sigma and high = 1.96 * sigma.
- high - low = 3.92 * sigma
- The OLRE acts on the log scale. Back-transforming the difference
gives the ratio between the two quantiles on the count scale:
- exp(high) / exp(low) = exp(high - low) = exp(3.92 * sigma)

The last expression yields about 128,000 when sigma = 3. So on the
count scale the high quantile is roughly 128,000 times the low one!
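
A quick check in R (my sketch, not in the original message):

exp(2 * qnorm(0.975) * 3)  ## ~ 128000 when sigma = 3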

Bottom line: be very, very careful when you use an OLRE whose variance
is large. The tricky part is that even seemingly small values like
sigma = 3 imply huge effects.
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature
and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium

To call in the statistician after the experiment is done may be no
more than asking him to perform a post-mortem examination: he may be
able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does
not ensure that a reasonable answer can be extracted from a given body
of data. ~ John Tukey

