fixed effects estimates too small in binomial GLMM with low "success"-rate

3 messages · Ben Bolker, Henrik Singmann

Dear list,

I would really appreciate your input on an issue where the fixed-effects estimates of a binomial GLMM with a relatively low rate of "successes" are too low by around 0.5 percentage points compared to the observed effect.

The data come from six comparable experiments in which we measured participants' performance errors on a series of trials, in a design with one two-level factor ("a" vs. "b"). Overall, the difference in error rate between the two levels was very small (~0.5%), but when analyzing all data together (with participant and experiment as random effects) the effect of the factor reached significance (p = .03), which is also consistent with other published data.

However, the estimated error rates are around 0.5 percentage points lower than the observed mean error rates, which suggests that the model may not adequately describe the data. In addition, I get a convergence warning ("Model failed to converge with max|grad| = 0.0013716 (tol = 0.001)").

####### code 1 ##########
require(lme4)
dat <- read.table("http://pastebin.com/raw.php?i=KWjD5EG3")
dat$experiment <- factor(dat$experiment)

# observed effects
aggregate(errors ~ factor, dat, mean) #unweighted
##   factor     errors
## 1      a 0.02266914
## 2      b 0.02772540
by(dat, dat$factor, function(x) with(x, sum(errors*weights)/sum(weights))) #weighted
## dat$factor: a
## [1] 0.0222343
## ------------------------------------------------------------
## dat$factor: b
## [1] 0.02777651

m1 <- glmer(errors ~ factor + (factor|id) + (1|experiment), dat, weights = dat$weights, family = binomial)
## Warning message:
## In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
##   Model failed to converge with max|grad| = 0.0013716 (tol = 0.001)
coef(summary(m1))[2,, drop = FALSE]
##          Estimate Std. Error  z value   Pr(>|z|)
## factorb 0.1788881 0.08005533 2.234556 0.02544653

# estimated condition means on the response (probability) scale:
logistic <- function(x) 1/(1+exp(-x))  # inverse logit, identical to base R's plogis()
logistic(cumsum(fixef(m1)))
## (Intercept)     factorb
##  0.01825229  0.02174991

####### end code 1 ##########
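Not part of the original post: one way to check whether the convergence warning is benign is to refit `m1` with a different optimizer and compare the fixed-effect estimates. A sketch, assuming code 1 above has been run (so `m1` and lme4 are available):

```r
# Sketch (my addition, not from the thread): refit m1 with the bobyqa
# optimizer via glmerControl and compare the fixed effects.
m1b <- update(m1, control = glmerControl(optimizer = "bobyqa"))
max(abs(fixef(m1) - fixef(m1b)))  # a tiny difference suggests the warning is benign
```

If the two fits agree to several decimal places, the near-threshold gradient is unlikely to matter for the substantive conclusions.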

Interestingly, I can replicate this behavior when simulating binomial data similar to mine. When the overall mean is relatively small, the estimates from the GLMM are consistently too low. When I increase the overall mean, this consistent bias in the estimated effect seems to go away (and the observed effect gets larger, due to the non-linear nature of the logistic model). The simulation code for one data set follows below.

My question is: does anyone have an idea what I could do to achieve better agreement between observed and predicted values? I tried different link functions, but that didn't help. I am also unsure whether I can trust my model given this discrepancy.

Thanks a lot,
Henrik


##### code 2 #####

mu <- -3.7  # when larger, problem disappears
n_experiments <- 6
n_participants <- 20
n_trials <- 400
sigma_effect <- 0.2
effect_size <- 0.4
sigma_p <- 0.7
sigma_e <- 0.1
prob_level <- 0.5

# create data:
df <- expand.grid(experiment = factor(seq(1, n_experiments)), id = factor(seq(1, n_participants)), factor = c(-1, 1))
df$id <- with(df, experiment:id)
df <- merge(df, data.frame(experiment = seq(1, n_experiments), re_e = rnorm(n_experiments, sd = sigma_e)))
df <- merge(df, data.frame(id = levels(df$id), re_p = rnorm(length(levels(df$id)), sd = sigma_p)))
df <- merge(df, data.frame(id = levels(df$id), re_a = rnorm(length(levels(df$id)), sd = sigma_effect)))
df <- with(df, df[order(id, factor),])
# number of "a"-trials per participant (binomially distributed around half of n_trials)
tmp <- rbinom(n_experiments*n_participants, n_trials, prob_level)
tmp <- rep(tmp, each = 2)
tmp[c(FALSE, TRUE)] <- n_trials-tmp[c(FALSE, TRUE)]  # "b" gets the remaining trials
# simulate error proportions per cell from the logistic model
df$errors <- mapply(function(x, y) rbinom(1, y, x), with(df, logistic(mu+re_p+(factor*(effect_size/2)+factor*re_a)+re_e)), tmp)/tmp
df$weights <- tmp  # trials per cell, used as binomial weights
df$factor <- factor(df$factor)

aggregate(errors ~ factor, df, mean) #unweighted
##   factor     errors
## 1     -1 0.02352719
## 2      1 0.03718523

s1 <- glmer(errors ~ factor + (factor|id) + (1|experiment), df, weights = df$weights, family = binomial)
## Warning message:
## In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
##   Model failed to converge with max|grad| = 0.0130648 (tol = 0.001)
coef(summary(s1))[2,, drop = FALSE]
#          Estimate Std. Error  z value     Pr(>|z|)
# factor1 0.4765694 0.07578773 6.288213 3.211416e-10

logistic(cumsum(fixef(s1)))
# (Intercept)     factor1
#  0.01849934  0.02946117

#### end code 2 ####
On 14-05-17 06:07 PM, Henrik Singmann wrote:
Don't know, but one comment and one thought:

  (1) the convergence warning goes away with the most recent version
of lme4 (1.1-7)

  (2) I'm _guessing_ that you will find that this phenomenon is
due to Jensen's effect (i.e., a nonlinear averaging effect) --
that would mean its magnitude should be proportional to the
random-effects variances and to the strength of the nonlinear
(inverse link) relationship.
  (I think there was a similar, possibly neglected, question along
these lines on the mailing list earlier ...)
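
  The nonlinear-averaging point can be illustrated in a few lines of base R (my sketch, not from the thread): averaging the inverse link over a random intercept pulls the implied probability upward when the mean sits far out in the tail, and the shift vanishes near the middle of the scale.

```r
# Demonstration of the nonlinear-averaging (Jensen) effect.
# plogis() is base R's inverse logit, i.e. 1/(1+exp(-x)).
set.seed(1)
b <- rnorm(1e6, sd = 0.7)  # random intercepts, sd as in sigma_p in code 2

plogis(-3.7)               # subject-specific probability at the mean, ~0.024
mean(plogis(-3.7 + b))     # population-average probability, ~0.03: larger
mean(plogis(0 + b))        # near the middle of the scale the bias vanishes (~0.5)
```

This matches the pattern in the simulation: the GLMM estimates subject-specific (conditional) probabilities, which are systematically below the raw population-average error rates when the intercept is far in the tail.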

  Ben Bolker
On 18.05.2014 00:19, Ben Bolker wrote:
(1) is true, thanks.
And thanks for the pointer. Indeed, when entering experiment as a fixed effect (which removes the random effect with the smallest variance), the estimates are much closer to the observed values, in fact almost identical. Perhaps I should use this approach, although I would prefer treating experiment as random. Or what would you do?
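
For reference, the fixed-effect variant described above would look something like this (my sketch, not posted in the thread; assumes the `dat` object and lme4 session from code 1):

```r
# Sketch (my addition): experiment as a fixed effect instead of a random
# intercept; the variable names follow code 1 above.
m2 <- glmer(errors ~ factor + experiment + (factor | id),
            data = dat, weights = weights, family = binomial)
fixef(m2)
```

Note that with treatment contrasts the intercept now refers to condition "a" in the reference experiment, so back-transforming to overall error rates requires averaging over the experiment coefficients rather than a simple `logistic(cumsum(...))`.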