
LMM with Big data using binary DV

Dear AC (and perhaps Doug),
You may well be aware of this, but one way to substantially speed up the estimation of models with binary data is to aggregate the data into counts of successes and failures and use "cbind" for the response. How feasible this is depends on the model: the number of predictors and the number of unique values of each predictor determine how many distinct cells the data collapse into. The kind of code you need for this is below. You'll see that the fixed-effects estimates, standard errors, and random-effects variances turn out the same.
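As a brief sketch of why the estimates agree: assume every observation in a cell c (one combination of x1, x2 and grp) has the same success probability, conditional on the group's random intercept. The symbols p_c, s_c and f_c (cell probability, number of successes, number of failures) are notation introduced here for illustration.

```latex
\prod_{i \in c} p_c^{\,y_i} (1 - p_c)^{1 - y_i} = p_c^{\,s_c} (1 - p_c)^{f_c},
\qquad
\Pr(s_c \mid s_c + f_c,\, p_c) = \binom{s_c + f_c}{s_c}\, p_c^{\,s_c} (1 - p_c)^{f_c}
```

The two expressions differ only by the binomial coefficient, which does not involve the model parameters, so the parameter estimates are unchanged while the reported log-likelihood shifts by a constant.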
I've compared the speed of lme4 and lme4Eigen on a simulated dataset with 100,000 observations, using a 2GHz MacBook. Based on a handful of simulations, there doesn't appear to be much difference in speed between the two packages (sometimes one is faster, sometimes the other). I report the results of one simulation here. The two packages generate identical results for this dataset.

Cheers,
Malcolm


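# Simulate N binary observations in 'grps' groups, with a random intercept per group
N <- 100000
grps <- 100
dat <- data.frame(x1 = sample(1:10, N, replace=TRUE), x2 = sample(18:23, N, replace=TRUE), grp = rep(1:grps, each=N/grps))
dat$y <- rbinom(N, prob = plogis(-5 + 0.1*dat$x1 + 0.2*dat$x2 + rnorm(grps)[dat$grp]), size = 1)
# Count failures and successes within each (x1, x2, grp) cell
# (a cell with no observations would come back as NA and would need to be dropped)
failures <- by(dat, list(dat$x1, dat$x2, dat$grp), function(x) sum(x$y==0))
successes <- by(dat, list(dat$x1, dat$x2, dat$grp), function(x) sum(x$y==1))
# One row per cell; expand.grid varies x1 fastest, matching the order of as.vector() below
dat2 <- expand.grid(x1=sort(unique(dat$x1)), x2=sort(unique(dat$x2)), grp=sort(unique(dat$grp)))
dat2$failures <- as.vector(failures)
dat2$successes <- as.vector(successes)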
library(lme4)
system.time(glmer(y ~ x1 + x2 + (1 | grp), dat, family=binomial))
#   user  system elapsed 
# 22.918   0.660  24.441
system.time(glmer(cbind(successes, failures) ~ x1 + x2 + (1 | grp), dat2, family=binomial))
#   user  system elapsed 
#  1.833   0.017   1.855 
detach("package:lme4")
library(lme4Eigen)
system.time(glmer(y ~ x1 + x2 + (1 | grp), dat, family=binomial))
#   user  system elapsed 
# 24.824   1.811  26.773 
system.time(glmer(cbind(successes, failures) ~ x1 + x2 + (1 | grp), dat2, family=binomial))
#   user  system elapsed 
#  1.687   0.039   1.723
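
A more compact way to build the aggregated table is sketched below, on the assumption that base R's aggregate() is an acceptable substitute for the by()/expand.grid() steps above; it keeps only the cells that actually occur in the data. To keep the sketch runnable without lme4, the check uses glm() with fixed effects only and a smaller simulated data set. The names sim, agg, fit_raw and fit_agg are illustrative and not from the original post.

```r
# Illustrative sketch only: smaller data set, fixed effects only
set.seed(1)                        # arbitrary seed, for reproducibility
n    <- 5000
grps <- 10
sim <- data.frame(
  x1  = sample(1:10,  n, replace = TRUE),
  x2  = sample(18:23, n, replace = TRUE),
  grp = rep(seq_len(grps), each = n / grps)
)
sim$y <- rbinom(n, size = 1, prob = plogis(-5 + 0.1 * sim$x1 + 0.2 * sim$x2))

# One row per observed (x1, x2, grp) cell, with success and failure counts
agg <- aggregate(cbind(successes = y, failures = 1 - y) ~ x1 + x2 + grp,
                 data = sim, FUN = sum)

# Same model fitted to the raw and to the aggregated data
fit_raw <- glm(y ~ x1 + x2, data = sim, family = binomial)
fit_agg <- glm(cbind(successes, failures) ~ x1 + x2, data = agg,
               family = binomial)

# Compare coefficients and standard errors between the two fits
all.equal(coef(fit_raw), coef(fit_agg), tolerance = 1e-4)
all.equal(sqrt(diag(vcov(fit_raw))), sqrt(diag(vcov(fit_agg))),
          tolerance = 1e-4)
```

The same agg data frame could be passed to glmer() with a (1 | grp) term in place of dat2; that substitution has not been timed here.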