+ {
+ id <- factor(id)
+ gender <- factor(gender)
+ weekday <- factor(weekday, labels = c("N","Y"))
+ })
'data.frame': 57318 obs. of 4 variables:
$ id : Factor w/ 990 levels "1014503","1017782",..: 1 1 1 2 2 2 2 2 2 2 ...
$ gender : Factor w/ 3 levels "0","1","2": 1 1 1 2 2 2 2 2 2 2 ...
$ weekday: Factor w/ 2 levels "N","Y": 2 1 1 2 1 1 2 2 2 2 ...
$ drinks : int 0 0 5 8 5 10 0 0 0 0 ...
id gender weekday drinks
8396125: 435 0:32445 N:16633 Min. : 0.0000
6551092: 180 1:24780 Y:40685 1st Qu.: 0.0000
1028078: 90 2: 93 Median : 0.0000
1325095: 90 Mean : 0.9983
1365806: 90 3rd Qu.: 0.0000
1411092: 90 Max. :45.0000
(Other):56343
str(subj <- unique(subset(drink.df, select = c(id, gender))))
'data.frame': 990 obs. of 2 variables:
$ id : Factor w/ 990 levels "1014503","1017782",..: 1 2 3 4 5 6 7 8 9 10 ...
$ gender: Factor w/ 3 levels "0","1","2": 1 2 1 1 2 2 1 1 1 2 ...
id gender
1014503: 1 0:549
1017782: 1 1:439
1018311: 1 2: 2
1026066: 1
1028078: 1
1036230: 1
(Other):984
On Sat, Jun 19, 2010 at 10:42 AM, David Atkins <datkins at u.washington.edu> wrote:
Hi all--
I use (g)lmer and MCMCglmm on a weekly basis, and I am wondering about
options for speeding up their computations. This is primarily an issue with
MCMCglmm, given the many necessary MCMC iterations to get to convergence on
some problems. But, even with glmer(), I have runs that get into 20-30
minutes.
First, let me be very clear that this is in no way a criticism of Doug's and
Jarrod's work (package developers for lme4 and MCMCglmm, respectively).
Their code has probably brought models/data into range that would not have
been possible.
Second, I have included link to data and script below, along with some
timings on my computer: Mac Book Pro, 2.5GHz, with 4GB RAM. Here's
sessionInfo from my runs:
R version 2.11.1 (2010-05-31)
i386-apple-darwin9.8.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] MCMCglmm_2.05 corpcor_1.5.6 ape_2.5-3
[4] coda_0.13-5 tensorA_0.35 lme4_0.999375-34
[7] Matrix_0.999375-41 lattice_0.19-7
loaded via a namespace (and not attached):
[1] gee_4.13-14 grid_2.11.1 nlme_3.1-96 stats4_2.11.1 tools_2.11.1
Specific questions:
1. Would be curious to know timings on other people's set-ups. Jarrod and I
had an exchange one time where he was gracious enough to run a zero-inflated
model where I was concerned about convergence. He ran a model with 1.3M
iterations, which I think would take a number of days, if not a week on my
computer. This was part of what got me thinking about this. Thus, my first
interest is whether there is an "optimal" hardware/OS configuration, or does
it matter?
Some things I see in R-help archives:
2. 32 vs. 64-bit: Seems like this is mostly an issue of data/model size and
whether you need to access more than 4GB of RAM. AFAICS, 64-bit processors
are not necessarily faster.
3. "Optimized" BLAS: There's a bit of discussion about optimized BLAS (basis
linear algebra... something). However, these discussions note that there is
no generally superior BLAS. Not sure whether specific BLAS might be
optimized for GLMM computations.
4. Parallel computing: With multi-core computers, looks like there are some
avenues for splitting intensive computations across processors.
Finally, I'm asking here b/c I run into these issues with GLMM (and
zero-inflated mixed models), though most of the discussion I've seen thus
far about computation speed has been on R-help.
The data below are self-reported drinks (alcohol) from college students for
up to the last 90 days. Distribution of counts is zero-inflated. I run a
Poisson GLMM with glmer, over-dispersed Poisson GLMM with MCMCglmm, and then
zero-inflated OD Poisson with MCMCglmm and provide timings for my set-up.
Okay, any and all thoughts welcomed.
thanks, Dave
### thinking through computational speed with lmer and MCMCglmm
#
### read drinking data
drink.df <- read.table(file =
"http://depts.washington.edu/cshrb/newweb/stats%20documents/tlfb.txt",
header = TRUE, sep = "\t")
str(drink.df) # 57K rows
### id is id variable (shocking)
### gender and weekday are 0/1 indicator variables
### drinks has number of drinks consumed
### distribution of outcome
table(drink.df$drinks)
plot(table(drink.df$drinks), lwd=2) # zero-inflated
### how many people?
length(unique(drink.df$id)) # 990
sort(table(drink.df$id))
### NOTE: most people have max of 90, which they should
### two folks with 180 and 435 (prob data errors)
### long negative tail down from 90
#
### NOTE: for this purpose, not worrying about the 180/435
### speed tests
#
### fit random intercept and random slope for weekday
### fixed effects for gender and weekday and interaction
#
### Poisson GLMM with glmer()
library(lme4)
system.time(
drk.glmer <- glmer(drinks ~ weekday*gender + (weekday | id),
data = drink.df, family = poisson,
verbose = TRUE)
)
summary(drk.glmer)
### timing
#
### user system elapsed
### 36.326 9.013 45.316
### over-dispersed Poisson GLMM with MCMCglmm()
library(MCMCglmm)
prior <- list(R = list(V = 1, n = 1),
G = list(G1 = list(V = diag(2), n = 2)))
system.time(
drk.mcmc <- MCMCglmm(drinks ~ weekday*gender,
random = ~ us(1 + weekday):id,
data = drink.df, family = "poisson",
prior = prior)
)
summary(drk.mcmc) # NOTE: using summary.MCMCglmm in v 2.05 of package
### timing
#
### user system elapsed
### 1034.317 165.128 1203.536
### zero-inflated, over-dispersed Poisson GLMM with MCMCglmm()
#
### NOTE: haven't run the following yet, other than a quick "toy run" to
### sure that it is set up correctly.
### NOTE: this only has random intercept in each portion of the model
prior2 <- list(R = list(V = diag(2), n = 1, fix = 2),
G = list(G1 = list(V = 1, n = 1),
G2 = list(V = 1, n = 1)))
system.time(
drk.zimcmc <- MCMCglmm(drinks ~ -1 + trait*weekday*gender,
random = ~ idh(at.level(trait, 1)):id
+ idh(at.level(trait, 2)):id,
rcov = ~ idh(trait):units,
data = drink.df, family =
"zipoisson",
prior = prior2)
)
summary(drk.zimcmc)
### timing
#
### user system elapsed
### 2105.366 544.881 2640.030
--
Dave Atkins, PhD
Research Associate Professor
Department of Psychiatry and Behavioral Science
University of Washington
datkins at u.washington.edu
Center for the Study of Health and Risk Behaviors (CSHRB)
1100 NE 45th Street, Suite 300
Seattle, WA 98105
206-616-3879
http://depts.washington.edu/cshrb/
(Mon-Wed)
Center for Healthcare Improvement, for Addictions, Mental Illness,
Medically Vulnerable Populations (CHAMMP)
325 9th Avenue, 2HH-15
Box 359911
Seattle, WA 98104?
206-897-4210
http://www.chammp.org
(Thurs)