Computational speed - MCMCglmm/lmer
On Tue, Jun 22, 2010 at 3:06 PM, Paul Johnson <pauljohn32 at gmail.com> wrote:
On Sat, Jun 19, 2010 at 10:42 AM, David Atkins <datkins at u.washington.edu> wrote:
Hi all-- I use (g)lmer and MCMCglmm on a weekly basis, and I am wondering about options for speeding up their computations. This is primarily an issue with MCMCglmm, given the many MCMC iterations needed to reach convergence on some problems. But even with glmer(), I have runs that take 20-30 minutes.

3. "Optimized" BLAS: There's a bit of discussion about optimized BLAS (basic linear algebra... something). However, these discussions note that there is no generally superior BLAS. Not sure whether a specific BLAS might be optimized for GLMM computations.

4. Parallel computing: With multi-core computers, it looks like there are some avenues for splitting intensive computations across processors.
Hi, Dave: I've wondered the same thing. I replaced the base R BLAS with GotoBLAS2 and with ATLAS, and both are much faster than R's base BLAS. With GotoBLAS2, computation is about 10x faster on linear algebra problems, especially on the kinds of problems where it can thread computations across all cores. The ATLAS BLAS library does not seem to thread, so it is not quite as fast.

In either case, I've tested your example on this Lenovo T61 laptop with a dual-core Pentium that maxes out at 2.4 GHz. To calculate your model with the base R BLAS:

drk.glmer
   user  system elapsed
 29.920   0.120  30.245

The elapsed time with an optimized BLAS is not as much shorter as I had expected. With ATLAS it is:

   user  system elapsed
 25.660   0.100  25.784

GotoBLAS2 is almost identical:

   user  system elapsed
 25.670   0.050  25.725

I'm quite surprised. On other tests I've done, GotoBLAS2 gives a more noticeable speedup because it can go multi-core when needed. Here, I was monitoring the CPU and the calculations all stayed on one core.

Well, if you use ATLAS or GotoBLAS2, you can expect the run time to drop by about 1/6th (here, from roughly 30 s to roughly 26 s).
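The "about 1/6th" figure can be checked directly against the elapsed times quoted above. The short Python sketch below is illustrative only and is not part of the original exchange. It computes, for each BLAS, the fraction of wall-clock time saved and the speedup factor relative to the reference BLAS. The names `ELAPSED`, `time_saved` and `speedup` are hypothetical helpers introduced for this example.

```python
# Elapsed (wall-clock) seconds from the system.time() output quoted above,
# for the same glmer() fit on the same dual-core laptop.
ELAPSED = {
    "reference BLAS": 30.245,
    "ATLAS": 25.784,
    "GotoBLAS2": 25.725,
}


def time_saved(baseline: float, new: float) -> float:
    """Fraction of the baseline run time that disappears with the new setup."""
    return (baseline - new) / baseline


def speedup(baseline: float, new: float) -> float:
    """How many times faster the new setup is (values > 1 mean faster)."""
    return baseline / new


if __name__ == "__main__":
    base = ELAPSED["reference BLAS"]
    for name, secs in ELAPSED.items():
        # One line per BLAS: elapsed time, share of time saved, speedup factor.
        print(f"{name:15s} {secs:7.3f} s  "
              f"saved {time_saved(base, secs):6.1%}  "
              f"speedup {speedup(base, secs):.3f}x")
```

Both optimized libraries save close to 15% of the elapsed time on this fit. That is a little under a literal 1/6 (about 16.7%), and well short of the 10x seen on pure linear algebra benchmarks.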
On this particular model/data set combination.

Accelerated BLAS change the speed of low-level numerical linear algebra operations, the so-called Basic Linear Algebra Subprograms. If those operations are the bottleneck in your calculation, you will see a performance boost. If they are not, you won't. Accelerated BLAS are not a panacea. Neither is parallel computation. When your computation is essentially single-threaded, as an optimization like this is, it doesn't matter whether you have one core or twelve.

The basic rule of optimizing the performance of programs is to profile *before* you make changes. Making great efforts to optimize an operation that takes only 5% of the execution time can cut the total run time by at most 5%, even if that operation becomes free.
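The 5% argument is usually called Amdahl's law, a name the thread itself does not use. Suppose a fraction p of the run time is spent in the part you speed up by a factor s. The overall speedup is then 1 / ((1 - p) + p / s). In practice p comes from profiling; in R that means `Rprof()` followed by `summaryRprof()`.

The minimal Python sketch below computes this bound and also inverts it. The function names are hypothetical. The inversion step assumes, purely for illustration, that the BLAS calls themselves ran 10x faster, which is the figure quoted for GotoBLAS2 on other linear algebra problems. The message above notes that this fit stayed on one core, so that assumption may well not hold here.

```python
import math


def amdahl_speedup(p: float, s: float) -> float:
    """Overall speedup when a fraction p of run time is made s times faster.

    p: fraction of total run time spent in the part being optimized (0..1).
    s: speedup factor of that part alone (math.inf means it becomes free).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if s <= 0:
        raise ValueError("s must be positive")
    return 1.0 / ((1.0 - p) + p / s)


def implied_fraction(overall: float, s: float) -> float:
    """Invert Amdahl's law: the fraction p that yields `overall` given part speedup s."""
    if s <= 1.0:
        raise ValueError("s must exceed 1 for the inversion to be defined")
    return (1.0 - 1.0 / overall) / (1.0 - 1.0 / s)


if __name__ == "__main__":
    # The 5% example: even an infinitely fast fix to 5% of the run time
    # removes at most 5% of the total time.
    best = amdahl_speedup(0.05, math.inf)
    print(f"max overall speedup {best:.4f}x, run time cut by {1 - 1 / best:.1%}")

    # Hypothetical: if the BLAS part really ran 10x faster, the observed
    # 30.245 s -> 25.725 s would mean only about 17% of the fit was BLAS-bound.
    observed = 30.245 / 25.725
    print(f"implied BLAS-bound fraction {implied_fraction(observed, 10.0):.1%}")
```

The point of the inversion is the direction of the reasoning. Profiling gives p up front, and p caps what any faster library or extra core can deliver.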
I made the mistake of running the MCMCglmm example in your code. The system is locked in mortal combat with it. I didn't notice your time was 1208 before I started that one. :(
Forgive me for sounding grouchy, but I find this whole discussion misguided. Worrying about the speed of fitting a model and the niceties of the model formulation before doing elementary checks on the data is putting the cart before the horse. Why is gender coded as 0, 1 and 2? Why, when there was a maximum of 90 days of monitoring, is there one id with 435 observations and another with 180 observations? Did someone really have 45 drinks in one day and, if so, are they still alive? Accelerated BLAS and parallel algorithms are way, way down the list of issues that should be addressed.
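Elementary checks of the kind listed above take only a few lines in any language. In R, `table()` and `summary()` answer most of them. The minimal Python sketch below works on rows held as dictionaries. Several details are hypothetical: the field names `id`, `gender` and `drinks`, the assumed two-level gender coding, and the 30-drink plausibility cut-off. Only the 90-day monitoring window comes from the message itself.

```python
from collections import Counter

# Hypothetical study constraints; adjust to the actual codebook.
MAX_DAYS = 90                 # monitoring window stated in the message
EXPECTED_GENDER = {0, 1}      # assumed two-level coding
MAX_PLAUSIBLE_DRINKS = 30     # assumed cut-off for a single day


def check_rows(rows):
    """Return a list of human-readable problems found in daily drinking rows.

    Each row is a dict with hypothetical keys 'id', 'gender' and 'drinks'.
    """
    problems = []

    # 1. Unexpected category codes (e.g. gender coded 0, 1 and 2).
    codes = Counter(row["gender"] for row in rows)
    for code, n in sorted(codes.items()):
        if code not in EXPECTED_GENDER:
            problems.append(f"gender code {code!r} appears {n} times")

    # 2. More daily records for one person than days of monitoring.
    per_id = Counter(row["id"] for row in rows)
    for pid, n in sorted(per_id.items()):
        if n > MAX_DAYS:
            problems.append(f"id {pid} has {n} records (> {MAX_DAYS} days)")

    # 3. Implausible single-day values.
    for row in rows:
        if row["drinks"] > MAX_PLAUSIBLE_DRINKS:
            problems.append(f"id {row['id']} reports {row['drinks']} drinks in one day")

    return problems


if __name__ == "__main__":
    # Toy rows made up for illustration, not the data discussed in the thread.
    demo = [{"id": 1, "gender": 0, "drinks": 2}] * 100 + [
        {"id": 2, "gender": 2, "drinks": 45},
    ]
    for problem in check_rows(demo):
        print(problem)
```

Each check returns a message rather than stopping, so one pass lists every issue to take back to whoever collected the data.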