Computational speed - MCMCglmm/lmer

Wed, Jun 23, 2010 8:36 AM

On Tue, Jun 22, 2010 at 3:06 PM, Paul Johnson <pauljohn32 at gmail.com> wrote:

On Sat, Jun 19, 2010 at 10:42 AM, David Atkins <datkins at u.washington.edu> wrote:

Hi all--

I use (g)lmer and MCMCglmm on a weekly basis, and I am wondering about
options for speeding up their computations.  This is primarily an issue with
MCMCglmm, given the many necessary MCMC iterations to get to convergence on
some problems.  But, even with glmer(), I have runs that get into 20-30
minutes.

3. "Optimized" BLAS: There's a bit of discussion about optimized BLAS (basic
linear algebra... something).  However, these discussions note that there is
no generally superior BLAS.  Not sure whether specific BLAS might be
optimized for GLMM computations.

4. Parallel computing: With multi-core computers, looks like there are some
avenues for splitting intensive computations across processors.

Hi, Dave:

I've wondered this same thing. I replaced the base R BLAS with
GOTOBLAS2 and ATLAS and both are much faster than R's base BLAS.  In
Gotoblas2, computation is about 10 x faster on linear algebra
problems, especially on the kinds of problems where it can thread
computations across all cores.  The BLAS library from Atlas does not
seem to thread, so it is not quite so fast.

In either case, I've tested your example on this Lenovo T61 laptop
with dual core Pentium that maxes out at 2.4GHz.

To calculate your model with the base R BLAS:

drk.glmer

   user  system elapsed
 29.920   0.120  30.245


The time elapsed with the optimized BLAS is not so much faster as I
had expected. With Atlas it is:

   user  system elapsed
 25.660   0.100  25.784

Gotoblas2 is almost identical, which quite surprised me.  On other tests
I've done, it supplies a more noticeable speedup because it can go
multi core when needed.  I was monitoring the CPU and the calculations
all stay on one core.

   user  system elapsed
 25.670   0.050  25.725

Well, if you use Atlas or GOTOBLAS2, you can expect a speedup of about 1/6th.

On this particular model/data set combination.  Accelerated BLAS
change the speed of low-level numerical linear algebra operations, the
so-called basic linear algebra subroutines.  If those are the
bottleneck in your calculation you will see a performance boost.  If
they are not, you won't.
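
One rough way to see what the installed BLAS contributes on its own is to
time an operation that R hands directly to it, such as a dense
cross-product, and compare that with the time for the model fit.  A minimal
sketch in base R; the matrix size and the object names are arbitrary
choices for illustration, not taken from the example data:

```r
# Time a dense cross-product, which R dispatches to whatever BLAS is linked.
# The matrix size (n) is an arbitrary illustrative choice.
set.seed(1)
n <- 500
X <- matrix(rnorm(n * n), n, n)

blas_time <- system.time(XtX <- crossprod(X))
print(blas_time)
```

If this timing changes a lot when the BLAS is swapped but the model fit
barely moves, the fit is probably spending most of its time elsewhere.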

Accelerated BLAS are not a panacea.  Neither is parallel computation.
When your computation is essentially single-threaded, as an
optimization like this is, it doesn't matter if you have one core or
twelve.

The basic rule of optimizing performance of programs is to profile
*before* you make changes.  Making great efforts to optimize an
operation that takes only 5% of the execution time will provide you
with at most a 5% gain in performance.
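
A minimal sketch of that workflow in base R, using the sampling profiler
Rprof() from the utils package.  The function slow_step() is a hypothetical
stand-in for the expensive call (it is not the model from this thread), and
max_saving() is an illustrative helper for the arithmetic behind the 5%
figure:

```r
# Hypothetical stand-in for the expensive call (e.g. a model fit).
slow_step <- function(reps = 300, n = 200) {
  for (i in seq_len(reps)) {
    X <- matrix(rnorm(n * n), n, n)
    solve(crossprod(X) + diag(n))   # positive definite, so solve() succeeds
  }
  invisible(NULL)
}

prof_file <- tempfile()
Rprof(prof_file)                    # start the sampling profiler
slow_step()
Rprof(NULL)                         # stop profiling
print(head(summaryRprof(prof_file)$by.self))  # functions ranked by own time

# Upper bound on the fraction of total run time saved when a step that
# accounts for fraction `p` of the time is made `s` times faster.
max_saving <- function(p, s = Inf) p * (1 - 1 / s)

max_saving(0.05)       # an infinitely fast step still saves only 5%
max_saving(0.05, 10)   # a 10-fold faster step saves 4.5%
```

The profile shows which functions account for the run time; only the ones
near the top of that table are worth the effort of speeding up.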

Forgive me for sounding grouchy but I find this whole discussion
misguided.  Worrying about the speed of fitting a model and niceties
of the model formulation before doing elementary checks on the data is
putting the cart before the horse.  Why is gender coded as 0, 1 and 2?
Why, when there was a maximum of 90 days of monitoring, is there one
id with 435 observations and another with 180 observations?  Did
someone really have 45 drinks in one day and, if so, are they still
alive?  Accelerated BLAS and parallel algorithms are way, way down the
list of issues that should be addressed.
