Matrix multiplication
On 12-03-13 12:50 PM, Brian G. Peterson wrote:
On Tue, 2012-03-13 at 12:40 -0400, Paul Gilbert wrote:
Brian, thanks for spelling this out for those of us who are a bit slow. (Newbie questions below.)
<... snip ...>
So, if your BLAS does multithreaded matrix multiplication, it will use multiple threads 'implicitly', as Simon pointed out.
Is there an easy way to know whether the R I am using has been built with multithreaded BLAS support?
BLAS should be 'plug and play', as R is usually compiled to use a shared-object BLAS. As such, installing a new BLAS on your machine (and configuring it appropriately) should 'just work': R will pick up the new BLAS when you restart it. Dirk et al. wrote a paper, now a bit dated, that benchmarked several BLAS libraries; it has some additional details.
(I have a long history of getting things that should 'just work' to 'just not work'.) But I didn't really state my question very well. I'm really wondering about two related situations. First, after a change to the underlying system, how can I confirm that R is using the new configuration? Second, if I am running benchmarks in R, is there an easy way to record the underlying configuration being used? Thanks again, Paul
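For these two questions, one Linux-specific approach (an assumption, not something suggested in the thread) is to inspect what a running R process has actually mapped into memory, rather than what it was linked against, and to record the thread-count variables in its environment. The Python sketch below reads /proc/<pid>/maps and /proc/<pid>/environ; the file-name pattern and the list of variables are heuristic assumptions to adjust for your BLAS.

```python
import os
import re

# Thread-count variables that common BLAS builds consult (OpenBLAS/GotoBLAS,
# MKL, OpenMP-based builds). Assumed list; check your BLAS's documentation.
BLAS_THREAD_VARS = ("OPENBLAS_NUM_THREADS", "GOTO_NUM_THREADS",
                    "MKL_NUM_THREADS", "OMP_NUM_THREADS")

# Heuristic: file names such as libblas.so.3, libRblas.so, libopenblas.so.0,
# libgoto2.so or libmkl_rt.so.
_BLAS_NAME = re.compile(r"(blas|goto|mkl)[^/]*\.so", re.IGNORECASE)


def blas_libs_in_maps(maps_text):
    """Sorted, de-duplicated BLAS-like paths from a /proc/<pid>/maps listing."""
    found = set()
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode [pathname]; the pathname is
        # optional and may contain spaces, so split at most 5 times.
        fields = line.split(None, 5)
        if len(fields) == 6:
            path = fields[5].strip()
            if _BLAS_NAME.search(os.path.basename(path)):
                found.add(path)
    return sorted(found)


def env_from_environ(raw):
    """Parse the NUL-separated bytes of /proc/<pid>/environ into a dict."""
    env = {}
    for item in raw.split(b"\0"):
        if b"=" in item:
            key, value = item.split(b"=", 1)
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def blas_thread_settings(env):
    """Pick out the BLAS thread-count variables (None when unset)."""
    return {name: env.get(name) for name in BLAS_THREAD_VARS}


def snapshot(pid):
    """Record the BLAS libraries a running process has mapped and the
    thread-count variables it started with (Linux, same user)."""
    with open(f"/proc/{pid}/maps") as fh:
        libs = blas_libs_in_maps(fh.read())
    with open(f"/proc/{pid}/environ", "rb") as fh:
        threads = blas_thread_settings(env_from_environ(fh.read()))
    return {"pid": pid, "blas_libraries": libs, "thread_env": threads}
```

From inside R, Sys.getpid() gives the process id to pass to snapshot(); saving the result alongside benchmark timings records which BLAS was in use. Note that /proc/<pid>/environ shows the environment the process started with, so later changes made with Sys.setenv() will not appear there.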
<...snip...>
Be aware that there can be unintended negative interactions between implicit and explicit parallelization. On cluster nodes I tend to configure the BLAS to use only one thread to avoid resource contention when all cores are doing explicit parallelization.
How do you do this? Does it need to be done when you are compiling R, or can it be done on the fly while running R processes?
Some BLAS libraries, like GotoBLAS, support an environment variable that
sets the number of cores to be used, and this can be changed at run time.
Others, like MKL, are always multithreaded. Others, like ATLAS, can be
compiled in either single-threaded or multithreaded mode.
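Controlling the BLAS through the environment can be sketched in Python as a launcher that starts each worker with the thread-count variables already set to 1. Setting them before the worker starts is an assumption chosen for safety, since many BLAS builds read these variables when the library is first loaded. Which variable a given BLAS honors varies, so setting all four is a belt-and-braces assumption rather than something from the thread.

```python
import os
import subprocess

# Thread-count variables read by common BLAS builds (OpenBLAS/GotoBLAS,
# MKL, OpenMP-based builds). Assumed list; check your BLAS's documentation.
BLAS_THREAD_VARS = ("OPENBLAS_NUM_THREADS", "GOTO_NUM_THREADS",
                    "MKL_NUM_THREADS", "OMP_NUM_THREADS")


def single_threaded_blas_env(base=None):
    """Return a copy of `base` (default: this process's environment) with
    every BLAS thread-count variable set to "1"."""
    env = dict(os.environ if base is None else base)
    for name in BLAS_THREAD_VARS:
        env[name] = "1"
    return env


def launch_worker(cmd):
    """Start a worker whose BLAS sees the one-thread settings from the
    moment it is loaded, rather than having them changed later."""
    return subprocess.Popen(cmd, env=single_threaded_blas_env())
```

For example, launch_worker(["Rscript", "worker.R"]) would start one R worker, where worker.R is a hypothetical script. The same effect in a cluster job script is an `export OPENBLAS_NUM_THREADS=1` (or the variable your BLAS uses) before R is started.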
So, on my cluster nodes, I use a single-threaded BLAS, assuming
that *explicit* parallelization will be the primary driver of CPU load,
and not wanting to over-commit the processor when 12 calculations each
try to spawn 12 threads in the BLAS. On other machines, I might use a
multithreaded BLAS like GotoBLAS so that I have some flexibility (though,
apparently unlike Claudia, I rarely change it in practice).
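The over-commit arithmetic can be made concrete with a small Python sketch. It assumes a 12-core node (the thread does not state the core count) and simply divides the cores among the explicit workers; the function names are hypothetical.

```python
def blas_threads_per_worker(cores, workers):
    """BLAS threads each worker may use so that workers * threads <= cores.

    Never returns less than 1; if workers > cores, the explicit
    parallelism alone already over-commits the machine.
    """
    if cores < 1 or workers < 1:
        raise ValueError("cores and workers must be positive")
    return max(1, cores // workers)


def oversubscription(cores, workers, threads_per_worker):
    """Ratio of runnable threads to cores (1.0 means exactly full)."""
    return workers * threads_per_worker / cores


# 12 calculations on an assumed 12-core node, each BLAS spawning 12 threads.
print(oversubscription(12, 12, 12))     # 12.0: 144 threads on 12 cores
print(blas_threads_per_worker(12, 12))  # 1: the single-threaded BLAS choice
print(blas_threads_per_worker(12, 3))   # 4: room for a few BLAS threads each
```

With one worker per core, the only budget that avoids over-commitment is one BLAS thread per worker, which matches the single-threaded configuration described above.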
Regards,
- Brian