
using optimized BLAS [Was: problem installing rjags package]

OK, here's what I did, and it seems to work:

1.  I started with a clean R installation, using only the CRAN binary.

2.  I ran the following script:

#!/bin/bash

# Locations of the MKL libraries, the gfortran interface build, and the Intel OpenMP runtime
export BL=/opt/intel/Compiler/11.1/089/Frameworks/mkl/Libraries/em64t
export GF=~/Install_Files/MKL_gfortran_interface/lib/em64t
export IOMP=/opt/intel/Compiler/11.1/089/lib

# Sequential (single-threaded) MKL, re-exported through one dylib
ld -dylib -arch x86_64 -L$BL -L$GF -reexport-lmkl_intel_lp64 -reexport-lmkl_sequential -reexport-lmkl_core -reexport-lpthread -o libmklblas_seq.dylib
# Threaded MKL; this one also pulls in the Intel OpenMP runtime (libiomp5)
ld -dylib -arch x86_64 -L$BL -L$GF -L$IOMP  -reexport-lmkl_intel_lp64 -reexport-lmkl_intel_thread  -reexport-lmkl_core -reexport-liomp5 -reexport-lpthread -o libmklblas_par.dylib

This gave me two Intel MKL libraries.  I then pointed the libRblas.dylib symbolic link to one or the other, depending on whether I want the sequential or the threaded version.
The performance of the threaded MKL is comparable to ATLAS (sometimes better, sometimes worse), but is much, much better than vecLib.  
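
The symlink switch described above can be sketched as a small shell function. This is only an illustration: switch_rblas is a hypothetical helper name, and the R library directory passed to it is an assumption (on a CRAN binary install it is typically /Library/Frameworks/R.framework/Resources/lib, and writing there may require sudo).

```shell
#!/bin/bash
# Sketch only: repoint libRblas.dylib at one of the wrapper libraries built above.
# switch_rblas is a hypothetical helper, not part of R or MKL.
# Usage: switch_rblas <R lib directory> <path to replacement dylib>
switch_rblas() {
    local rlib="$1" target="$2"
    # Refuse to create a dangling link if the replacement library is missing
    if [ ! -e "$target" ]; then
        echo "no such library: $target" >&2
        return 1
    fi
    # -s makes a symbolic link, -f replaces the existing libRblas.dylib link
    ln -sf "$target" "$rlib/libRblas.dylib"
}
```

For example, `switch_rblas /Library/Frameworks/R.framework/Resources/lib "$PWD/libmklblas_par.dylib"` would select the threaded build. R has to be restarted for the change to take effect, since the BLAS is loaded when R starts.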

So this gives me the BLAS I want, and keeps my life very simple by letting me use the CRAN binary.

But as researchers, we know that every answer spawns new questions.

1.  Can I do something similar with LAPACK (say, if I want a threaded Cholesky decomposition or solving a system of equations)?

2.  Although it is true that one can adjust the number of threads that the MKL BLAS uses by setting the MKL_NUM_THREADS environment variable, it appears that this variable must be set before the BLAS is loaded.  Since BLAS is loaded at the same time as R, it would appear that this kind of adaptation "on the fly" is not possible.  Is there a way to "reset" the BLAS at a suitable point in an R script, so I could unload BLAS, change the variable, and then reload it? 

The *specific* case I have in mind is calling multithreaded BLAS from a function that is running via multicore, foreach and/or plyr.  I think I want threading turned on when I'm not in a foreach loop, but off when I am.

3.  Do I have to worry about the MKL Fortran interfaces to BLAS or LAPACK at all?  (I'm talking about the libmkl_blas95_lp64.a and libmkl_lapack95_lp64.a libraries).  I needed to link to them to compile R from source, but if I omit them from today's new creation, will that mess me up down the road?

4.  As I mentioned before, I'm learning a lot of this as I go, but from what I've read, the newer Intel processors support a number of vectorization instructions that are not supported by earlier chips (this SSE4.2 stuff).  I have tried compiling other packages (e.g., GSL) with and without SSE4.2, and find that adding the appropriate flag in the Intel compiler gives me 10-20 percent improvement in some applications (can't remember which--did it a while ago).  So that's why I add those flags when I compile R from source.  But I recognize that the CRAN binary needs to have some backwards compatibility.  The Intel compilers offer such an option (e.g., -axsse4.2 will compile to use the best possible set of instructions up to sse4.2).  Is the CRAN binary compiled in the same way? Put another way, is it compiled for the "lowest" common instruction set, or will it use the most advanced instruction set if it's available?
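
For question 2, the launch-time constraint at least allows a per-process workaround: set MKL_NUM_THREADS in the environment of each R process before it starts. A minimal sketch, assuming the variable is read when the BLAS is loaded (as it appears to be); run_with_mkl_threads is a hypothetical helper name, and this does not answer whether the BLAS can be reset inside a running session.

```shell
#!/bin/bash
# Sketch only: run a command with MKL_NUM_THREADS fixed for that process alone.
# run_with_mkl_threads is a hypothetical helper, not part of R or MKL.
# Usage: run_with_mkl_threads <thread count> <command> [args...]
run_with_mkl_threads() {
    local n="$1"
    shift
    # env sets the variable only for the child process, so the calling
    # shell's own environment is left unchanged
    env MKL_NUM_THREADS="$n" "$@"
}
```

For example, `run_with_mkl_threads 1 Rscript parallel_job.R` would start a job that uses foreach with the BLAS held to one thread, while an interactive session started without the wrapper keeps the default thread count (parallel_job.R is a placeholder script name).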

As before, I am happy to split these questions into separate discussion threads.  But now the questions are more of the "this is interesting--I'd like to learn more," variety, instead of the "I have a problem, please help me" type.  And of course, thanks so much for pointing me in the right direction, and for all the work you do in developing and supporting R.
On Oct 12, 2010, at 11:44 AM, Simon Urbanek wrote:

            
-------------------------------------------
Michael Braun
Homer A. Burnell (1928) Career Development Professor, 
	and Assistant Professor of Management Science (Marketing Group)
MIT Sloan School of Management
100 Main St., E62-535
Cambridge, MA 02139
braunm at mit.edu
617-253-3436