Trying to change OPENBLAS_NUM_THREADS from within R

Sat, Jun 30, 2012 6:37 AM

Hello,

I posted a couple of weeks ago about trying to change
OPENBLAS_NUM_THREADS from within R. I have been on holiday since then
and have not made much progress.

To clarify my problem, I am aware that many people have had issues
combining implicit and explicit parallelization, and that there are
useful discussions on this issue already posted on the mailing list;
however I am experiencing performance problems with OPENBLAS for
functions that make no use of explicit parallelization.

The problem arises for me in cases where a function that uses BLAS is
called in an apply statement.

While any one such operation speeds up with an increase in
OPENBLAS_NUM_THREADS, the overall run time does not improve when the
operation is called repeatedly, e.g. under lapply.

I have some basic results for a crossprod below - individual
operations improve with added threads, but under lapply it flattens
out, and htop shows a lot of red bars at work on the processors
(indicating heavy system usage?). With other more complex functions
the performance deteriorates further and it is positively undesirable
to use more than 1 thread for OPENBLAS, but the results below should
be sufficient to illustrate the issue.

It is because of this specific problem that I want to be able to
control the number of threads at runtime, and hence other previously
explained approaches are not applicable, at least as far as I can
tell. But any other suggestions on how to overcome the problem would
be very welcome.

I intend to test on different systems and BLAS, but I was wondering
whether anyone had encountered this kind of problem before and if
there is a workaround, whether it is OPENBLAS specific, or indeed
whether it is specific to my system or processor (i7-2630QM).

Thank you in advance. A basic illustration of the issue follows.

Simon

CS Dept
NUIM

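library(microbenchmark)

# Multiply x by a freshly generated 500 x 500 matrix; note that the rnorm()
# call is therefore part of every timed lapply iteration.
func2 <- function(x){
   qq <- matrix(rnorm(250000),500,500)
   return( crossprod(x,qq) )
}

# Build a list of 500 random 500 x 500 matrices.
li<-list()
for(i in 1:500){
   li[[i]] <- matrix(rnorm(250000),500,500)
}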

#OPENBLAS_NUM_THREADS=1
microbenchmark(lapply(li,func2),times=10)
Unit: seconds
              expr      min       lq   median       uq     max
1 lapply(li, func2) 23.04498 23.05062 23.07796 23.39843 32.7897

microbenchmark(crossprod(li[[1]],li[[2]]))
Unit: milliseconds
                        expr      min       lq   median       uq      max
1 crossprod(li[[1]], li[[2]]) 22.87143 23.32093 23.45628 24.34411 26.13419


#OPENBLAS_NUM_THREADS=2

microbenchmark(lapply(li,func2),times=10)
Unit: seconds
              expr      min       lq  median       uq      max
1 lapply(li, func2) 20.95075 22.29843 23.2581 23.71557 24.21765

## Clearly the extra BLAS threads speed up the single crossprod below, but not the lapply run above

microbenchmark(crossprod(li[[1]],li[[2]]))
Unit: milliseconds
                        expr      min       lq   median       uq      max
1 crossprod(li[[1]], li[[2]]) 12.23434 13.25925 14.12305 14.54331 19.47039



#OPENBLAS_NUM_THREADS=4

microbenchmark(lapply(li,func2),times=10)
Unit: seconds
              expr     min       lq   median       uq      max
1 lapply(li, func2) 19.0154 20.17587 22.27971 23.56876 24.40342


microbenchmark(crossprod(li[[1]],li[[2]]))
Unit: milliseconds
                        expr      min       lq   median       uq      max
1 crossprod(li[[1]], li[[2]]) 7.301697 8.105116 8.346089 8.670551 10.41987

On Mon, Jun 11, 2012 at 4:17 PM, Simon Fuller <simonfuller9 at gmail.com> wrote:

Hello,

I hope this is the relevant mailing list for my enquiry.

I have googled the above point but have not been able to find a
solution which works. I would be very grateful if anyone had any
suggestions based on their own knowledge and experience.

I have installed openblas and have it linked as my shared BLAS to R.

So I can do "export OPENBLAS_NUM_THREADS=x" and then when I start R,
I can run with x threads, no problem - the extra threads are visible
in htop and there is a speed improvement for crossprod etc.

However, for some operations I find that the increased threads have a
negative effect on my code.

Therefore I would like to be able to change the number of threads from
within the R session - the idea being that I can parcel certain
functions along with an appropriate thread setting.

However, Sys.setenv(), for example, does not work, probably not least
because Sys.getenv() does not list the relevant variable(s), e.g.
OPENBLAS_NUM_THREADS.

I have also tried running a C function with getenv / setenv. Here,
getenv can access the value of OPENBLAS_NUM_THREADS set before R was
called, but setenv's changes have no effect on the operation of the
BLAS, and they are lost when R is closed. I thought this might have to
do with the way UNIX's child processes receive only copies of the
parent's variables, and that the BLAS must be called at a level where
the parent's variables inform the operation. However I do not know
enough about Unix and the way that R works on it to know where to go
from here.
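
The behaviour described above (setenv changes having no effect on the BLAS) is what one would expect if the library reads OPENBLAS_NUM_THREADS once, when it is loaded, and caches the result. The C sketch below illustrates that pattern with a stand-in library. fake_blas_init and fake_blas_threads are hypothetical names, not OpenBLAS functions, and the read-once behaviour is an assumption about OpenBLAS rather than something confirmed in this thread.

```c
#define _POSIX_C_SOURCE 200112L  /* for setenv() */
#include <assert.h>
#include <stdlib.h>

/* Stand-in for a threaded library's private state (hypothetical). */
static int cached_threads = 0;

/* Mimics a library initialiser: read the variable once and cache it. */
void fake_blas_init(void)
{
    const char *s = getenv("OPENBLAS_NUM_THREADS");
    int n = (s != NULL) ? atoi(s) : 0;
    cached_threads = (n > 0) ? n : 1;   /* fall back to 1 thread */
}

/* The value the stand-in library uses for every later call. */
int fake_blas_threads(void)
{
    assert(cached_threads > 0);         /* init must have run first */
    return cached_threads;
}

/* Set the variable, initialise, then change the variable again.
   Returns the thread count the stand-in library still uses. */
int threads_after_late_setenv(void)
{
    setenv("OPENBLAS_NUM_THREADS", "1", 1);
    fake_blas_init();
    setenv("OPENBLAS_NUM_THREADS", "4", 1);  /* changes the environment only */
    return fake_blas_threads();
}
```

Calling threads_after_late_setenv() leaves the environment variable at 4 while the stand-in library keeps using the value it cached at initialisation. Under the same assumption, a change made with setenv after R has started would not reach the BLAS, whatever getenv reports afterwards.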

I can see a few possibilities, all or none of which might work.

1) A different install of R explicitly using openblas.

2) The number of threads might be manipulable through another value or
function (e.g. another environment variable - although I could not
find a likely candidate.) There could be something elemental here I am
missing.

3) Use some kind of hack to send system() calls from a C function -
but this seems messy if at all possible.

4) Start R with different options.

5) There is an openblas_set_num_threads function in the source for
openblas, but it was not clear to me how or indeed whether this can be
used to alter the threads, and I could find no instances of people
using it or similar functions. If anyone has managed to link some of
the openblas code into a project in a way that makes this function
usable, I would be very glad to hear about it.
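
For possibility 5, a minimal sketch of a C wrapper that could be loaded into R is given below. It assumes the function is exported as openblas_set_num_threads(int), as declared in OpenBLAS's cblas.h, and that R's shared BLAS really is OpenBLAS so that the symbol is visible when the wrapper is loaded. set_blas_threads and r_set_blas_threads are hypothetical names introduced here. The weak declaration is a GCC/Clang extension, used only so that the file still links when OpenBLAS is absent. This is untested against a real R and OpenBLAS installation.

```c
#include <assert.h>
#include <stddef.h>

/* Declared in OpenBLAS's cblas.h. Marked weak (GCC/Clang extension) so the
   address is NULL instead of a link error when OpenBLAS is not present. */
extern void openblas_set_num_threads(int num_threads) __attribute__((weak));

/* Hypothetical helper.
   Returns  0 if the request was passed on to OpenBLAS,
           -1 if the OpenBLAS symbol is not available,
           -2 if n is not a positive thread count. */
int set_blas_threads(int n)
{
    if (n < 1)
        return -2;
    if (openblas_set_num_threads == NULL)
        return -1;
    openblas_set_num_threads(n);
    return 0;
}

/* Hypothetical entry point shaped for R's .C() interface, which passes
   every argument as a pointer. The result is written to *status. */
void r_set_blas_threads(int *n, int *status)
{
    assert(n != NULL && status != NULL);
    *status = set_blas_threads(*n);
}
```

If this were saved as, say, setthreads.c (a hypothetical file name), it could be built with R CMD SHLIB setthreads.c, loaded with dyn.load("setthreads.so"), and called as .C("r_set_blas_threads", as.integer(2), status = integer(1)). A status of 0 would only mean the call reached OpenBLAS. Whether crossprod then actually runs with the new thread count would still need to be checked, for example in htop.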

My apologies if this question has been covered before.

Any help is much appreciated.

Best wishes,

Simon
