
Trying to change OPENBLAS_NUM_THREADS from within R

Hello,

I posted a couple of weeks ago about trying to change
OPENBLAS_NUM_THREADS from within R. I have been on holiday since then
and have not made much progress.

To clarify my problem, I am aware that many people have had issues
combining implicit and explicit parallelization, and that there are
useful discussions on this issue already posted on the mailing list;
however, I am experiencing performance problems with OPENBLAS for
functions that make no use of explicit parallelization.

The problem arises for me in cases where a function that uses BLAS is
called in an apply statement.

While any single such operation speeds up as OPENBLAS_NUM_THREADS
increases, the overall run time does not improve when the same
operation is called repeatedly, e.g. under lapply.

I have some basic results for crossprod below: individual operations
improve with added threads, but under lapply the timings flatten out,
and htop shows a lot of red bars on the processors (indicating heavy
system usage?). With other, more complex functions the performance
deteriorates further, to the point where using more than 1 thread for
OPENBLAS is positively undesirable, but the results below should be
sufficient to illustrate the issue.

It is because of this specific problem that I want to be able to
control the number of threads at runtime, and hence other previously
explained approaches are not applicable, at least as far as I can
tell. But any other suggestions on how to overcome the problem would
be very welcome.
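
One possible workaround, sketched here under assumptions rather than tested against this setup: if OpenBLAS reads OPENBLAS_NUM_THREADS only when the library is initialised (an assumption), the variable can be set per run by starting a fresh Rscript child process for each thread count. The helper name run_with_blas_threads is hypothetical; system2, shQuote and R.home are base R, and the quoting assumes a Unix-alike shell.

```r
# Hypothetical helper: run an R expression in a fresh Rscript process
# with OPENBLAS_NUM_THREADS set for that process only.
# Assumption: OpenBLAS picks up the variable at load time, so it must be
# in the environment before the child R process starts.
run_with_blas_threads <- function(expr_text, threads) {
  rscript <- file.path(R.home("bin"), "Rscript")
  system2(rscript,
          args   = c("-e", shQuote(expr_text)),
          env    = paste0("OPENBLAS_NUM_THREADS=", threads),
          stdout = TRUE)   # capture the child's output as a character vector
}

# Check that the child process really sees the requested value
out <- run_with_blas_threads('cat(Sys.getenv("OPENBLAS_NUM_THREADS"))', 2)
print(out)
```

This changes the thread count between runs rather than inside a live session, so it suits benchmarking more than interactive use.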

I intend to test on different systems and BLAS libraries, but I was
wondering whether anyone has encountered this kind of problem before,
whether there is a workaround, and whether it is specific to OPENBLAS
or indeed to my system or processor (i7-2630QM).

Thank you in advance. A basic illustration of the issue follows.

Simon

CS Dept
NUIM

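library(microbenchmark)

# Compute t(x) %*% qq against a freshly generated 500 x 500 matrix
func2 <- function(x){
   qq <- matrix(rnorm(250000),500,500)
   return( crossprod(x,qq) )
}

# Build a list of 500 random 500 x 500 matrices
li<-list()
for(i in 1:500){
   li[[i]] <- matrix(rnorm(250000),500,500)
}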

#OPENBLAS_NUM_THREADS=1
microbenchmark(lapply(li,func2),times=10)
Unit: seconds
              expr      min       lq   median       uq     max
1 lapply(li, func2) 23.04498 23.05062 23.07796 23.39843 32.7897

microbenchmark(crossprod(li[[1]],li[[2]]))
Unit: milliseconds
                        expr      min       lq   median       uq      max
1 crossprod(li[[1]], li[[2]]) 22.87143 23.32093 23.45628 24.34411 26.13419


#OPENBLAS_NUM_THREADS=2

microbenchmark(lapply(li,func2),times=10)
Unit: seconds
              expr      min       lq  median       uq      max
1 lapply(li, func2) 20.95075 22.29843 23.2581 23.71557 24.21765

## Clearly multithreaded BLAS speeds up a single crossprod, but not the lapply run

microbenchmark(crossprod(li[[1]],li[[2]]))
Unit: milliseconds
                        expr      min       lq   median       uq      max
1 crossprod(li[[1]], li[[2]]) 12.23434 13.25925 14.12305 14.54331 19.47039



#OPENBLAS_NUM_THREADS=4

microbenchmark(lapply(li,func2),times=10)
Unit: seconds
              expr     min       lq   median       uq      max
1 lapply(li, func2) 19.0154 20.17587 22.27971 23.56876 24.40342


microbenchmark(crossprod(li[[1]],li[[2]]))
Unit: milliseconds
                        expr      min       lq   median       uq      max
1 crossprod(li[[1]], li[[2]]) 7.301697 8.105116 8.346089 8.670551 10.41987
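
To put numbers on the flattening, the medians above can be reduced to speedups relative to one thread. The sketch below only does arithmetic on the reported figures. The split into crossprod time and "other" time assumes that a crossprod inside func2 costs about the same as the standalone one, which is only an approximation; the "other" part would include the rnorm call and matrix allocation in func2.

```r
# Median timings copied from the microbenchmark output above
threads      <- c(1, 2, 4)
lapply_s     <- c(23.07796, 23.2581, 22.27971)   # seconds for 500 calls
crossprod_ms <- c(23.45628, 14.12305, 8.346089)  # milliseconds for one call

n_calls     <- 500                          # length of the list li
per_call_ms <- lapply_s * 1000 / n_calls    # average time per func2 call
other_ms    <- per_call_ms - crossprod_ms   # rough non-crossprod remainder

summary_tab <- data.frame(
  threads,
  crossprod_speedup = crossprod_ms[1] / crossprod_ms,
  lapply_speedup    = lapply_s[1] / lapply_s,
  per_call_ms,
  other_ms
)
print(round(summary_tab, 2))
```

On these figures crossprod alone is roughly 2.8 times faster with four threads, while the lapply run changes by only a few percent, and a func2 call takes about twice as long as a standalone crossprod at one thread.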
On Mon, Jun 11, 2012 at 4:17 PM, Simon Fuller <simonfuller9 at gmail.com> wrote: