
Another issue using multi-processing linear algebra libraries

2 messages · Dipterix Wang, Ivan Krylov

I also have a question about this. I wonder whether R provides an environment variable or option that instructs packages how many cores to use?

It doesn't have to be mandatory for now, but at least package maintainers could reach a shared consensus and start adopting such a humble setting rather than abusing parallel::detectCores() to max out the number of threads by default.
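For what it's worth, one convention of this kind already exists in base R: parallel::mclapply() consults getOption("mc.cores"), which is seeded from the MC_CORES environment variable when the 'parallel' package loads. A minimal sketch of a package preferring such user-supplied limits over detectCores() (the fallback of 2 mirrors mclapply()'s own default; the OMP_NUM_THREADS check is an illustrative assumption, not something base R does):

```r
# Prefer user-supplied limits over the physical core count.
cores_to_use <- function() {
  # mclapply() consults getOption("mc.cores"), seeded from MC_CORES.
  n <- getOption("mc.cores")
  if (is.null(n)) {
    # Respect an OpenMP-style limit if the user exported one.
    env <- Sys.getenv("OMP_NUM_THREADS", unset = NA)
    if (!is.na(env)) n <- suppressWarnings(as.integer(env))
  }
  if (is.null(n) || is.na(n)) n <- 2L  # conservative default, not detectCores()
  max(1L, as.integer(n))
}
```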

On Wed, 7 Aug 2024 07:47:38 -0400, Dipterix Wang <dipterix.wang at gmail.com> wrote:
A lot of thought and experience with various HPC systems went into
availableCores(), a function from the zero-dependency 'parallelly'
package by Henrik Bengtsson:
https://search.r-project.org/CRAN/refmans/parallelly/html/availableCores.html
If you cannot accept a pre-created cluster object, a 'future' plan, 'BiocParallel' parameters, or the number of OpenMP threads from the user, availableCores() is a safer default than parallel::detectCores().

Building such a limiter into R poses a number of problems. Here is a
summary from a previous discussion on R-pkg-devel [1] with wise
contributions from Dirk Eddelbuettel, Reed A. Cartwright, Vladimir
Dergachev, and Andrew Robbins.

 - R is responsible for the BLAS it is linked to and therefore must
   actively manage the BLAS threads when the user sets the thread
   limit. This requires writing BLAS-specific code to talk to the
   libraries, as is done in FlexiBLAS and the RhpcBLASctl package. Some
   BLASes (like ATLAS) only have a compile-time thread limit. R should
   somehow give all threads to BLAS by default but take them away when
   some other form of parallelism is requested.
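In package code today, "taking threads away from BLAS" means going through something like RhpcBLASctl. A sketch of scoping that around an explicitly parallel section (an assumption-laden illustration: the package may not be installed, and not every BLAS honours the call):

```r
# Temporarily restrict BLAS to one thread while expr runs, then restore.
with_single_threaded_blas <- function(expr) {
  if (!requireNamespace("RhpcBLASctl", quietly = TRUE)) {
    return(expr)  # no BLAS control available; run as-is
  }
  old <- RhpcBLASctl::blas_get_num_procs()
  RhpcBLASctl::blas_set_num_threads(1)
  on.exit(RhpcBLASctl::blas_set_num_threads(old), add = TRUE)
  expr
}
```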

 - Should R be managing the OpenMP thread limit by itself? If not,
   that's a lot of extra work for every OpenMP-using package developer.
   If yes, R is now responsible for initialising OpenMP.

 - Managing the BLAS and OpenMP thread limits is already a hard problem
   because a given BLAS may or may not follow the OpenMP thread limit.
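Concretely, the OpenMP limit and the BLAS limit are separate knobs: an OpenBLAS built without OpenMP reads OPENBLAS_NUM_THREADS and ignores OMP_NUM_THREADS, so setting one says nothing about the other. A trivial sketch of inspecting both from R:

```r
# Two independent environment variables; either, both, or neither may be
# honoured depending on how the BLAS was compiled.
limits <- Sys.getenv(c("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS"),
                     unset = "(unset)")
print(limits)
```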

 - What if two packages both consult the thread limit and create N^2
   processes as a result of one calling the other? Dividing a single
   computer between BLAS threads, OpenMP threads, child processes and
   their threads needs a very reliable global inter-process semaphore.
   R would have to grow a jobserver like in GNU Make, a separate
   process because the main R thread will be blocked waiting for the
   computation result, especially if we want to automatically recover
   job slots from crashed processes. That's probably not impossible,
   but involves a lot of OS-specific code.
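The quadratic blow-up above is easy to see on paper: if caller and callee each independently consult the same limit N, the machine ends up with N × N runnables. A toy illustration (no real processes are spawned; 8 is an arbitrary example value):

```r
# Suppose the thread limit reports 8 to every consumer independently.
n <- 8L
outer_workers <- n           # package A forks n child processes
inner_threads_each <- n      # package B, called inside each child, also sees n
total <- outer_workers * inner_threads_each
total  # 64 runnables competing for 8 cores
```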

 - What happens with the thread limit when starting remote R processes?
   It's best to avoid having to set it manually. If multiple people
   unknowingly start R on a shared server, how do we keep the R
   instances from competing for the CPU (or for ownership of the
   semaphore)?

 - It will take a lot of political power to actually make this scheme
   work. The limiter can only be cooperative (unless you override the
   clone() syscall and make it fail? I expect everything to crash after
   that), so it only takes one piece of software unknowingly ignoring
   the limit to break everything.