[R-pkg-devel] multithreading in packages
On Sat, 9 Oct 2021, Ivan Krylov wrote:
On Thu, 7 Oct 2021 21:58:08 -0400 (EDT), Vladimir Dergachev <volodya at mindspring.com> wrote:
* My understanding from reading documentation and source code is that there is no dedicated support in R yet, but there are packages that use multithreading. Are there any plans for multithreading support in future R versions ?
Shared memory multithreading is hard to get right in a memory-safe language (e.g. R), but there's the parallel package, part of base R, which offers process-based parallelism and can run your code on multiple machines at the same time. There's no communication _between_ these machines, though. (But I think there's an MPI package on CRAN.)
Well, the way I planned to use multithreading is to speed up processing of very large vectors, so one does not have to wait seconds for the command to return. The same could be done for many built-in R primitives.
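To make the large-vector case concrete, here is a minimal sketch in C of the kind of loop that could be threaded with OpenMP inside a package. The function names scaled_add and vector_sum and the nthreads argument are hypothetical, chosen only for illustration; they are not from any existing package. The OpenMP parts are guarded by _OPENMP, so the code still compiles and runs serially when the compiler has no OpenMP support.

```c
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: y[i] += a * x[i] for a long vector.
 * Iterations are independent, so they can be divided among threads. */
void scaled_add(double a, const double *x, double *y, size_t n, int nthreads)
{
    ptrdiff_t i;
    ptrdiff_t len = (ptrdiff_t) n;   /* signed index for older OpenMP versions */

    if (nthreads < 1)
        nthreads = 1;                /* never ask for fewer than one thread */

#ifdef _OPENMP
    #pragma omp parallel for num_threads(nthreads) schedule(static)
#endif
    for (i = 0; i < len; i++)
        y[i] += a * x[i];
}

/* Hypothetical helper: sum of a long vector using an OpenMP reduction.
 * Each thread accumulates a private partial sum; OpenMP combines them. */
double vector_sum(const double *x, size_t n, int nthreads)
{
    ptrdiff_t i;
    ptrdiff_t len = (ptrdiff_t) n;
    double total = 0.0;

    if (nthreads < 1)
        nthreads = 1;

#ifdef _OPENMP
    #pragma omp parallel for reduction(+:total) num_threads(nthreads) schedule(static)
#endif
    for (i = 0; i < len; i++)
        total += x[i];

    return total;
}
```

Note that the loops touch only plain C arrays. Calling the R API from worker threads is generally unsafe, so a package would typically extract the data pointers on the main thread before entering the parallel region. With a reduction, the order of floating-point additions can differ from the serial loop, so results may differ in the last bits.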
* pthread or openmp ? I am particularly concerned about interaction with other packages. I have seen that using pthread and openmp libraries simultaneously can result in incorrectly pinned threads.
pthreads-based code could be harder to run on Windows (which is a first-class platform for R, expected to be supported by most packages).
Gábor Csárdi pointed out that R is compiled with mingw on Windows and has pthread support - something I did not know either.
OpenMP should be cross-platform, but Apple compilers are sometimes lacking; the latest Apple issue has likely been solved since I heard about it. If your problem can be made embarrassingly parallel, you're welcome to use the parallel package.
I have used parallel before; it is very nice, but R-level only. I am looking for something to speed up the response of individual package functions so that they themselves can be used as part of more complicated code.
* control of the maximum number of threads. One can default to the OpenMP environment variable, but this might vary between OpenMP implementations.
Moreover, CRAN-facing tests aren't allowed to consume more than 200% CPU, so it's a good idea to leave the number of workers in control of the user. According to a reference guide I got from openmp.org, OpenMP implementations are expected to understand omp_set_num_threads() and the OMP_NUM_THREADS environment variable.
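As an illustration of leaving the worker count in the user's hands, the sketch below resolves a thread count from a user request, falling back to the OpenMP default (which reflects OMP_NUM_THREADS and any earlier omp_set_num_threads() call) and applying an optional cap. The name resolve_num_threads and its two parameters are assumptions made for this example, not part of OpenMP or R.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: decide how many threads a routine should use.
 *   requested > 0  : the user asked for this many threads
 *   requested <= 0 : use the OpenMP default (OMP_NUM_THREADS, if set)
 *   hard_cap > 0   : never exceed this value (e.g. 2 for test runs)
 * Without OpenMP the answer is always 1. */
int resolve_num_threads(int requested, int hard_cap)
{
    int n;

#ifdef _OPENMP
    if (requested > 0)
        n = requested;
    else
        n = omp_get_max_threads();   /* honours OMP_NUM_THREADS */
#else
    (void) requested;                /* serial build: only one thread */
    n = 1;
#endif

    if (hard_cap > 0 && n > hard_cap)
        n = hard_cap;
    if (n < 1)
        n = 1;

    return n;
}
```

The result can be passed to a num_threads clause on each parallel region, which limits only that region, rather than to omp_set_num_threads(), which changes the default for later regions in the same process and so may affect other packages.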
Oh, this would never be run through CRAN tests; it is meant for data that is too big for CRAN. I seem to remember that the Intel compiler used a different environment variable, but this may have been fixed since the last time I used it.

best

Vladimir Dergachev
-- Best regards, Ivan