
[R-pkg-devel] Too many cores used in examples (not caused by data.table)

16 messages · Dirk Eddelbuettel, Ivan Krylov, Greg Hunt +3 more

#
Hi,

I am having difficulties in getting the latest version of the bssm (https://github.com/helske/bssm) package to CRAN, as the pretest issues a NOTE that the package uses too many cores in some of the examples ("Examples with CPU time > 2.5 times elapsed time"). I've seen plenty of discussion about this issue in relation to the data.table package, but bssm does not use it. Also, while bssm uses OpenMP in some functions, these are not called in the example in question (?exchange), and by default the number of threads in the parallelisable functions is set to 1.

But I just realised that bssm uses Armadillo via RcppArmadillo, which uses OpenMP by default for some elementwise operations. I wonder whether that could be the culprit. However, I would think that in that case many other packages using RcppArmadillo would be running into the same CRAN issues. Has anyone experienced this with packages that use RcppArmadillo but not data.table, or can anyone say whether my guess is correct? I haven't been able to reproduce the issue myself on R-hub or on my own Linux machine, so I can't really test whether setting #define ARMA_DONT_USE_OPENMP helps.

Best,
Jouni
1 day later
#
On 19 October 2023 at 05:57, Helske, Jouni wrote:
| But I just realised that bssm uses Armadillo via RcppArmadillo, which uses OpenMP by default for some elementwise operations. So, I wonder if that could be the culprit? However, I would think that in such case there would be many other packages with RcppArmadillo encountering the same CRAN issues. Has anyone experienced this with their packages which use RcppArmadillo but not data.table, or can say whether my guess is correct? I haven't been able to reproduce the issue myself on r-hub or my own linux, so I can't really test whether setting #define ARMA_DONT_USE_OPENMP helps.

You have some options to control OpenMP.

There is an environment variable (OMP_THREAD_LIMIT), and there is a CRAN
add-on package (RhpcBLASctl) which, if memory serves, also sets this. Looking
at the Armadillo documentation, we see another setting (the ARMA_OPENMP_THREADS macro).

I really think CRAN made a mistake here pushing this down on all package
maintainers.  It is too much work, some will get frustrated, some will get it
wrong and I fear in aggregate we end up with less performant software (as
some will 'cave in' and hard-wire single threaded computes). 

Dirk
#
On Thu, 19 Oct 2023 05:57:54 +0000, "Helske, Jouni" <jouni.helske at jyu.fi> wrote:
I wasn't able to reproduce the NOTE either, despite manually setting
the environment variable
_R_CHECK_EXAMPLE_TIMING_CPU_TO_ELAPSED_THRESHOLD=2 before running R CMD
check, but I think I can see the code using OpenMP. Here's what I did:

0. Temporarily lower the system protections against capturing
performance traces of potentially sensitive parts:

echo -1 | sudo tee /proc/sys/kernel/perf_event_paranoid

(Set it back to 3 after you're done.)

1. Run the following command with the development version of the
package installed:

env OPENBLAS_NUM_THREADS=1 \
 perf record --call-graph dwarf,4096 \
 R -e 'library(bssm); system.time(replicate(100, example(exchange)))'

OPENBLAS_NUM_THREADS=1 will prevent OpenBLAS from spawning worker
threads if you have it installed. (A different BLAS may need different
environment variables.)

2. Run `perf report` and browse collected call stack information.

The call stacks are hard to navigate, but I think they are not pointing
towards Armadillo. At least, setting ARMA_OPENMP_THREADS=1 doesn't
help, but setting OMP_THREAD_LIMIT=1 does.
3 days later
#
Thanks for the help. I have now tried resubmitting with Sys.setenv("OMP_THREAD_LIMIT" = 2) at the top of the exchange example, but I still get the same NOTE:

Examples with CPU time > 2.5 times elapsed time
          user system elapsed ratio
exchange 1.196   0.04   0.159 7.774

Not sure what to try next.

Best,
Jouni
#
In my case, after an hour or so's messing about, I recently disabled some
tests and example executions to get rid of the offending timings. I doubt
that I am the only one to have done that.
On Tue, 24 Oct 2023 at 9:38 pm, Helske, Jouni <jouni.helske at jyu.fi> wrote:
#
On Tue, 24 Oct 2023 10:37:48 +0000, "Helske, Jouni" <jouni.helske at jyu.fi> wrote:
I've downloaded the archived copy of the package from the CRAN FTP
server, installed it and tried:

library(bssm)
Sys.setenv("OMP_THREAD_LIMIT" = 2)
data("exchange")
model <- svm(
 exchange, rho = uniform(0.97,-0.999,0.999),
 sd_ar = halfnormal(0.175, 2), mu = normal(-0.87, 0, 2)
)
system.time(particle_smoother(model, particles = 500))
#    user  system elapsed
#   0.515   0.000   0.073

I set a breakpoint on clone() [*] and got quite a few calls creating
OpenMP threads with the following call stack:

#0  clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:52
<...>
#4  0x00007ffff7314e0a in GOMP_parallel () from
/usr/lib/x86_64-linux-gnu/libgomp.so.1
 <-- RcppArmadillo code below
#5 0x00007ffff38f5f00 in
arma::eglue_core<arma::eglue_div>::apply<arma::Mat<double>,
arma::eOp<arma::eOp<arma::Col<double>, arma::eop_exp>,
arma::eop_scalar_times>, arma::eOp<arma::eOp<arma::Col<double>,
arma::eop_scalar_div_post>, arma::eop_square> > (outP=..., x=...) at
.../library/RcppArmadillo/include/armadillo_bits/mp_misc.hpp:69
#6 0x00007ffff3a31246 in
arma::Mat<double>::operator=<arma::eOp<arma::eOp<arma::Col<double>,
arma::eop_exp>, arma::eop_scalar_times>,
arma::eOp<arma::eOp<arma::Col<double>, arma::eop_scalar_div_post>,
arma::eop_square>, arma::eglue_div> (X=..., this=0x7fffffff36f0) at
.../library/RcppArmadillo/include/armadillo_bits/Proxy.hpp:226
#7
arma::Col<double>::operator=<arma::eGlue<arma::eOp<arma::eOp<arma::Col<double>,
arma::eop_exp>, arma::eop_scalar_times>,
arma::eOp<arma::eOp<arma::Col<double>, arma::eop_scalar_div_post>,
arma::eop_square>, arma::eglue_div> > ( X=..., this=0x7fffffff36f0) at
.../library/RcppArmadillo/include/armadillo_bits/Col_meat.hpp:535
 <-- bssm code below
#8  ssm_ung::laplace_iter (this=0x7fffffff15e0, signal=...) at
model_ssm_ung.cpp:310
#9  0x00007ffff3a36e9e in ssm_ung::approximate (this=0x7fffffff15e0) at
.../library/RcppArmadillo/include/armadillo_bits/arrayops_meat.hpp:27
#10 0x00007ffff3a3b3d3 in ssm_ung::psi_filter
(this=this@entry=0x7fffffff15e0, nsim=nsim@entry=500, alpha=...,
weights=..., indices=...) at model_ssm_ung.cpp:517
#11 0x00007ffff3948cd7 in psi_smoother (model_=..., nsim=nsim@entry=500,
seed=seed@entry=1092825895, model_type=model_type@entry=3) at
R_psi.cpp:131

What does arma::eglue_core do?

(gdb) list
/* reformatted a bit */
library/RcppArmadillo/include/armadillo_bits/mp_misc.hpp:64
 int n_threads = (std::min)(
  int(arma_config::mp_threads),
  int((std::max)(int(1), int(omp_get_max_threads())))
 );
(gdb) p arma_config::mp_threads
$3 = 8
(gdb) p (int)omp_get_max_threads()
$4 = 16
(gdb) p (char*)getenv("OMP_THREAD_LIMIT")
$7 = 0x555556576b91 "2"
(gdb) p /x (int)omp_get_thread_limit()
$9 = 0x7fffffff

Sorry for misinforming you about the OMP_THREAD_LIMIT environment
variable: the OpenMP specification requires the program to ignore
modifications to the environment variables after the program has
started [**], so it only works if R is started with OMP_THREAD_LIMIT
set. Additionally, the OpenMP thread limit is not supposed to be
adjusted at runtime at all [***].

Unfortunately for our situation, Armadillo is very insistent in setting
its own number of threads from arma_config::mp_threads (which is
constexpr 8 unless you set preprocessor directives while compiling it)
and omp_get_max_threads (which is the upper bound on the number of
threads that cannot be adjusted at runtime).
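The thread count Armadillo ends up using can be worked through with a small stand-in for the mp_misc.hpp expression above. This is a sketch that does not include Armadillo: the function name arma_like_thread_count is hypothetical, `mp_threads` stands for arma_config::mp_threads (8 unless ARMA_OPENMP_THREADS is set at compile time) and `max_threads` for the value returned by omp_get_max_threads(). Note that OMP_THREAD_LIMIT does not appear in the expression at all.

```cpp
#include <algorithm>

// Mirrors the expression shown in armadillo_bits/mp_misc.hpp:
//   n_threads = min(arma_config::mp_threads, max(1, omp_get_max_threads()))
// mp_threads:  compile-time cap (ARMA_OPENMP_THREADS; 8 by default)
// max_threads: whatever omp_get_max_threads() reports at call time
constexpr int arma_like_thread_count(int mp_threads, int max_threads) {
  return (std::min)(mp_threads, (std::max)(1, max_threads));
}

// Values from the gdb session above: cap 8, OpenMP reports 16 -> 8 threads.
static_assert(arma_like_thread_count(8, 16) == 8, "limited by the compile-time cap");
// Compiling with ARMA_OPENMP_THREADS set to 2 caps it at 2 ...
static_assert(arma_like_thread_count(2, 16) == 2, "limited by ARMA_OPENMP_THREADS");
// ... and so does lowering what omp_get_max_threads() returns.
static_assert(arma_like_thread_count(8, 2) == 2, "limited by OpenMP's maximum");
```

Plugging in the gdb values (8 and 16) gives 8 threads; in this expression only the compile-time cap or the OpenMP maximum can bring that number down.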

What I'm about to suggest is a terrible hack, but since Armadillo seems
to lack the option to set the number of threads at runtime, there might
be no other option.

Before you #include an Armadillo header, every time:

1. #include <omp.h> so that the OpenMP functions are declared and the
#include guard is set

2. Define a static inline function get_number_of_threads returning the
desired number of threads as an int (e.g. referencing an extern int
number_of_threads stored elsewhere)

3. #define omp_get_max_threads get_number_of_threads

Now if you provide an API for the R code to get and set this number, it
should be possible to control the number of threads used by OpenMP code
in Armadillo. Basically, a data.table::setDTthreads() for the copy of
Armadillo inlined inside your package.
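The three steps above might look roughly like this in a single C++ translation unit. This is a hedged sketch, not tested against Armadillo itself: number_of_threads, get_number_of_threads and set_number_of_threads are hypothetical names, and the arma_standin namespace merely copies the mp_misc.hpp expression quoted earlier in place of a real Armadillo #include.

```cpp
#include <algorithm>

// Step 1: include <omp.h> *before* defining the macro, so the real OpenMP
// declarations keep their names and the header's include guard is set.
#ifdef _OPENMP
#include <omp.h>
#endif

// Step 2: package-level thread count (hypothetical name). In a real package
// this would be an `extern int` defined once in a single .cpp file.
static int number_of_threads = 1;

static inline int get_number_of_threads() { return number_of_threads; }

// Hypothetical setter an R-facing wrapper could call, in the spirit of
// data.table::setDTthreads(); values below 1 are clamped to 1.
static inline void set_number_of_threads(int n) {
  number_of_threads = (std::max)(1, n);
}

// Step 3: from here on in this translation unit, every call to
// omp_get_max_threads() is rewritten to get_number_of_threads().
#define omp_get_max_threads get_number_of_threads

// Stand-in for the Armadillo header that would be #included at this point;
// the expression is the one from mp_misc.hpp quoted above.
namespace arma_standin {
constexpr int mp_threads = 8;  // plays the role of arma_config::mp_threads
inline int n_threads() {
  return (std::min)(int(mp_threads),
                    int((std::max)(int(1), int(omp_get_max_threads()))));
}
}  // namespace arma_standin
```

Because the macro rewrites every later omp_get_max_threads() call in the file, it would also affect the package's own OpenMP code there, and it has to be repeated in each file that includes Armadillo. With a compile-time cap larger than 8, the same setter could also let the count go above 8.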

If you then compile your package with a large #define
ARMA_OPENMP_THREADS, it will both be able to use more than 8 threads
*and* limit itself when needed.

An alternative course of action is compiling your package with #define
ARMA_OPENMP_THREADS 2 and giving up on more OpenMP threads inside calls
to Armadillo.
#
Chapter 15 in Wickham and Bryan, R Packages, discusses "Advanced 
Testing Techniques". Their current section "15.4.1 Skip a test" includes 
the following:


test_that("some long-running thing works", {
   skip_on_cran()
   # test code that can potentially take "a while" to run
})


	  Have you tried writing directly to Jennifer Bryan 
<jenny at rstudio.com>? She and Hadley might be able to get help from the 
CRAN maintainers with this particular problem AND get more 
documentation on this into their book ;-)


	  hope this helps.
	  spencer graves
On 10/24/23 6:03 AM, Greg Hunt wrote:
#
On 24 October 2023 at 15:55, Ivan Krylov wrote:
| Now if you provide an API for the R code to get and set this number, it
| should be possible to control the number of threads used by OpenMP code
| in Armadillo. Basically, a data.table::setDTthreads() for the copy of
| Armadillo inlined inside your package.
| 
| If you then compile your package with a large #define
| ARMA_OPENMP_THREADS, it will both be able to use more than 8 threads
| *and* limit itself when needed.
| 
| An alternative course of action is compiling your package with #define
| ARMA_OPENMP_THREADS 2 and giving up on more OpenMP threads inside calls
| to Armadillo.

We should work on adding such a run-time setter of the number of cores to
RcppArmadillo so that examples can dial down to 2 cores.  I have been doing
just that in package tiledb (via a setting internal to the TileDB Core
library) for 'ages' now and RcppArmadillo could and should offer the same.

Dirk

| -- 
| Best regards,
| Ivan
| 
| [*]
| https://github.com/tidymodels/textrecipes/pull/251#issuecomment-1775549814
| 
| [**]
| https://www.openmp.org/spec-html/5.2/openmpch21.html#x432-59000021
| 
| [***]
| https://www.openmp.org/wp-content/uploads/OpenMPRefCard-5-2-web.pdf#page=15
| 
| ______________________________________________
| R-package-devel at r-project.org mailing list
| https://stat.ethz.ch/mailman/listinfo/r-package-devel
#
You are not the only one; I did the same with some of my examples.

Would it be an option to ask for a default R option, 'max.ncores', that 
specifies the maximum number of cores a process is allowed to use? CRAN 
could then require that examples, tests and vignettes respect this 
option. That way there would be one uniform option to specify the 
maximum number of cores processes could use. That would also make it 
easier for system administrators to set default values for this (use the 
entire system, or use one core by default on a shared system).

Of course, we package maintainers could do this without involvement of 
R Core or CRAN. We only need to agree on a name and a default value for 
when the option is missing (0 = use all cores; 1 or 2; or ncores-1 ...).
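The convention is left open above. As one hypothetical reading, compiled code handed the option's value could resolve it as below. The function name resolve_ncores and the rule that a negative value means "all but that many cores" are assumptions for illustration, not an existing R or CRAN convention.

```cpp
#include <algorithm>

// Hypothetical reading of a 'max.ncores'-style option (not an agreed R/CRAN
// convention):
//   0 (also used when the option is unset) -> all available cores
//   n > 0                                 -> at most n cores
//   n < 0                                 -> all but |n| cores ("ncores - 1" is -1)
// `available` is the detected core count; the result lies in [1, available].
int resolve_ncores(int option_value, int available) {
  available = (std::max)(1, available);
  int n;
  if (option_value == 0) {
    n = available;
  } else if (option_value > 0) {
    n = option_value;
  } else {
    n = available + option_value;  // option_value is negative here
  }
  return (std::max)(1, (std::min)(n, available));
}
```

`available` could come from parallel::detectCores() on the R side or std::thread::hardware_concurrency() in C++; clamping to at least 1 keeps a bad setting from disabling computation entirely.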

Jan
On 24-10-2023 13:03, Greg Hunt wrote:
#
On 24 October 2023 at 08:15, Dirk Eddelbuettel wrote:
|
| We should work on adding such a run-time setter of the number of cores to
| RcppArmadillo so that examples can dial down to 2 cores.  I have been doing
| just that in package tiledb (via a setting internal to the TileDB Core
| library) for 'ages' now and RcppArmadillo could and should offer the same.

A run-time setter won't work; Armadillo uses a constexpr. But we may have
looked in error at the _wrong_ OpenMP environment variable. Can you please
try with OMP_NUM_THREADS=2 (instead of OMP_THREAD_LIMIT)?

Dirk
| -- 
| dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org
#
Thanks,

I tried with OMP_NUM_THREADS=2, but unfortunately got the same NOTE as before.

Best,
Jouni
2 days later
#
Jouni,

My CRANberriesFeed reports a new bssm package at CRAN; congratulations on
sorting this out. [1,2] The OMP_NUM_THREADS setting is indeed all it takes,
and it _does_ seem to be read even from a running session: i.e. you can set
this inside an R session and the OpenMP code considers it in the same
process. Good!

As some of us mentioned, your usage pattern of setting
'Sys.setenv("OMP_NUM_THREADS" = 2)' everywhere _leaves_ that value so you
permanently hamstring the behaviour of a session which runs an example or
test of your package: the same session will never get back to 'all cores' by
itself so adding a resetter to the initial value (maybe via on.exit()) may be
a good idea for the next package revision if you have any energy left for
this question :)
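In compiled code, the C++ counterpart of an on.exit() reset is a scope guard. The sketch below assumes the OpenMP runtime's own getter and setter (omp_get_max_threads() and omp_set_num_threads()) are used rather than the environment variable; the class name ThreadThrottle is hypothetical, and when built without OpenMP (_OPENMP undefined) it does nothing.

```cpp
#include <algorithm>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical scope guard (C++ analogue of R's on.exit()): lowers the
// OpenMP thread count while it is alive and restores the previous value
// when it goes out of scope, also when an exception propagates.
class ThreadThrottle {
 public:
  explicit ThreadThrottle(int n) : saved_(current()) {
    set((std::max)(1, (std::min)(n, saved_)));  // only ever lowers
  }
  ~ThreadThrottle() { set(saved_); }
  ThreadThrottle(const ThreadThrottle&) = delete;
  ThreadThrottle& operator=(const ThreadThrottle&) = delete;

  // Threads OpenMP would use for the next parallel region (1 without OpenMP).
  static int current() {
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
  }

 private:
  static void set(int n) {
#ifdef _OPENMP
    omp_set_num_threads(n);
#else
    (void)n;  // built without OpenMP: nothing to throttle
#endif
  }

  int saved_;
};
```

Usage would be along the lines of `{ ThreadThrottle guard(2); /* run the Armadillo-heavy code */ }`, with the previous count restored as soon as `guard` leaves scope.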

Again, congrats on sorting it out, and sorry for the trouble. I have long argued
that CRAN should set the behaviour-defining environment variable, namely
OMP_NUM_THREADS, for the tests and examples it wants to run under reduced
load.  Alas, that's not where we ended up.

Cheers,  Dirk

[1] http://dirk.eddelbuettel.com/cranberries/2023/10/27#bssm_2.0.2

[2] Your NEWS file calls this 'fix weird CRAN issues with parallelisation on
Debian.'. There is nothing 'weird' here (it behaves as designed, computers do
that to us), and it is not just on Debian but on any system where the build
has a) access to OpenMP, so uses it, and b) compares CPU time to elapsed time
against a threshold (2.5 in the NOTE above), as CRAN does.
#
Hi Dirk,

Actually, OMP_NUM_THREADS worked for the vignettes and testthat tests, but didn't help with the examples. However, I just wrapped the problematic example in \donttest, as for some reason this issue only happened with a single, seemingly simple example (hence the "weird" in the earlier NEWS entry, written out of frustration; I changed that wording in the CRAN version).

Thanks for reminding me about resetting the number of cores; I will fix that in the next version.

Best,
Jouni
#
Hi Jouni,
On 27 October 2023 at 13:02, Helske, Jouni wrote:
| Actually, the OMP_NUM_THREADS worked for vignettes and testthat tests, but
| didn't help with the examples. However, I just wrapped the problematic example

Now I am confused.

What is your understanding of why it helps in one place and not the other?

| with \donttest as for some reason this issue only happened with a single
| seemingly simple example (hence the "weird" in the earlier NEWS due to
| frustration, I changed this to the CRAN version).
| 
| Thanks for reminding me about the resetting the number of cores, will fix that
| to the next version.

I have an idea for a RcppArmadillo-based helper function. We can save the
initial values of the environment variable in .onLoad and cache it. A simple
helper function pair can then dial the environment variable down and reset it
to the cached value.
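A minimal sketch of such a helper pair in C++, assuming it caches the value reported by OpenMP's getter rather than the environment variable itself (as noted later in the thread, OpenMP reads OMP_NUM_THREADS only at startup). All function names here are hypothetical and are not RcppArmadillo's API.

```cpp
#include <algorithm>
#ifdef _OPENMP
#include <omp.h>
#endif

// Thread count cached at load time; -1 means nothing has been cached yet.
static int cached_threads = -1;

// Threads OpenMP would use for the next parallel region (1 without OpenMP).
int current_threads() {
#ifdef _OPENMP
  return omp_get_max_threads();
#else
  return 1;
#endif
}

static void apply_threads(int n) {
#ifdef _OPENMP
  omp_set_num_threads(n);
#else
  (void)n;  // built without OpenMP: nothing to change
#endif
}

// Hypothetical: call once when the package's shared library is loaded,
// e.g. from its R_init_<pkg>() routine or a function invoked in .onLoad().
void cache_initial_threads() { cached_threads = current_threads(); }

// Dial down to n threads, never below 1 or above the cached starting value.
void throttle_threads(int n) {
  const int cap = cached_threads > 0 ? cached_threads : current_threads();
  apply_threads((std::max)(1, (std::min)(n, cap)));
}

// Return to the value cached at load time, if one was recorded.
void reset_threads() {
  if (cached_threads > 0) apply_threads(cached_threads);
}
```

On the R side, thin wrappers around throttle_threads() and reset_threads() could then be called at the start and end of an example or test.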

Dirk
2 days later
#
Hi Dirk,

Looking more closely at earlier failures, the vignettes have always worked fine, but the NOTE on tests said that the CPU time was only 2.7 times the elapsed time, so maybe I was just lucky this time and got under 2.5. ;) Or testthat does something special...

Jouni
#
I have some better news.  While we established that 'in theory' setting the
environment variable OMP_NUM_THREADS would help (and I maintain that it is a
great PITA that CRAN does not do so as a general fix for this issue) it does
*not help* once R is started.  OpenMP only considers the variable once at
startup and does not re-read it.  So we cannot set it from R once R has started.

But OpenMP offers a setter (and a getter) for the thread count value.

And using it addresses the issue.  I created a demo package [1] which, when
running on a system with both OpenMP and 'enough cores' (any modern machine
will do), exhibits the warning from R CMD check --as-cran with timing enabled
(i.e. env vars set).  When an additional environment variable 'SHOWME' is set
to 'yes', it successfully throttles via the exposed OpenMP setter.  In our
example, Armadillo uses it to calibrate its thread use, a lower setting is
followed, and the warning is gone.

I will add more convenient wrappers to RcppArmadillo itself. These are
currently in a branch [2] and their use is illustrated in the help page and
example of fastLm demo function [3].  I plan to make a new RcppArmadillo
release with this change in the coming days, the setter and re-setter will
work for any OpenMP threading changes. So if you use RcppArmadillo, this
should help. (And of course there always was RhpcBLASctl [4] doing this too.)

Dirk

[1] https://github.com/eddelbuettel/rcpparmadilloopenmpex
[2] https://github.com/RcppCore/RcppArmadillo/tree/feature/thread_throttle
[3] https://github.com/RcppCore/RcppArmadillo/blob/a8db424bd6aaeda2ceb897142d3c366d9c6591c7/man/fastLm.Rd#L72-L98
[4] https://cran.r-project.org/package=RhpcBLASctl