[R-pkg-devel] Trouble with long-running tests on CRAN debian server

16 messages · Ivan Krylov, Mark Padgham, Scott Ritchie +4 more

#
Dear all,

I'm currently having issues submitting an update to the ukbnmr package to
CRAN, as the checks on their Debian server are generating a NOTE due to
long-running tests.

The offending test code is much faster on my own machine, and on all other
CRAN servers (running in < 1 second), and I tried flagging this as a
false-positive NOTE to CRAN without success.

I've tried cutting down the test dataset used, but this issue has
persisted, suggesting there is actually some sort of issue here that's only
cropping up on some architectures (although my package has no compiled
code).

Has anyone else had this issue before with the CRAN Debian server, and do
you have suggestions on how to proceed with debugging?

If useful, the offending test code can be run with:

```
remotes::install_github("sritchie73/ukbnmr")
library(ukbnmr)
system.time({ remove_technical_variation(test_data) })
```

Kind Regards,

Scott Ritchie
#
On Mon, 21 Aug 2023 12:02:55 +0100
Scott Ritchie <sritchie73 at gmail.com> wrote:

            
data.tables, you say? Can you show us the NOTE message you're getting?
It could be that your example takes too much CPU time (as opposed to
"real", "wallclock" time) due to running too many threads started by
data.table.

It's not obvious why data.table would start too many threads (it's
supposed to honour the limits that CRAN expresses in environment
variables), but at least it should be easy to check and discount.
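
One way to check is to compare the thread count data.table reports with the environment variables that can cap it. The sketch below assumes the variable names listed in `?getDTthreads`, plus `_R_CHECK_LIMIT_CORES_`, which is an R CMD check setting that data.table itself may not consult; the helper name `thread_env_report` is made up for illustration.

```r
# Hypothetical helper: report the settings that can cap data.table's threads.
thread_env_report <- function() {
  vars <- c("OMP_THREAD_LIMIT", "OMP_NUM_THREADS",
            "R_DATATABLE_NUM_THREADS", "R_DATATABLE_NUM_PROCS_PERCENT",
            "_R_CHECK_LIMIT_CORES_")
  env <- Sys.getenv(vars, unset = NA)   # NA means the variable is not set
  # Only query data.table if it is installed; otherwise report NA.
  dt_threads <- if (requireNamespace("data.table", quietly = TRUE)) {
    data.table::getDTthreads()
  } else {
    NA_integer_
  }
  list(env = env, dt_threads = dt_threads)
}

print(thread_env_report())
```

If `dt_threads` is larger than 2 and none of the variables are set, the check machine is probably not constraining data.table at all.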
#
Hi Ivan,

Here is the NOTE generated by CRAN:

* checking examples ... [5s/2s] NOTE
Examples with CPU time > 2.5 times elapsed time
                            user system elapsed ratio
remove_technical_variation 2.603  0.027    0.94 2.798
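
The ratio column appears to be CPU time (user plus system) divided by elapsed time. That reading is an assumption, but it reproduces the figure in the NOTE:

```r
# Figures copied from the NOTE above.
user    <- 2.603
system  <- 0.027
elapsed <- 0.94

# Assumed definition: total CPU time divided by wall-clock time.
ratio <- (user + system) / elapsed
round(ratio, 3)   # 2.798, matching the 'ratio' column

# The NOTE is raised when the ratio exceeds 2.5.
ratio > 2.5       # TRUE
```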

This doesn't appear to be related to data.table threads, here is what
I see after explicitly setting setDTthreads(1)

On my own machine (OSX Monterey, arm64 M1 processor):
user  system elapsed
  0.460   0.004   0.466

And on my University's cluster (RHEL 7, intel xeon platinum 8276 CPU @ 2.2 GHz):
user  system elapsed
  1.108   0.020   1.130

Runtimes are similar on these two machines when using an older version
of ukbnmr that has a 5x larger test dataset (50 rows instead of 10
rows).

Best,

Scott
On Mon, 21 Aug 2023 at 13:16, Ivan Krylov <krylov.r00t at gmail.com> wrote:
#
On 21 August 2023 at 15:16, Ivan Krylov wrote:
| On Mon, 21 Aug 2023 12:02:55 +0100
| Scott Ritchie <sritchie73 at gmail.com> wrote:
| 
| > remotes::install_github("sritchie73/ukbnmr")
| > library(ukbnmr)
| > system.time({ remove_technical_variation(test_data) })
| 
| data.tables, you say? Can you show us the NOTE message you're getting?
| It could be that your example takes too much CPU time (as opposed to
| "real", "wallclock" time) due to running too many threads started by
| data.table.

Yep, and that is a new test AFAIK.
 
| It's not obvious why data.table would start too many threads (it's
| supposed to honour the limits that CRAN expresses in environment
| variables), but at least it should be easy to check and discount.

It grabs all it can get which is what you want for performance (I am on a
six-core machine here):

  $ R -q
  > library(data.table)
  data.table 1.14.8 using 6 threads (see ?getDTthreads).  Latest news: r-datatable.com
  > 

and it honors variables if set

  $ OMP_THREAD_LIMIT=2 R -q
  > library(data.table)
  data.table 1.14.8 using 2 threads (see ?getDTthreads).  Latest news: r-datatable.com
  > 

so I presume that variable is NOT set by CRAN.  It might help if it were.

Dirk
#
On 21/08/2023 14:34, Dirk Eddelbuettel wrote:
I had to update a package recently to get around this by putting
explicit 'data.table::setDTthreads(1)' in all examples, tests, and
vignettes. The incoming checks now do these CPU/elapsed tests, so you
can test by submitting, and if you're still over the ratio it will
auto-reject and tell you there. That was the only way to get my
submission to pass incoming.
#
On Mon, 21 Aug 2023 13:28:54 +0100
Scott Ritchie <sritchie73 at gmail.com> wrote:

            
In this context, "user" means the time spent executing userspace code
(as opposed to work done on behalf of the process by the operating
system kernel, "system"), and "elapsed" is the real time. Some threads
or child processes are definitely at work here.
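
To see the distinction on your own machine, `system.time()` returns these components separately. The sketch below sums the CPU components and divides by the elapsed time; the helper name `cpu_elapsed_ratio` is hypothetical, and the sorting loop is only a stand-in for a real package example.

```r
# Hypothetical helper: CPU-to-elapsed ratio for an expression.
cpu_elapsed_ratio <- function(expr) {
  st <- system.time(expr)   # expr is evaluated (and timed) inside system.time()
  # proc_time fields: user.self, sys.self, elapsed, user.child, sys.child.
  # Child times can be NA on some platforms, hence na.rm = TRUE.
  cpu <- sum(st[c("user.self", "sys.self", "user.child", "sys.child")],
             na.rm = TRUE)
  elapsed <- st[["elapsed"]]
  c(cpu = cpu, elapsed = elapsed, ratio = cpu / elapsed)
}

res <- cpu_elapsed_ratio(for (i in 1:200) sort(runif(1e5)))
print(res)
```

A single-threaded workload should give a ratio close to 1; a value well above 1 suggests that extra threads or child processes were doing work at the same time.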

Dirk is probably right that it's a good idea to have OMP_THREAD_LIMIT=2
set on the CRAN check machine. Either that, or place the responsibility
on data.table for setting the right number of threads by default. But
that's a policy question: should a CRAN package start no more than two
threads/child processes even if it doesn't know it's running in an
environment where the CPU time / elapsed time limit is two?
#
On 21 August 2023 at 16:05, Ivan Krylov wrote:
| Dirk is probably right that it's a good idea to have OMP_THREAD_LIMIT=2
| set on the CRAN check machine. Either that, or place the responsibility
| on data.table for setting the right number of threads by default. But
| that's a policy question: should a CRAN package start no more than two
| threads/child processes even if it doesn't know it's running in an
| environment where the CPU time / elapsed time limit is two?

Methinks that given this language in the CRAN Repository Policy

  If running a package uses multiple threads/cores it must never use more
  than two simultaneously: the check farm is a shared resource and will
  typically be running many checks simultaneously.

it would indeed be nice if this variable, and/or equivalent ones, were set.

As I mentioned before, I had long added a similar throttle (not for
data.table) in a package I look after (for work, even). So a similar
throttler with optionality is below. I'll add this to my `dang` package
collecting various functions.

A usage example follows. It does nothing by default, ensuring 'full power'
but reflects the minimum of two possible options, or an explicit count:

    > dang::limitDataTableCores(verbose=TRUE)
    Limiting data.table to '12'.
    > Sys.setenv("OMP_THREAD_LIMIT"=3); dang::limitDataTableCores(verbose=TRUE)
    Limiting data.table to '3'.
    > options(Ncpus=2); dang::limitDataTableCores(verbose=TRUE)
    Limiting data.table to '2'.
    > dang::limitDataTableCores(1, verbose=TRUE)
    Limiting data.table to '1'.
    >

That makes it, in my eyes, preferable to any unconditional 'always pick 1 thread'.

Dirk


##' Set threads for data.table respecting possible local settings
##'
##' This function sets the number of threads \pkg{data.table} will use
##' while reflecting two possible machine-specific settings from the
##' environment variable \sQuote{OMP_THREAD_LIMIT} as well as the R
##' option \sQuote{Ncpus} (used e.g. for parallel builds).
##' @title Set data.table threads respecting default settings
##' @param ncores A numeric or character variable with the desired
##' count of threads to use
##' @param verbose A logical value with a default of \sQuote{FALSE} to
##' operate more verbosely
##' @return The return value of the \pkg{data.table} function
##' \code{setDTthreads} which is called as a side-effect.
##' @author Dirk Eddelbuettel
##' @export
limitDataTableCores <- function(ncores, verbose = FALSE) {
    if (missing(ncores)) {
        ## start with a simple fallback: 'Ncpus' (if set) or else 2
        ncores <- getOption("Ncpus", 2L)
        ## also consider OMP_THREAD_LIMIT (cf Writing R Extensions), gets NA if envvar unset
        ompcores <- as.integer(Sys.getenv("OMP_THREAD_LIMIT"))
        ## and then keep the smaller
        ncores <- min(na.omit(c(ncores, ompcores)))
    }
    stopifnot("Package 'data.table' must be installed." = requireNamespace("data.table", quietly=TRUE))
    stopifnot("Argument 'ncores' must be numeric or character" = is.numeric(ncores) || is.character(ncores))
    if (verbose) message("Limiting data.table to '", ncores, "'.")
    data.table::setDTthreads(ncores)
}

#
Thanks Dirk and Ivan,

I took a slightly different workaround: forcing the number of threads to 1
when functions are run on the package's test dataset, by adding the
following to each user-facing function:

```
  # Check if running on package test_data, and if so, force data.table to be
  # single threaded so that we can avoid a NOTE on CRAN submission
  if (isTRUE(all.equal(x, ukbnmr::test_data))) {
    registered_threads <- getDTthreads()
    setDTthreads(1)
    # re-register so no unintended side effects for users
    on.exit({ setDTthreads(registered_threads) })
  }
```
(i.e. here x is the input argument to the function)
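
If the same guard is needed in many functions, it could be factored into one internal helper. The following is only a sketch: `with_dt_threads` is a hypothetical name, not part of ukbnmr or data.table, and it assumes `setDTthreads()` returns the previous thread count invisibly, as its help page describes.

```r
# Hypothetical helper: evaluate `code` with data.table limited to `n` threads,
# restoring the previous setting afterwards (even if `code` errors).
with_dt_threads <- function(n, code) {
  if (!requireNamespace("data.table", quietly = TRUE)) {
    return(code)   # nothing to throttle; just evaluate the code
  }
  old <- data.table::setDTthreads(n)
  on.exit(data.table::setDTthreads(old), add = TRUE)
  force(code)
}

# Usage: the wrapped expression runs under the reduced thread count.
with_dt_threads(1L, sum(1:10))
```

Examples and tests could then wrap their calls in `with_dt_threads(1L, ...)` without each exported function carrying its own check.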

It took some trial and error to get this to pass the CRAN checks; the number
of columns in the input data was also contributing to the problem.

Best,

Scott
On Mon, 21 Aug 2023 at 14:38, Dirk Eddelbuettel <edd at debian.org> wrote:
#
If you add that to each exported function, isn't that a lot of code to read + maintain?
Also, it seems like unnecessary computational overhead.
Regards,
Berry
1 day later
#
I (and many colleagues here) have been caught several times by the 
following example:

1. did something in parallel on a cluster, set up via 
parallel::makeCluster().
2. e.g. allocated 20 cores and got them on one single machine
3. ran some code in parallel via parLapply()

Bang! 400 threads;
So I have started 20 parallel processes, each of which is using the 
automatically set max. 20 threads as OMP_THREAD_LIMIT was also adjusted 
by the cluster to 20 (rather than 1).
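
The arithmetic behind this, and one possible guard against it, can be sketched as follows. The helper names `threads_per_worker` and `start_throttled_cluster` are hypothetical, and the cluster function is defined but not run here; it assumes data.table is the only multithreaded package in use on the workers.

```r
# Oversubscription in the scenario above: workers x threads per worker.
n_workers <- 20L
threads_each <- 20L
n_workers * threads_each   # 400 threads competing for 20 cores

# Hypothetical helper: split a core budget evenly, never below one thread.
threads_per_worker <- function(n_workers, total_cores) {
  max(1L, as.integer(total_cores) %/% as.integer(n_workers))
}
threads_per_worker(20L, 20L)   # 1
threads_per_worker(4L, 20L)    # 5

# Hypothetical wrapper: start a cluster and throttle data.table on each worker.
start_throttled_cluster <- function(n_workers, total_cores) {
  per_worker <- threads_per_worker(n_workers, total_cores)
  cl <- parallel::makeCluster(n_workers)
  parallel::clusterCall(cl, function(n) {
    if (requireNamespace("data.table", quietly = TRUE)) {
      data.table::setDTthreads(n)
    }
    invisible(NULL)
  }, per_worker)
  cl   # the caller is responsible for parallel::stopCluster(cl)
}
```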

Hence, I really believe a default should always be small, not only in 
examples and tests, but generally. And people who aim for more should be 
able to increase the defaults.

Do you believe software that auto-occupies a 96-core machine with 96 
threads by default is sensible?

Best,
Uwe Ligges
On 21.08.2023 21:59, Berry Boessenkool wrote:
#
To whom are you addressing this question? The OpenMP developers who define the missing-OMP_THREAD_LIMIT behaviour and/or supply default config files? The CRAN server administrators who set the variable in their site-wide configuration intentionally or unintentionally? Or the package authors expected to kludge in settings to override those defaults for CRAN testing while not overriding them in normal use?

I would vote for explicitly addressing this (rhetorical?) question to the CRAN server administrators...
On August 23, 2023 6:31:01 AM PDT, Uwe Ligges <ligges at statistik.tu-dortmund.de> wrote:
#
Hi Uwe,

I agree and have also been burnt myself by programs occupying the maximum
number of cores available.

My understanding is that in the absence of explicit parallelisation, use of
data.table in a package should not lead to this type of behaviour?

Best,

Scott

On Wed, 23 Aug 2023 at 14:30, Uwe Ligges <ligges at statistik.tu-dortmund.de>
wrote:

  
  
#
On 23.08.2023 15:58, Jeff Newmiller wrote:
Of course, the CRAN team controls the env vars on the CRAN servers, 
but not on a server a user might use. And a user is typically unaware 
that a package uses multithreading.
R users are typically not developers with a lot of insight into computer 
science. Most R users I know would not even know how to set an env var.

So why do you expect your users to set an appropriate OMP_THREAD_LIMIT? 
Particularly when they aim at parallelization, they have to set it to 1.
I advocate limiting the number of cores not only for CRAN but also (and 
in particular) by default! Something we cannot check easily.


An alternative would be to teach R to set OMP_THREAD_LIMIT=1 locally by 
default, with a mechanism for users to change that.

Best,
Uwe Ligges
#
I think one should be very cautious about overriding "standard" mechanisms for controlling software infrastructure like OpenMP.  You risk making the already-complex task of configuring the software environment even more complex by increasing the number of places you have to look to find out why the mechanism documented by OpenMP is having no effect.

It may be that R Core agrees with you and creates an R-specific setting to control this... but IMO it should be accompanied by warning messages to help people figure out why their real work is underperforming if they link with compiled code that is supposed to make use of threads.
On August 23, 2023 7:24:46 AM PDT, Uwe Ligges <ligges at statistik.tu-dortmund.de> wrote:
1 day later
#
On 23.08.2023 16:00, Scott Ritchie wrote:
Yes, that would be my hope, too.

Best,
Uwe Ligges
#
On 25 August 2023 at 15:37, Uwe Ligges wrote:
| 
|
| On 23.08.2023 16:00, Scott Ritchie wrote:
| > Hi Uwe,
| > 
| > I agree and have also been burnt myself by programs occupying the 
| > maximum number of cores available.
| > 
| > My understanding is that in the absence of explicit parallelisation, use 
| > of data.table in a package should not lead to this type of behaviour?
| 
| Yes, that would be my hope, too.

No, everybody involved with data.table thinks using 50% is already a
compromise giving up performance, see e.g. Jan's comment from yesterday (and
everything leading up to it):

   https://github.com/Rdatatable/data.table/issues/5658#issuecomment-1691831704

*You* have a local constraint (that is perfectly reasonable) as *you* run
multiple package tests. So *you* should set a low value for OMP_THREAD_LIMIT.

Many users spend top dollars to have access to high-powered machines for
high-powered analyses. They do want all cores.

There simply cannot be one setting that addresses all situations. Please set
a low limit as your local deployment requires it.

Dirk