[R-pkg-devel] Trouble with long-running tests on CRAN debian server
On 23.08.2023 16:00, Scott Ritchie wrote:
Hi Uwe, I agree and have also been burnt myself by programs occupying the maximum number of cores available. My understanding is that in the absence of explicit parallelisation, use of data.table in a package should not lead to this type of behaviour?
Yes, that would be my hope, too. Best, Uwe Ligges
Best,
Scott
On Wed, 23 Aug 2023 at 14:30, Uwe Ligges <ligges at statistik.tu-dortmund.de> wrote:
I (and many colleagues here) have been caught several times by the
following example:
1. did something in parallel on a cluster, set up via
parallel::makeCluster().
2. e.g. allocated 20 cores and got them on one single machine
3. ran some code in parallel via parLapply()
Bang! 400 threads;
So I have started 20 parallel processes, each of which is using the
automatically set max. 20 threads as OMP_THREAD_LIMIT was also adjusted
by the cluster to 20 (rather than 1).
Hence, I really believe a default should always be small, not only in
examples and tests, but generally. And people who aim for more should be
able to increase the defaults.
Do you believe software that auto-occupies a 96-core machine with 96
threads by default is sensible?
Best,
Uwe Ligges
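
One way to avoid the workers-times-threads multiplication described above is to cap the thread count inside each worker right after the cluster is created. The following is only a sketch under assumptions: it assumes data.table is the multithreaded package involved, it uses 2 workers instead of 20 to stay small, and it calls data.table::setDTthreads() in each worker rather than changing OMP_THREAD_LIMIT there, because an environment variable set in an already running process may not be picked up by the OpenMP runtime.

```r
library(parallel)

n_workers <- 2L  # assumption: kept small for illustration; the example above used 20
cl <- makeCluster(n_workers)

# Pin every worker to one data.table thread, if data.table is installed there.
# Collect the thread count in effect on each worker (NA if data.table is absent).
threads_per_worker <- unlist(clusterEvalQ(cl, {
  if (requireNamespace("data.table", quietly = TRUE)) {
    data.table::setDTthreads(1L)
    data.table::getDTthreads()
  } else {
    NA_integer_
  }
}))

# The parallel work itself is unchanged.
res <- parLapply(cl, 1:4, function(i) i^2)
stopCluster(cl)
```

With this in place the total is at most n_workers threads from data.table, rather than n_workers times the per-process maximum.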
On 21.08.2023 21:59, Berry Boessenkool wrote:
>
> If you add that to each exported function, isn't that a lot of code to
> read + maintain?
> Also, it seems like unnecessary computational overhead.
> From a software design point of view, it might be nicer to set that in
> the examples + tests.
>
> Regards,
> Berry
>
> ________________________________
> From: R-package-devel <r-package-devel-bounces at r-project.org> on behalf of Scott Ritchie <sritchie73 at gmail.com>
> Sent: Monday, August 21, 2023 19:23
> To: Dirk Eddelbuettel <edd at debian.org>
> Cc: r-package-devel at r-project.org <r-package-devel at r-project.org>
> Subject: Re: [R-pkg-devel] Trouble with long-running tests on CRAN debian server
>
> Thanks Dirk and Ivan,
>
> I took a slightly different work-around of forcing the number of threads to
> 1 when running functions on the test dataset in the package, by adding the
> following to each user facing function:
>
> ```
>   # Check if running on package test_data, and if so, force data.table to be
>   # single threaded so that we can avoid a NOTE on CRAN submission
>   if (isTRUE(all.equal(x, ukbnmr::test_data))) {
>     registered_threads <- getDTthreads()
>     setDTthreads(1)
>     on.exit({ setDTthreads(registered_threads) }) # re-register so no unintended side effects for users
>   }
> ```
> (i.e. here x is the input argument to the function)
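
The save-and-restore idea in the snippet above (record the current setting, lower it, put it back via on.exit()) can be tried without data.table. The sketch below uses an R option as a stand-in for the thread setting; the option name `demo.threads` and the helper `with_single_thread()` are hypothetical and exist only for illustration.

```r
# Hypothetical stand-in for a thread setting; not a real data.table option.
options(demo.threads = 8L)

with_single_thread <- function(f) {
  # options() returns the previous values, so they can be restored later
  old <- options(demo.threads = 1L)
  # add = TRUE keeps any exit handlers registered earlier in the function
  on.exit(options(old), add = TRUE)
  f()
}

inside <- with_single_thread(function() getOption("demo.threads"))
after  <- getOption("demo.threads")
```

Here `inside` holds the lowered value seen during the call and `after` holds the caller's original value, which is restored even if `f()` throws an error.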
>
> It took some trial and error to get it to pass the CRAN tests; the number of
> columns in the input data was also contributing to the problem.
>
> Best,
>
> Scott
>
>
> On Mon, 21 Aug 2023 at 14:38, Dirk Eddelbuettel <edd at debian.org> wrote:
>
>>
>> On 21 August 2023 at 16:05, Ivan Krylov wrote:
>> | Dirk is probably right that it's a good idea to have OMP_THREAD_LIMIT=2
>> | set on the CRAN check machine. Either that, or place the responsibility
>> | on data.table for setting the right number of threads by default. But
>> | that's a policy question: should a CRAN package start no more than two
>> | threads/child processes even if it doesn't know it's running in an
>> | environment where the CPU time / elapsed time limit is two?
>>
>> Methinks that given this language in the CRAN Repository Policy
>>
>>    If running a package uses multiple threads/cores it must never use more
>>    than two simultaneously: the check farm is a shared resource and will
>>    typically be running many checks simultaneously.
>>
>> it would indeed be nice if this variable, and/or equivalent ones, were set.
>>
>> As I mentioned before, I had long added a similar throttle (not for
>> data.table) in a package I look after (for work, even). So a similar
>> throttler with optionality is below. I'll add this to my `dang` package
>> collecting various functions.
>>
>> A usage example follows. It does nothing by default, ensuring 'full power'
>> but reflects the minimum of two possible options, or an explicit count:
>>
>>      > dang::limitDataTableCores(verbose=TRUE)
>>      Limiting data.table to '12'.
>>      > Sys.setenv("OMP_THREAD_LIMIT"=3); dang::limitDataTableCores(verbose=TRUE)
>>      Limiting data.table to '3'.
>>      > options(Ncpus=2); dang::limitDataTableCores(verbose=TRUE)
>>      Limiting data.table to '2'.
>>      > dang::limitDataTableCores(1, verbose=TRUE)
>>      Limiting data.table to '1'.
>>      >
>>
>> That makes it, in my eyes, preferable to any unconditional 'always pick 1
>> thread'.
>>
>> Dirk
>>
>>
>> ##' Set threads for data.table respecting possible local settings
>> ##'
>> ##' This function sets the number of threads \pkg{data.table} will use
>> ##' while reflecting two possible machine-specific settings from the
>> ##' environment variable \sQuote{OMP_THREAD_LIMIT} as well as the R
>> ##' option \sQuote{Ncpus} (used e.g. for parallel builds).
>> ##' @title Set data.table threads respecting default settings
>> ##' @param ncores A numeric or character variable with the desired
>> ##' count of threads to use
>> ##' @param verbose A logical value with a default of \sQuote{FALSE} to
>> ##' operate more verbosely
>> ##' @return The return value of the \pkg{data.table} function
>> ##' \code{setDTthreads} which is called as a side-effect.
>> ##' @author Dirk Eddelbuettel
>> ##' @export
>> limitDataTableCores <- function(ncores, verbose = FALSE) {
>>     if (missing(ncores)) {
>>         ## start with a simple fallback: 'Ncpus' (if set) or else 2
>>         ncores <- getOption("Ncpus", 2L)
>>         ## also consider OMP_THREAD_LIMIT (cf Writing R Extensions), gets NA if envvar unset
>>         ompcores <- as.integer(Sys.getenv("OMP_THREAD_LIMIT"))
>>         ## and then keep the smaller
>>         ncores <- min(na.omit(c(ncores, ompcores)))
>>     }
>>     stopifnot("Package 'data.table' must be installed." = requireNamespace("data.table", quietly=TRUE))
>>     stopifnot("Argument 'ncores' must be numeric or character" = is.numeric(ncores) || is.character(ncores))
>>     if (verbose) message("Limiting data.table to '", ncores, "'.")
>>     data.table::setDTthreads(ncores)
>> }
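
The selection rule inside the function above (take 'Ncpus' or 2, then keep the smaller of that and OMP_THREAD_LIMIT if it is set) can be checked on its own without data.table. The helper `pick_thread_limit()` below is a hypothetical name used only for this sketch; it takes the two inputs as arguments so the rule can be tested without touching the real option or environment variable.

```r
# Hypothetical helper isolating the "keep the smaller" rule; base R only.
pick_thread_limit <- function(ncpus = getOption("Ncpus", 2L),
                              omp = Sys.getenv("OMP_THREAD_LIMIT")) {
  # An unset environment variable arrives as "", which as.integer() turns into NA
  candidates <- suppressWarnings(as.integer(c(ncpus, omp)))
  # Drop the NA entries and keep the smallest remaining candidate
  min(candidates[!is.na(candidates)])
}

limit_both_set  <- pick_thread_limit(ncpus = 12L, omp = "3")
limit_omp_unset <- pick_thread_limit(ncpus = 2L,  omp = "")
limit_ncpus_low <- pick_thread_limit(ncpus = 2L,  omp = "3")
```

The three calls cover the cases where the environment variable is the smaller value, where it is unset, and where the option is the smaller value.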
>>
>> |
>> | --
>> | Best regards,
>> | Ivan
>> |
>> | ______________________________________________
>> | R-package-devel at r-project.org mailing list
>>
>> --
>> dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org
>>
>
>       [[alternative HTML version deleted]]
>
> ______________________________________________
> R-package-devel at r-project.org mailing list