[R-pkg-devel] R package submission - too many threads error

3 messages · Stephen Abrams, Ivan Krylov

#
Hi - my submission was rejected with the following error in one of my
vignettes.

On Debian GNU/Linux trixie/sid:

Error: processing vignette 'modeling_with_binary_classifiers.Rmd'
failed with diagnostics:
24 simultaneous processes spawned

On Windows:

Error: processing vignette 'modeling_with_binary_classifiers.Rmd'
failed with diagnostics:
72 simultaneous processes spawned

I encountered a similar error while using R CMD check --as-cran on my local
(Windows) machine and solved it by using a suggestion from this thread:
https://stackoverflow.com/questions/50571325/r-cran-check-fail-when-using-parallel-functions

Specifically, checking for Sys.getenv("_R_CHECK_LIMIT_CORES_", "") and
forcibly bypassing the parallel processing capability in the package. Is
there a better way to do this? Should I just skip the vignette altogether
and try to resolve this after the package is accepted?
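
For reference, a minimal sketch of the kind of check described above. The
helper name choose_workers() is hypothetical and is not taken from spect; it
assumes that any non-empty value of _R_CHECK_LIMIT_CORES_ other than "false"
means the limit is active, and it caps the worker count at two rather than
disabling parallelism entirely:

```r
# Hypothetical helper (not part of spect): cap the number of workers
# when the check-time limit appears to be active.
choose_workers <- function(requested = 2L) {
  requested <- as.integer(requested)
  chk <- tolower(Sys.getenv("_R_CHECK_LIMIT_CORES_", ""))
  # Assumption: any non-empty value other than "false" means "limited".
  limited <- nzchar(chk) && chk != "false"
  if (limited) min(requested, 2L) else requested
}
```

A caller would then pass choose_workers(cores) wherever the worker count is
needed, so a user's request is honoured outside of checks.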

For a little more detail, the package (called spect) uses the caret package
under the hood and takes advantage of parallel processing if the user
specifies it. The package is located at the GitHub repo below. The bypass
occurs at line 378 of spect.R:
https://github.com/dawdawdo/spect

A secondary worry is that even if I resolve this, there might be something
else causing threads to spin up. How can I test for that when the error
doesn't trigger when I run R CMD check? I don't want to waste shared
resources if I can check it myself first.
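
One possible way to probe this locally, sketched under the assumption that
the "simultaneous processes spawned" message comes from a check inside the
'parallel' package that is keyed on _R_CHECK_LIMIT_CORES_ and fires when more
than two workers are requested. The snippet sets the variable for the current
session, tries to start three workers, and records what happens; it does not
claim which outcome a given R version will produce:

```r
library(parallel)

# Remember the current value so it can be restored afterwards.
old <- Sys.getenv("_R_CHECK_LIMIT_CORES_", unset = NA)
Sys.setenv("_R_CHECK_LIMIT_CORES_" = "TRUE")

outcome <- tryCatch({
  cl <- makeCluster(3L)   # more than two workers
  stopCluster(cl)
  "no limit triggered"
}, error = function(e) conditionMessage(e))

# Restore the environment variable to its previous state.
if (is.na(old)) {
  Sys.unsetenv("_R_CHECK_LIMIT_CORES_")
} else {
  Sys.setenv("_R_CHECK_LIMIT_CORES_" = old)
}

print(outcome)
```

Running the vignette code (for example via rmarkdown::render()) in a session
where the variable is set in the same way should surface any other place that
starts too many workers.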

Any help would be greatly appreciated. Thanks!
#
Dear Stephen Abrams,

Welcome to R-package-devel!

On Thu, 13 Feb 2025 22:20:50 -0500,
Stephen Abrams <stephen.abrams at gmail.com> wrote:

Instead of using detectCores() [*] and creating cluster objects
yourself, how about letting the user provide a cluster object for you
as a function argument? Yes, it takes slightly more typing for the user,
but on the other hand it lets the user:

 - choose the number of cores for themselves (currently the code seems
   to be ignoring the 'cores' argument)
 - distribute the computation over the network by connecting to the
   machines they know about
 - provide a completely custom, non-PSOCK cluster object that
   'parallel' will nevertheless work with

Since you're already using doParallel, maybe the right choice is to let
the user call registerDoParallel()?
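
A minimal sketch of the first suggestion, using only base 'parallel'. The
names run_fits() and fit_one are hypothetical and do not come from spect; the
point is only that the function never creates a cluster itself and falls back
to sequential execution when none is supplied:

```r
# Hypothetical package function: the caller decides whether and how to
# parallelise by passing a cluster object (or NULL for sequential work).
run_fits <- function(inputs, fit_one, cl = NULL) {
  if (is.null(cl)) {
    lapply(inputs, fit_one)                    # sequential fallback
  } else {
    parallel::parLapply(cl, inputs, fit_one)   # use the caller's cluster
  }
}

# Sequential use: no processes are spawned.
squares <- run_fits(1:4, function(x) x^2)
```

A user who wants parallelism would create the cluster themselves, for example
with parallel::makeCluster(2), pass it as cl, and call parallel::stopCluster()
when finished.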

Determining the right amount of parallelism in your code is a
surprisingly hard problem. Especially on shared computers, a program
naively deciding to use all (or 3/4 of all, or 1/2 of all) processors
may end up working much worse than a purely sequential one [**].

While rendering the vignette in a CRAN package, create a two-process
cluster or set use_parallel = FALSE: CRAN needs the rest of the
processors to check other packages in parallel with yours [***].
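
In a vignette chunk, the two-process variant might look like the sketch
below. It uses base 'parallel' with a stand-in computation rather than the
actual spect call, whose arguments are not shown in this thread:

```r
library(parallel)

# Start exactly two workers, which stays within the limit applied
# during checks.
cl <- makeCluster(2L)

# Stand-in workload; make sure the workers are shut down even on error.
results <- tryCatch(
  parSapply(cl, 1:4, function(i) i * 10),
  finally = stopCluster(cl)
)
```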

Good luck!
#
I appreciate the welcome! Also - I believe that replying to an email is the
way to respond here, but please let me know if that's not the case.

In any event - passing in a cluster context is an interesting idea. I will
think about that. Also, it seems that despite me telling myself to write
bug-free code, you have correctly identified that I don't actually make use
of the passed cores parameter - oops! This is where it would have really
helped me to have a peer reviewer. Thanks!
On Fri, Feb 14, 2025 at 3:48 PM Ivan Krylov <ikrylov at disroot.org> wrote: