Hi Ivan,

Thank you for investigating this so thoroughly! My apologies for not replying sooner.

I think it is good practice to assume at most 2 cores for most users. Unfortunately, I am unable to trigger this NOTE locally, even running:

```
_R_CHECK_VIGNETTE_TIMING_CPU_TO_ELAPSED_THRESHOLD_=1.1 \
  R CMD check --timings greta_0.5.1.tar.gz
```

I have tried setting the default number of chains and cores to 2, but based on your evidence it seems that something inside tensorflow, or perhaps in reticulate, might be triggering this. I will reach out to the developers of tensorflow and reticulate to ask them about it.

Thank you so much again for your help!

Nick
On Fri, 30 Jan 2026 at 01:26, Ivan Krylov <ikrylov at disroot.org> wrote:
On Thu, 29 Jan 2026 11:36:00 +1100 Nicholas Tierney <nicholas.tierney at gmail.com> wrote:
I also note that I cannot reproduce the same error with:
```
rcmdcheck::rcmdcheck(
  args = c("--no-manual", "--as-cran", "--no-multiarch"),
  env = c("_R_CHECK_VIGNETTE_TIMING_CPU_TO_ELAPSED_THRESHOLD_" = "53")
)
```
This threshold is very large. It makes more sense to lower it, not raise it, in order to make the test more sensitive. On a machine with 16 hyper-threads,
_R_CHECK_VIGNETTE_TIMING_CPU_TO_ELAPSED_THRESHOLD_=1.1 \
R CMD check --timings greta_0.5.1.tar.gz
...reproduces the NOTE. (The more CPUs your machine has, the greater the chance that something trying to use all of them will cause a noticeable spike in CPU time. The machines running CRAN checks have even more logical CPUs.) In particular,
list.files("vignettes", full.names=TRUE, patt = '[.]Rmd$') |>
setNames(nm = _) |> lapply(\(.)
tools::buildVignette(., encoding='UTF-8') |> system.time()
)
shows that it's the get_started.Rmd vignette (62.765 s userspace CPU
time vs. 27.745 s elapsed time).
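(As I understand the check, the NOTE is driven by the ratio of CPU time to elapsed time, compared against the threshold, so these timings put the vignette well above even a generous setting:

62.765 / 27.745  # ~2.26, i.e. more than two CPUs busy on average

)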
What exactly starts these threads? 'pthread_create' is not the complete
answer, but it usually points out some suspects.
ltrace -w 10 -f -e pthread_create -- \
env R_LIBS_USER=greta.Rcheck:%U \
R -f greta.Rcheck/greta/doc/get_started.R
...There are a lot of suspects.
Here is a copy of OpenBLAS, bundled with a Python wheel that 'greta' installs, starting 16 threads when greta:::check_tf_version is called:
knitr::opts_chunk$set(echo = TRUE,
+ eval = greta:::check_tf_version("message"),
+ purl = greta:::check_tf_version("message"),
+ cache = TRUE,
+ comment = NA,
+ progress = FALSE)
...
[pid 25013] libopenblas64_p-r0-0cf96a72.3.23.dev.so->pthread_create(0x7f36ed7ccd80, 0, 0x7f36ebb50660, 0) = 0
blas_thread_init.part.0 (ip = 0x7f36ebb50a2a)
blas_thread_init (ip = 0x7f36ebb50ba9)
gotoblas_init (ip = 0x7f36eb91807f)
__nptl_change_stack_perm (ip = 0x7f36ef0959ce)
__nptl_change_stack_perm (ip = 0x7f36ef095ab4)
_dl_catch_exception (ip = 0x7f36eeed9274)
_dl_rtld_di_serinfo (ip = 0x7f36ef09c21e)
_dl_catch_exception (ip = 0x7f36eeed921a)
_dl_find_dso_for_object (ip = 0x7f36ef09c5b8)
dlmopen (ip = 0x7f36eee0f4b8)
By the time the following code block runs and tries to set
OMP_THREAD_LIMIT=2, it's too late. (It's usually too late to set
OpenMP-related variables inside an R script, because the OpenMP runtime
reads them once at process startup, before any R code can run.)
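To illustrate (a sketch restating the point above, not something I profiled separately):

# Too late: an OpenMP runtime already loaded into the process read
# OMP_THREAD_LIMIT when it initialised, so this has no effect on it.
Sys.setenv(OMP_THREAD_LIMIT = "2")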
It might help to set OPENBLAS_NUM_THREADS=2 first, then implicitly load
greta by calling greta:::check_tf_version(...). Though judging by later
results, it's probably a red herring.
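A sketch of that ordering (environment changes made by Sys.setenv() are visible to shared libraries dlopen()ed later, so this should work as long as it runs before the first call into greta):

Sys.setenv(OPENBLAS_NUM_THREADS = "2")  # before the wheel's OpenBLAS is loaded
greta:::check_tf_version("message")     # loads Python and the bundled OpenBLAS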
Here's node.js creating 16 worker threads through the 'V8' package:
DiagrammeR::export_graph(attr(gr, "dgr_graph"),
+ file_name = "figures/full_graph.png",
+ file_type = "png",
+ width = 958 * 2,
+ height = 450 * 2)
[pid 24967] libnode.so.108->pthread_create(0x555bfee583c0, 0x7ffde39e7c50, 0x7fc7a9f49280, 0x555bfee582f0) = 0
_ZN2v84base6Thread5StartEv (ip = 0x7fc7a9f4a424)
_ZN2v88platform30DefaultWorkerThreadsTaskRunner12WorkerThreadC1EPS1_ (ip = 0x7fc7a95fab76)
_ZN2v88platform30DefaultWorkerThreadsTaskRunnerC2EjPFdvE (ip = 0x7fc7a95fadbc)
_ZN2v88platform15DefaultPlatform37EnsureBackgroundTaskRunnerInitializedEv (ip = 0x7fc7a95f861e)
_ZN2v88platform18NewDefaultPlatformEiNS0_15IdleTaskSupportENS0_21InProcessStackDumpingESt10unique_p (ip = 0x7fc7a95f8849)
R_init_V8 (ip = 0x7fc7b001d609)
AddDLL (ip = 0x555baec0a5f1)
do_dynload (ip = 0x555baec0d0d1)
bcEval_loop (ip = 0x555baecd7100)
bcEval (ip = 0x555baece5477)
Here's librsvg -> pango creating worker threads:
DiagrammeR::export_graph(attr(gr, "dgr_graph"),
+ file_name = "figures/coef_graph.png",
+ file_type = "png",
+ width = 325 * 2,
+ height = 123 * 2)
[pid 24967] libglib-2.0.so.0->pthread_create(0x555bff3ad6d0, 0x7ffde39e6c60, 0x7fc7a0003650, 0x555bff3ad6a0) = 0
g_mutex_init (ip = 0x7fc7a0003fb5)
g_thread_new (ip = 0x7fc79ffd9fa8)
pango_fc_font_map_get_config (ip = 0x7fc79fae816f)
pango_glyph_string_x_to_index (ip = 0x7fc7a00b5803)
pango_layout_line_get_extents (ip = 0x7fc7a00c583e)
pango_layout_index_to_line_x (ip = 0x7fc7a00c7efe)
pango_layout_get_size (ip = 0x7fc7a00c8126)
rsvg_pixbuf_from_file_at_max_size (ip = 0x7fc7a074e6d2)
rsvg_pixbuf_from_file_at_max_size (ip = 0x7fc7a06661c6)
rsvg_error_get_type (ip = 0x7fc7a05f869e)
Do they consume a lot of CPU? Probably not.
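(One way to check, reusing the vignette's own call as a sketch: if the worker threads are mostly idle, user CPU time stays close to elapsed time, as with the vignette timings above.

system.time(
  DiagrammeR::export_graph(attr(gr, "dgr_graph"),
                           file_name = "figures/full_graph.png",
                           file_type = "png")
)

)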
Here's Tensorflow creating 16 worker threads:
draws <- mcmc(m, n_samples = 1000)
[pid 24967] libtensorflow_framework.so.2->pthread_create(0x555bff178668, 0x7ffde39e8ed8, 0x7fc7eb867150, 0x555bfea36d30) = 0
_ZN3tsl12_GLOBAL__N_18PosixEnv11StartThreadERKNS_13ThreadOptionsERKNSt7__cxx1112basic_stringIcSt11c (ip = 0x7fc7eb866567)
_ZN3tsl6thread16EigenEnvironment12CreateThreadESt8functionIFvvEE (ip = 0x7fc7eacd0a39)
_ZN5Eigen15ThreadPoolTemplIN3tsl6thread16EigenEnvironmentEEC2EibS3_ (ip = 0x7fc7eaccefaa)
_ZN3tsl6thread10ThreadPoolC2EPNS_3EnvERKNS_13ThreadOptionsERKNSt7__cxx1112basic_stringIcSt11char_tr (ip = 0x7fc7eacce92c)
_ZN10tensorflow11LocalDevice19EigenThreadPoolInfoC2ERKNS_14SessionOptionsEiPN3tsl9AllocatorE (ip = 0x7fc7ebcbacfc)
_ZN10tensorflow11LocalDeviceC2ERKNS_14SessionOptionsERKNS_16DeviceAttributesE (ip = 0x7fc7ebcba857)
_ZN10tensorflow16ThreadPoolDeviceC1ERKNS_14SessionOptionsERKNSt7__cxx1112basic_stringIcSt11char_tra (ip = 0x7fc7ebc1fc36)
_ZN10tensorflow29GPUCompatibleCPUDeviceFactory13CreateDevicesERKNS_14SessionOptionsERKNSt7__cxx1112 (ip = 0x7fc7ebd30b64)
_ZN10tensorflow13DeviceFactory13AddCpuDevicesERKNS_14SessionOptionsERKNSt7__cxx1112basic_stringIcSt (ip = 0x7fc7eb60eabc)
_ZN10tensorflow13DeviceFactory10AddDevicesERKNS_14SessionOptionsERKNSt7__cxx1112basic_stringIcSt11c (ip = 0x7fc7eb60eb92)

It _says_ it runs on "up to 2 CPU cores", but empirically it consumes up to ~400% CPU (4 out of 16 logical CPUs on this computer) while mcmc(...) is running. Judging by the profiling data, most of the CPU usage is due to tensorflow/Eigen running multiple threads during the call to mcmc(), possibly due to the default argument chains = 4?

--
Best regards,
Ivan