[Bioc-devel] bplapply Processes Sometimes Stall

4 messages · Martin Morgan, Dario Strbenac

#
Good day,

I have a minimal example of bplapply stalling.

library(BiocParallel)

results <- lapply(1:20, function(variety)
{
  message("Variety", variety)
  bplapply(1:100, function(index) {
    list(sample(20000), sample(c("Healthy", "Disease"), 20000, replace = TRUE))
  }, BPPARAM = MulticoreParam(workers = 25))
})

It stalls intermittently, on no particular iteration; other times it runs all 20 iterations and returns to the R command prompt. I can't reproduce the stall on demand. I am trying to find out why a cross-validation loop progresses for a few hours, then stalls. When the stall happens, two or three of the R processes appear to be using 100% CPU whereas the others have finished, according to the output of top. The server previously ran R 3.1.2 on Debian 7, and this never happened. The server has 48 processors.

If I set workers to 5, it always completes the loop and returns to the prompt. Using mclapply with mc.cores set to 25 also always works, so the problem appears to be specific to bplapply.
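
For reference, a minimal sketch of the mclapply variant described above. The exact call is not shown in the thread, so its shape is an assumption, and the loop sizes and mc.cores value are reduced here only to keep the example quick (the thread used 1:20, 1:100 and mc.cores = 25):

```r
library(parallel)

# Same workload as the bplapply() loop, dispatched with mclapply() instead.
# Assumption: small sizes for illustration; the thread used 20 x 100 tasks
# and mc.cores = 25.
results_mc <- lapply(1:3, function(variety) {
  message("Variety", variety)
  mclapply(1:10, function(index) {
    list(sample(20000),
         sample(c("Healthy", "Disease"), 20000, replace = TRUE))
  }, mc.cores = 2)
})
```

Each call to mclapply forks fresh child processes and collects them before returning, which may be one reason it behaves differently from a MulticoreParam() that is created anew on every iteration.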

R version 3.2.3 (2015-12-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8        LC_COLLATE=C.UTF-8    
 [5] LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8    LC_PAPER=C.UTF-8       LC_NAME=C             
 [9] LC_ADDRESS=C           LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] BiocParallel_1.4.3

loaded via a namespace (and not attached):
[1] futile.logger_1.4.1  lambda.r_1.1.7       futile.options_1.0.0

--------------------------------------
Dario Strbenac
PhD Student
University of Sydney
Camperdown NSW 2050
Australia
#
Hi Dario -- This didn't cause problems for me. One thing you might do is to re-use the MulticoreParam(), like this:

    param = bpstart(MulticoreParam(25))   # start the workers once
    lapply(1:20, function(variety) bplapply(1:100, function(index) {}, BPPARAM = param))
    bpstop(param)                         # shut the workers down when done

Also there have been a number of changes in devel and if possible it would be good to know whether the problem still appears there.

It might also be interesting to see how many connections are open when the process hangs. From the command line, while R is still running:

   ~$ lsof -i |grep -c R

Martin
#
Good day,

Today, I can't reproduce the stalling. It runs all 20 iterations to completion. When the loop finishes, though, top shows many R processes as zombies. I killed all R processes before running the test, so these originate from the test command I used.

26853 dario     20   0       0      0      0 Z   0.0  0.0   0:00.25 R
26857 dario     20   0       0      0      0 Z   0.0  0.0   0:00.25 R
26860 dario     20   0       0      0      0 Z   0.0  0.0   0:00.25 R
26861 dario     20   0       0      0      0 Z   0.0  0.0   0:00.22 R
26863 dario     20   0       0      0      0 Z   0.0  0.0   0:00.22 R
          ...                             ...

Do you observe this on your server, too?

On the ninth try, however, I did manage to crash R completely:

...                    ...
Variety12
Variety13
Variety14
*** buffer overflow detected ***: /usr/lib/R/bin/exec/R terminated
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x731ff)[0x7fc1619ad1ff]
/lib/x86_64-linux-gnu/libc.so.6(__fortify_fail+0x37)[0x7fc161a304c7]
/lib/x86_64-linux-gnu/libc.so.6(+0xf46e0)[0x7fc161a2e6e0]
/lib/x86_64-linux-gnu/libc.so.6(+0xf6437)[0x7fc161a30437]
...                    ...
7ffeee1f8000-7ffeee23a000 rw-p 00000000 00:00 0                          [stack]
7ffeee29c000-7ffeee29d000 r-xp 00000000 00:00 0                          [vdso]
7ffeee29d000-7ffeee29f000 r--p 00000000 00:00 0                          [vvar]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
Aborted

--------------------------------------
Dario Strbenac
PhD Student
University of Sydney
Camperdown NSW 2050
Australia
1 day later
#
Good day,

The problem is avoided by explicitly using the bpstart and bpstop functions.
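
A minimal sketch of that workaround applied to the original example, assuming BiocParallel is installed. The wrapper name `run_varieties`, its arguments, and the small sizes in the usage line are illustrative and not from the thread, which used 20 varieties, 100 tasks and 25 workers:

```r
library(BiocParallel)

# Hypothetical helper: start the workers once, reuse them for every
# bplapply() call, and stop them on exit even if an iteration fails.
run_varieties <- function(n_varieties = 20, n_tasks = 100, workers = 2) {
  param <- bpstart(MulticoreParam(workers = workers))
  on.exit(bpstop(param), add = TRUE)

  lapply(seq_len(n_varieties), function(variety) {
    message("Variety", variety)
    bplapply(seq_len(n_tasks), function(index) {
      list(sample(20000),
           sample(c("Healthy", "Disease"), 20000, replace = TRUE))
    }, BPPARAM = param)
  })
}

# Small run for illustration only.
results <- run_varieties(n_varieties = 3, n_tasks = 10, workers = 2)
```

Registering bpstop() with on.exit() means the worker processes are shut down in one place, rather than being left behind by each iteration's own MulticoreParam().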

--------------------------------------
Dario Strbenac
PhD Student
University of Sydney
Camperdown NSW 2050
Australia