Good day,
I have a minimal example of bplapply stalling.
results <- lapply(1:20, function(variety)
{
  message("Variety", variety)
  bplapply(1:100, function(index) {
    res <- list(sample(20000), sample(c("Healthy", "Disease"), 20000, replace = TRUE))
    res
  }, BPPARAM = MulticoreParam(workers = 25))
})
The loop sometimes stalls, on no particular iteration; at other times it runs all 20 iterations and returns to the R command prompt. The stall is not reproducible on demand. I am trying to find out why a cross-validation loop progresses for a few hours and then stalls. When the stall happens, top shows two or three of the R processes using 100% CPU while the others have finished. The server previously ran R 3.1.2 on Debian 7, and this never happened. The server has 48 processors.
If I set workers to 5, it always completes the loop and returns to the prompt. Using mclapply with mc.cores set to 25 also always works, so the problem is with bplapply.
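For comparison, the mclapply variant described above can be sketched as follows. This is only an illustration of the same loop using the base parallel package, assuming a Unix-like system where forking is available; the helper name make_fold is hypothetical and not part of the original report.

```r
library(parallel)

# One task: the same payload as in the bplapply example.
# (make_fold is a hypothetical helper name used only for this sketch.)
make_fold <- function(index) {
  list(sample(20000),
       sample(c("Healthy", "Disease"), 20000, replace = TRUE))
}

n_workers <- 25  # value from the report; lower it on machines with fewer cores

results <- lapply(1:20, function(variety) {
  message("Variety", variety)
  # mclapply forks fresh child processes on every call
  mclapply(1:100, make_fold, mc.cores = n_workers)
})
```

Each outer iteration returns a list of 100 elements, and each element holds one integer permutation and one character vector of length 20000.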
R version 3.2.3 (2015-12-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)
locale:
[1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8 LC_COLLATE=C.UTF-8
[5] LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8 LC_PAPER=C.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] BiocParallel_1.4.3
loaded via a namespace (and not attached):
[1] futile.logger_1.4.1 lambda.r_1.1.7 futile.options_1.0.0
--------------------------------------
Dario Strbenac
PhD Student
University of Sydney
Camperdown NSW 2050
Australia
[Bioc-devel] bplapply Processes Sometimes Stall
4 messages · Martin Morgan, Dario Strbenac
Hi Dario -- This didn't cause problems for me. One thing you might do is to re-use the MulticoreParam() like this
param = bpstart(MulticoreParam(25))
lapply(1:20, function(variety) bplapply(1:100, function(index) {}, BPPARAM = param))
bpstop(param)
Also there have been a number of changes in devel and if possible it would be good to know whether the problem still appears there.
It might also be interesting to see how many connections are open when the process hangs. From the command line, while R is still running:
~$ lsof -i |grep -c R
Martin
From: Bioc-devel [bioc-devel-bounces at r-project.org] on behalf of Dario Strbenac [dstr7320 at uni.sydney.edu.au]
Sent: Sunday, December 27, 2015 8:00 PM
To: bioc-devel at r-project.org
Subject: [Bioc-devel] bplapply Processes Sometimes Stall
_______________________________________________
Bioc-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
Good day,
Today, I can't reproduce the stalling. The loop runs all 20 iterations to completion. When it finishes, though, top shows many R processes as zombies. I killed all R processes before running the test, so these must originate from the test command I ran.
26853 dario 20 0 0 0 0 Z 0.0 0.0 0:00.25 R
26857 dario 20 0 0 0 0 Z 0.0 0.0 0:00.25 R
26860 dario 20 0 0 0 0 Z 0.0 0.0 0:00.25 R
26861 dario 20 0 0 0 0 Z 0.0 0.0 0:00.22 R
26863 dario 20 0 0 0 0 Z 0.0 0.0 0:00.22 R
... ...
Do you observe this on your server, too?
On the ninth try, however, I did manage to completely crash R:
... ...
Variety12
Variety13
Variety14
*** buffer overflow detected ***: /usr/lib/R/bin/exec/R terminated
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x731ff)[0x7fc1619ad1ff]
/lib/x86_64-linux-gnu/libc.so.6(__fortify_fail+0x37)[0x7fc161a304c7]
/lib/x86_64-linux-gnu/libc.so.6(+0xf46e0)[0x7fc161a2e6e0]
/lib/x86_64-linux-gnu/libc.so.6(+0xf6437)[0x7fc161a30437]
... ...
7ffeee1f8000-7ffeee23a000 rw-p 00000000 00:00 0 [stack]
7ffeee29c000-7ffeee29d000 r-xp 00000000 00:00 0 [vdso]
7ffeee29d000-7ffeee29f000 r--p 00000000 00:00 0 [vvar]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
Aborted
--------------------------------------
Dario Strbenac
PhD Student
University of Sydney
Camperdown NSW 2050
Australia
1 day later
Good day,
The problem is avoided by explicitly using the bpstart and bpstop functions.
--------------------------------------
Dario Strbenac
PhD Student
University of Sydney
Camperdown NSW 2050
Australia
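A minimal sketch of that workaround applied to the original example might look like the following. It assumes BiocParallel is installed and a Unix-like system; the helper name make_fold and the tryCatch wrapper are additions for illustration and do not come from the thread. The thread does not establish why starting the workers explicitly helps, so this only shows the pattern.

```r
library(BiocParallel)

# One task: the same payload as in the original report.
# (make_fold is a hypothetical helper name used only for this sketch.)
make_fold <- function(index) {
  list(sample(20000),
       sample(c("Healthy", "Disease"), 20000, replace = TRUE))
}

# Start the 25 workers once, before the outer loop, and reuse them.
param <- bpstart(MulticoreParam(workers = 25))

results <- tryCatch(
  lapply(1:20, function(variety) {
    message("Variety", variety)
    bplapply(1:100, make_fold, BPPARAM = param)
  }),
  finally = bpstop(param)  # stop the workers even if an iteration fails
)
```

The difference from the original loop is that MulticoreParam() is constructed and started once, rather than being created anew inside each of the 20 bplapply calls.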