[Bioc-devel] C++ parallel computing
Incidentally, I was reflecting on this topic the other day and was wondering whether BiocParallel could have something like OpenMPParam() that sets the number of threads to some non-zero value via omp_set_num_threads(). This would provide a consistent framework through which users could control OpenMP behavior in suitably written functions.

One could even imagine having a composition design where a caller could assemble a BPPARAM object like:

    bplapply(..., BPPARAM=OpenMPParam(SnowParam(5), 2))

which tells bplapply to spin up 5 workers, each of which is allowed to use up to 2 threads. Implementation-wise, it would be a relatively simple matter of stuffing an extra set-up command into .composeTry; the nthread-setting code can be borrowed from ShortRead.

For context: I am planning on moving more parallelization in my packages into OpenMP to get around the overhead of the other backends. Forking is the only approach that is remotely fast enough, but the interaction of forks with the GC is too chaotic in memory-limited environments.

-A
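As a rough illustration of the thread-capping idea above, here is a minimal C++ sketch of what such a per-worker set-up step might do. It is not taken from BiocParallel or ShortRead, and OpenMPParam() itself remains a proposal; the names ScopedOmpThreads and sum_squares are hypothetical. The only OpenMP calls used are omp_set_num_threads() and omp_get_max_threads(), and everything is guarded by _OPENMP so that the code still compiles and runs serially on toolchains without OpenMP support (for example, a default macOS setup).

```cpp
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: caps the OpenMP thread count for the lifetime of the
// object and restores the previous setting on destruction.
class ScopedOmpThreads {
public:
    explicit ScopedOmpThreads(int nthreads) : previous_(current_max()) {
        // Non-positive values mean "leave the current setting alone".
        if (nthreads > 0) set_max(nthreads);
    }
    ~ScopedOmpThreads() { set_max(previous_); }

    ScopedOmpThreads(const ScopedOmpThreads&) = delete;
    ScopedOmpThreads& operator=(const ScopedOmpThreads&) = delete;

    // Upper bound on threads for the next parallel region (1 without OpenMP).
    static int current_max() {
#ifdef _OPENMP
        return omp_get_max_threads();
#else
        return 1;
#endif
    }

private:
    static void set_max(int n) {
#ifdef _OPENMP
        omp_set_num_threads(n);
#else
        (void) n;  // no-op in a serial build
#endif
    }

    int previous_;
};

// Hypothetical "suitably written function": sums i*i for i in [0, n) using
// at most nthreads threads, falling back to a plain loop without OpenMP.
double sum_squares(int n, int nthreads) {
    ScopedOmpThreads guard(nthreads);
    double total = 0.0;
#ifdef _OPENMP
#pragma omp parallel for reduction(+:total)
#endif
    for (int i = 0; i < n; ++i) {
        total += static_cast<double>(i) * i;
    }
    return total;
}
```

A call such as sum_squares(1000, 2) would then use at most 2 threads inside the loop and leave the process-wide setting as it found it, which is roughly the behaviour one would want from each of the 5 workers in the example above.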
On 5/25/21 10:39 AM, Martin Morgan wrote:
If the BAM files are each processed independently, and each processing task takes a while, then it is probably 'good enough' to use R-level parallel evaluation using BiocParallel (currently the recommendation for Bioconductor packages) or another evaluation framework. Also, presumably you will use Rhtslib, which provides C-level access to the hts library. This will require writing C / C++ code to interface between R and the hts library, and will of course be a significant undertaking.
It might be worth outlining in a bit more detail what your task is and how (not too much detail!) you've tried to implement this in Rsamtools.
Martin Morgan
On 5/24/21, 10:01 AM, "Bioc-devel on behalf of Oleksii Nikolaienko" <bioc-devel-bounces at r-project.org on behalf of oleksii.nikolaienko at gmail.com> wrote:
Dear Bioc team,
I'd like to ask for your advice on the parallelization within a Bioc
package. Please point me to a better place if this mailing list is not
appropriate.
After a bit of thinking I decided that I'd like to parallelize processing
at the level of C++ code. Would you strongly recommend against this, and
suggest using an R approach instead (e.g. "future")?
If parallel C++ is ok, what would be the best solution for all major OSs?
My initial choice was OpenMP, but then it seems that Apple has something
against it (https://mac.r-project.org/openmp/). My own dev environment is
mostly Big Sur/ARM64, but I wouldn't want to drop its support anyway.
(On the actual task: loading and specific processing of very large BAM
files, ideally significantly faster than by means of Rsamtools as a backend)
Best,
Oleksii Nikolaienko
_______________________________________________
Bioc-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel