Dear Anna,
According to the documentation of "BiocParallelParam", SnowParam() is a
subclass suitable for distributed memory (e.g. cluster) computing. If
you're running your code on a simpler machine with shared memory (e.g. your
PC), you're probably better off using MulticoreParam() instead. Here's a
modified example based on yours:
# Setup
library(parallel)
library(BiocParallel)
my_list <- list(1:10, 11:20, 21:30, 31:40, 41:50, 51:60, 61:70, 71:80,
                81:90)
FUN <- function(x) return(x ^ 10)
ncores <- min(detectCores() - 1L, 10L)
# Parallel
cl <- makeCluster(ncores)
print(system.time(res <- clusterApplyLB(cl, my_list, FUN)))
stopCluster(cl)
# BiocParallel
parallel_param_1 <- SnowParam(workers = ncores, tasks = length(my_list))
print(system.time(
  res2 <- bplapply(my_list, FUN, BPPARAM = parallel_param_1)
))
parallel_param_2 <- MulticoreParam(workers = ncores,
                                   tasks = length(my_list))
print(system.time(
  res3 <- bplapply(my_list, FUN, BPPARAM = parallel_param_2)
))
On my machine, the output is as follows, in the order parallel, SnowParam(),
MulticoreParam(). The last column is the total elapsed time: MulticoreParam()
is slightly faster than parallel, while SnowParam() is by far the slowest.
   user  system elapsed
  0.000   0.004   0.088
   user  system elapsed
  0.114   0.001   1.336
   user  system elapsed
  0.074   0.124   0.060
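Note that the comparison above is not quite like-for-like: makeCluster() sits outside the timed call, whereas the BiocParallel workers are started inside bplapply(). One possible explanation for the SnowParam() figure is therefore worker start-up rather than the computation itself. Below is a minimal sketch, assuming that is the cost you want to exclude: it starts the workers once with bpstart() and times only the bplapply() call. pick_param() is a hypothetical helper of mine, not part of BiocParallel; it falls back to SnowParam() on Windows, where forked MulticoreParam() workers are not available. ncores is hard-coded to 2 and the list is shortened only to keep the example small.

```r
library(BiocParallel)

my_list <- list(1:10, 11:20, 21:30)
FUN <- function(x) x ^ 10
ncores <- 2L  # assumption: at least two cores are available

# Hypothetical helper (not a BiocParallel function): use forked workers
# where the platform supports them, socket workers otherwise.
pick_param <- function(workers, tasks) {
  if (.Platform$OS.type == "windows") {
    SnowParam(workers = workers, tasks = tasks)
  } else {
    MulticoreParam(workers = workers, tasks = tasks)
  }
}

param <- pick_param(ncores, length(my_list))

bpstart(param)  # start the workers once, outside the timed region
timing <- system.time(
  res <- bplapply(my_list, FUN, BPPARAM = param)
)
bpstop(param)   # shut the workers down again

print(timing)
```

If the gap to parallel shrinks once start-up is excluded, the difference was probably overhead rather than anything to do with load balancing.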
How does that work on your actual data?
Best,
Waldir
On Tue, 08.08.2023 at 13:10 +0200, Anna Plaxienko wrote:
Hi all!
I'm switching from the base R *parallel* package to *BiocParallel* for my
Bioconductor submission and I have two questions. First, I wanted advice on
whether I've implemented load balancing correctly. Second, I've noticed
that the running time is about 15% longer with BiocParallel. Any ideas why?
Parallel code
cl <- makeCluster(ncores)
res <- clusterApplyLB(cl, my_list, FUN)
stopCluster(cl)
BiocParallel
parallel_param <- SnowParam(workers = ncores, type = "SOCK",
                            tasks = length(my_list))
res2 <- bplapply(my_list, FUN, BPPARAM = parallel_param)
Thank you!
Best regards,
Anna Plaksienko