
[Bioc-devel] Progress Message Order in bplapply

Hi Dario -- it was this commit

------------------------------------------------------------------------
r111519 | mtmorgan at fhcrc.org | 2015-12-15 14:34:18 -0500 (Tue, 15 Dec 2015) | 2 lines

port: r111463, bugfix: workers=1, tasks=0 assigns all X to one chunk

------------------------------------------------------------------------

in response to this report

https://support.bioconductor.org/p/75945/

Previously, when the number of 'tasks' was unspecified (default value 0), X (in your example, the vector 1:100) was split into 100 individual tasks 1, 2, 3, ..., each processed in a completely independent parallel process -- a total of 100 processes started and stopped. The change mentioned above instead behaves as documented, splitting the 100 elements approximately evenly between the specified number of workers (25) and sending several elements to each worker for processing. This saves the cost of repeatedly communicating objects to and from the workers. You can get the old behavior by specifying tasks = length(X), i.e., tasks = 100 in your example.
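As a rough sketch of the difference (plain base R only -- this is an illustration, not BiocParallel's actual internal splitting code, and the helper name split_tasks is made up):

```r
## Hypothetical helper approximating how elements are grouped into tasks;
## BiocParallel's real splitting is internal and may differ in detail.
split_tasks <- function(X, ntasks) {
    ## contiguous, approximately even chunks
    split(X, ceiling(seq_along(X) * ntasks / length(X)))
}

X <- 1:100
default  <- split_tasks(X, 25)          # tasks = 0: one task per worker
per_elem <- split_tasks(X, length(X))   # tasks = length(X): old behavior

length(default)            # 25 tasks...
unique(lengths(default))   # ...of 4 elements each
length(per_elem)           # 100 singleton tasks
```

With BiocParallel itself the corresponding knob is the tasks argument to the backend, e.g. something like SnowParam(workers = 25, tasks = 100).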

The 'split' of elements into tasks can be seen by calling the internal function .splitX(). With tasks = length(X), each element becomes its own task (only the first six tasks shown):

[[1]]
[1] 1

[[2]]
[1] 2

[[3]]
[1] 3

[[4]]
[1] 4

[[5]]
[1] 5

[[6]]
[1] 6

With the default (25 workers), the elements are grouped four to a task (again, only the first six shown):

[[1]]
[1] 1 2 3 4

[[2]]
[1] 5 6 7 8

[[3]]
[1]  9 10 11 12

[[4]]
[1] 13 14 15 16

[[5]]
[1] 17 18 19 20

[[6]]
[1] 21 22 23 24


Each element of the split is assigned to a worker in order, but the precise schedule is somewhat indeterminate -- task 1 might be assigned before task 2, but perhaps the process handling task 1 runs the garbage collector before sleeping, so task 2 finishes ahead of task 1. Under the original scheme I guess you were relying on the average execution time of ten processes between each message, whereas under the correct scheme you are relying on the average execution time of just three processes, and so see greater variability. Either way, the order of execution is not guaranteed.

Messages are reported at the end of each task; there are 100 opportunities for messages when the number of tasks is 100, but only 25 opportunities (corresponding to each worker handling approximately 4 elements) otherwise.
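A toy illustration of that count (plain base R; the function name is made up for this sketch, it is not a BiocParallel API):

```r
## Hypothetical: progress messages can appear only at task boundaries,
## so the number of opportunities equals the number of tasks.
message_opportunities <- function(n_elements, workers, tasks = 0L) {
    if (tasks > 0L) min(tasks, n_elements) else min(workers, n_elements)
}

message_opportunities(100, workers = 25)               # 25 (default)
message_opportunities(100, workers = 25, tasks = 100)  # 100 (old behavior)
```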

Other than being different from previously, is there an underlying problem?

Martin