"chunking" parallel tasks

There are some very sophisticated strategies for chunking out there in
the parallel processing world, aimed at dealing with the load balancing
problem.  The idea is to use very large chunks at the beginning and
middle stages of the computation, to minimize communication overhead and
the like, but then switch to smaller ones near the end in order to keep
all the processes busy.  For example, the "guided" option in OpenMP does
this.

However, in my experience, it is seldom necessary to resort to this, as
load balancing is typically not a problem.  One can even show this
theoretically:  Say task times are T[1], T[2], etc., and are i.i.d.  The
standard deviation of sum(T[1:k]), divided by the mean, goes to 0 as k
goes to infinity--i.e. the sum is essentially constant.  Of course, that
is an idealized model, but again in practice I have found that load
balancing is not much of an issue.

For that reason and because of the communication overhead, in most cases
it is actually faster to statically assign 1/p of the tasks to each of
the p processes, i.e. not do chunking.

"chunking" parallel tasks

Thread (11 messages)