Efficiently parallelize across columns of a data.table

Last time I looked (admittedly a few years back), on unix-alikes
(which you seem to be using, based on your use of top),
foreach/doParallel used forking. This means each worker starts with a
copy of the entire R session, but modern operating systems do not
actually copy memory at fork time; they only copy on write (i.e., a
page is duplicated only when the worker process modifies it). I believe
top shows memory use as if the copy had actually occurred (what the
operating system promises to each worker).

I would run the code and monitor swap usage. As long as the system
isn't swapping to disk, I would not worry about the table being copied
to every worker, since the copy doesn't really happen unless the
worker processes modify the table.
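
To make the read-only case concrete, here is a minimal sketch using
parallel::mclapply(), which forks on unix-alikes. It is not your code:
the table `big` and its size are made up for illustration, and I use a
plain data.frame so the example needs nothing beyond base R (a
data.table is also a list of columns, so the same call should apply).
The workers only read columns, so no copy of the table should be
triggered.

```r
library(parallel)  # ships with R; mclapply() forks on unix-alikes

# Hypothetical stand-in for the large table: 4 numeric columns (V1..V4).
set.seed(1)
big <- as.data.frame(matrix(rnorm(4e5), ncol = 4))

# Forking is not available on Windows, where mc.cores must be 1.
n_cores <- if (.Platform$OS.type == "unix") 2L else 1L

# Each worker reads one column of `big` and returns a small result.
# Nothing in the worker assigns into `big`, so its memory pages can
# stay shared with the parent process.
col_means <- mclapply(
  names(big),
  function(nm) mean(big[[nm]]),
  mc.cores = n_cores
)
col_means <- setNames(unlist(col_means), names(big))
```

If the function inside mclapply() assigned into `big` instead, the
worker would get its own private copy of the pages it touches, and
that is when real memory use would start to grow.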

HTH,

Peter
On Fri, Aug 19, 2016 at 11:22 AM, Rebecca Payne <rebeccapayne at gmail.com> wrote: