parallel execution on a single machine: sockets vs multicore?

2 messages · Peter Langfelder, Brian G. Peterson

Hi all,

I would like to parallelize some of my R code using either the socket
cluster approach or parallel execution via multicore. At this point I
am leaning towards sockets, since they also work on Windows, which
would make my code more portable. I am curious whether sockets have
any disadvantages compared to multicore. I know that with multicore,
worker processes don't incur extra memory overhead for objects they
use read-only (because of the copy-on-write approach of modern
operating systems). Is that also true for a socket cluster run on a
single machine?
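To illustrate the contrast, here is a minimal sketch of both approaches using the base 'parallel' package (the worker count of 2 and the toy workload are arbitrary choices for illustration):

```r
library(parallel)

x <- 1:8

# Fork-based (multicore) path: Unix-only. Workers are forked from the
# master, so read-only objects share memory pages via copy-on-write.
if (.Platform$OS.type == "unix") {
  res_fork <- mclapply(x, function(i) i^2, mc.cores = 2)
}

# Socket-based path: portable, including Windows. Each worker is a
# fresh R process, so objects must be sent to the workers explicitly
# (e.g. via clusterExport) and are fully copied into each worker.
cl <- makeCluster(2)
res_sock <- parLapply(cl, x, function(i) i^2)
stopCluster(cl)
```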

Thanks in advance for any insights.

Peter
On 08/20/2012 06:52 PM, Peter Langfelder wrote:
I recommend using the 'foreach' package for any code you intend to 
distribute to others, or for code you may later want to switch from 
one parallelization backend to another (in your case because of 
switching between Windows and Linux/Mac).

This lets you write the code once using 'foreach' and %dopar%. The 
loop falls back gracefully to a single thread, and 'foreach' supports 
many different parallel back ends, notably in this case 'doParallel', 
which will automatically use multicore on *nix and sockets on Windows 
with minimal to no intervention on your part.
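A minimal sketch of this pattern (the cluster size of 2 and the toy loop body are arbitrary; on Windows makeCluster creates socket workers, while registerDoParallel can also use forked workers on *nix):

```r
library(foreach)
library(doParallel)

cl <- makeCluster(2)
registerDoParallel(cl)

# %dopar% dispatches iterations to whatever backend is registered;
# with no backend registered it runs sequentially with a warning.
squares <- foreach(i = 1:4, .combine = c) %dopar% i^2

stopCluster(cl)
```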

Note that there is a small amount of overhead for the flexibility of 
multiple back ends, but in much of the code I write I find the 
advantages significantly outweigh that cost.

A further benefit is that you can debug single-threaded, easily 
expand to multiple cores on a single machine for the next round of 
testing, and then distribute over a cluster of many nodes (using e.g. 
the doRedis or doMPI back ends).
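That progression amounts to swapping the register call while the loop itself stays untouched; a cluster backend such as doRedis or doMPI would be registered analogously. A small sketch (backend choices here are illustrative):

```r
library(foreach)

# Stage 1: sequential backend, convenient for debugging.
registerDoSEQ()
res_seq <- foreach(i = 1:4, .combine = c) %dopar% i^2

# Stage 2: multiple cores on one machine. The loop is unchanged;
# only the registered backend differs.
library(doParallel)
registerDoParallel(2)
res_par <- foreach(i = 1:4, .combine = c) %dopar% i^2
stopImplicitCluster()
```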

I've also always considered the proliferation of different *apply 
methods in R, each with slightly different syntax for a different 
parallelization approach, a very messy architectural choice, since it 
requires performing surgery on working code just to change 
parallelization methods...


Regards,

    - Brian