Erlang-style message-passing in R: Rmpi, Snow, NetWorkSpaces, etc.

Thu, Sep 4, 2008 7:57 PM

It depends on the specific function.  The communication cost is 
significant, especially serialization and deserialization.  (Since I 
finally found the right way to force a flush of the TCP data, the actual 
network cost isn't a problem for moderate-sized data.)  For simplicity 
of implementation and to make correctness easier to guarantee, much of 
the R environment is serialized and sent over with *each* operation.
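To make the trade-off concrete, here is a minimal sketch. It uses Python's `pickle` as a stand-in for R's own serialization, and the environment, function names and payload layout are all hypothetical, not Parallel-R/taskPR's actual wire format. Shipping the whole environment with every call means the remote side never lacks a binding. The price is that every operation pays to serialize and deserialize data it may never touch.

```python
import pickle

# Hypothetical stand-in for an R workspace: 100 bound variables, each a
# 1000-element vector. None of these names come from Parallel-R/taskPR.
environment = {f"var{i}": list(range(1000)) for i in range(100)}

def payload_with_environment(func_name, args, env):
    """Serialize the call plus the whole environment: simple and always
    correct, because the remote side never lacks a binding."""
    return pickle.dumps({"func": func_name, "args": args, "env": env})

def payload_args_only(func_name, args):
    """Serialize only the call and its arguments: far smaller, but the
    remote side must already hold every global the function touches."""
    return pickle.dumps({"func": func_name, "args": args})

full = payload_with_environment("sum", [1, 2, 3], environment)
minimal = payload_args_only("sum", [1, 2, 3])

# The receiver pays a matching deserialization cost on every operation.
restored = pickle.loads(full)
print(f"with environment: {len(full)} bytes; arguments only: {len(minimal)} bytes")
```

The arguments-only payload is orders of magnitude smaller. It is only correct, though, if something else keeps the remote environment in sync, and that bookkeeping is the complexity the simple approach avoids.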

In terms of the instruction-level parallelism available, code that is a 
performance bottleneck is usually rewritten in C or Fortran and called 
in large blocks.  So the program is left trying to find parallelism in 
those large blocks, which it usually can't.

I didn't have a lot of suitable code to try, so the best example 
program was one that did a complex calculation followed by an accumulate 
operation in a loop.  Parallel-R/taskPR dynamically unrolled the loop 
(just like Tomasulo's algorithm does on a processor) and got a 
reasonable speedup (about half of linear).  Unfortunately, I don't even 
have that code example any more.
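The original example is lost. The following is only a rough sketch of the same shape of loop: an expensive, independent calculation per iteration followed by an accumulate. It is written with Python's `concurrent.futures` rather than anything from taskPR. The idea, in the spirit of Tomasulo-style dynamic scheduling, is that each iteration's calculation does not depend on the running total. Every iteration can therefore be issued at once, and only the accumulate forms a true dependency chain. `complex_calculation` is a hypothetical placeholder.

```python
from concurrent.futures import ThreadPoolExecutor

def complex_calculation(i):
    """Hypothetical stand-in for the expensive, independent loop body."""
    return sum(k * k for k in range(i * 1000))

def sequential(n):
    """The loop as written: each iteration waits for the previous one."""
    acc = 0
    for i in range(n):
        acc += complex_calculation(i)
    return acc

def dynamically_unrolled(n, workers=4):
    """Issue every iteration's independent work up front, then consume the
    results in loop order so the accumulate chain sees the same sequence."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # "Issue" stage: the calculation never reads acc, so all
        # iterations can be dispatched before any of them finishes.
        pending = [pool.submit(complex_calculation, i) for i in range(n)]
        acc = 0
        for fut in pending:
            # The accumulate is the only serial dependency; it blocks
            # only until this particular iteration's result is ready.
            acc += fut.result()
    return acc

print(sequential(20) == dynamically_unrolled(20))
```

Threads keep the sketch runnable anywhere. However, CPython's global interpreter lock means pure-Python work like this will not actually speed up on threads. Swapping in `ProcessPoolExecutor` with the same `submit`/`result` calls would give real parallelism, at the cost of pickling every argument and result, which is the serialization overhead discussed above.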

Yes, especially if serialization and deserialization could be 
avoided.  However, I don't believe R is thread-safe.  (Using shared 
memory between multiple R processes was on the TODO list when the 
project ended.)
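A minimal sketch of the shared-memory idea follows, using Python's `multiprocessing.shared_memory` (Python 3.8+) in place of multiple R processes. The helper names are hypothetical. The data is copied into a named block once. Only the short block name crosses the process boundary after that, so the data itself is never pickled and sent.

```python
import struct
from multiprocessing import Process, shared_memory

FMT = "d"  # each element stored as a native C double

def create_shared_vector(values):
    """Copy values once into a new named shared-memory block."""
    size = struct.calcsize(FMT) * len(values)
    shm = shared_memory.SharedMemory(create=True, size=size)
    shm.buf[:size] = struct.pack(f"{len(values)}{FMT}", *values)
    return shm

def sum_shared_vector(name, n):
    """Attach to an existing block by name and read it in place; only the
    name string is passed in, never the data itself."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        return sum(struct.unpack_from(f"{n}{FMT}", shm.buf))
    finally:
        shm.close()  # detach this handle; the block itself stays alive

def worker(name, n):
    """Hypothetical child-process task: reads the parent's block by name."""
    print("child sum:", sum_shared_vector(name, n))

if __name__ == "__main__":
    n = 1000
    shm = create_shared_vector([float(i) for i in range(n)])
    try:
        child = Process(target=worker, args=(shm.name, n))
        child.start()
        child.join()
    finally:
        shm.close()
        shm.unlink()  # the creating process frees the block when done
```

This sidesteps thread-safety concerns because each process keeps its own interpreter state. The trade-off is that the shared block holds raw numbers in a fixed layout, not arbitrary interpreter objects.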

I was fortunate to have access to a very large NUMA machine at the time 
that I was originally working on this project, so the network itself 
wasn't a limiting factor.  (The network stack turned out to be a 
problem, though.)


David Bauer
