
Resources for utilizing multiple processors

4 messages · Robin Jeffries, Brian Ripley, Mike Marchywka +1 more

On Wed, 8 Jun 2011, Robin Jeffries wrote:

By restricting yourself to one of the least capable OSes R runs on, you 
are making this harder for yourself.
By using virtual memory (R does not in fact use RAM; it always uses 
virtual memory).  With a 64-bit R you can use up to terabytes of VM. 
Because Windows' disc access is so slow, you will need to set a 
max-memory-size larger than your RAM size to enable this.
As stated, that is pointless.  The core running FUN2 would be waiting 
for the results of FUN1.  However, at time t FUN1 could generate 
a(t+1) from a(t-1) whilst FUN2 generates b(t) and c(t).
Look into package snow (with socket clusters).  The overhead of what 
you ask may be too high (POSIX OSes can use package multicore, which 
has a much lower overhead), but if the calculations are slow enough it 
may be worthwhile.  There are Windows-oriented examples in package 
RSiena.
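As a minimal sketch of the socket-cluster approach: package snow introduced this interface, and the same functions later shipped in base R's 'parallel' package, which is used below so the sketch runs without extra installs. The slow_square function is a made-up stand-in for an expensive FUN.

```r
## Socket cluster sketch -- two worker R processes over sockets.
## (snow::makeCluster(2, type = "SOCK") offers the same interface.)
library(parallel)
cl <- makeCluster(2)

## Hypothetical costly function standing in for FUN1/FUN2.
slow_square <- function(x) { Sys.sleep(0.1); x^2 }

## Distribute the inputs across the workers, then shut them down.
res <- parLapply(cl, 1:4, slow_square)
stopCluster(cl)
unlist(res)   # 1 4 9 16
```

Whether this pays off depends on the ratio of computation time to the serialization and socket overhead per call, which is exactly the caveat raised above.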

----------------------------------------
Well, for the situation below you seem to want a function
server. You could consider Rapache and just write this like a big
web application. A web server, like a DB, is not the first thing
you think of for high-performance computing, but if your computationally
intensive tasks are in native code this could be a reasonable
overhead that requires little learning. 

If you literally mean cores instead of machines, keep in mind
that cores can end up fighting over resources such as memory
(the link below cites an IEEE article showing cores making things
worse in a non-contrived case):

http://lists.boost.org/boost-users/2008/11/42263.php


I think people have mentioned packages like bigmemory (I forget
the exact names) that let you handle larger objects. Launching a bunch
of threads and letting VM thrash can easily make things slower.

I guess a better approach would be to get an implementation that is
block oriented, so you can do the memory/file staging in R, until
R gains a data frame that uses disk transparently and with hints on
expected access patterns (prefetch etc.).
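Until such a transparently disk-backed structure exists, block-oriented access can be hand-rolled in base R. The sketch below (file name and block size are arbitrary choices for illustration) sums a binary file of doubles one chunk at a time rather than loading it whole:

```r
## Write some test data to a binary file of doubles.
vals <- runif(1e5)
f <- tempfile()
writeBin(vals, f)

## Read it back in fixed-size blocks, accumulating a running sum,
## so memory use stays bounded regardless of file size.
con <- file(f, "rb")
total <- 0
block <- 1e4                               # 10,000 doubles per read
repeat {
  chunk <- readBin(con, "double", n = block)
  if (length(chunk) == 0) break            # end of file
  total <- total + sum(chunk)
}
close(con)

all.equal(total, sum(vals))                # TRUE
```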
[[elided Hotmail spam]]
On 06/08/2011 08:54 PM, Robin Jeffries wrote:
If FUN1 is independent of b() and c(), perhaps the example at the bottom 
of ?socketConnection points in a useful direction -- start one R to 
calculate a(t) and send the result to a socket connection, then move on 
to a(t+1). Start a second R to read from the socket connection and do 
FUN2(t), and so on. You'll be able to overlap the computations and 
roughly double throughput; the 'pipeline' could be extended with pre- and 
post-processing workers, too, though one would want to watch out for the 
complexity of managing this.
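A rough sketch of that two-process pipeline, in the spirit of the ?socketConnection example; the port number and the FUN1/FUN2 bodies are placeholders:

```r
## --- Producer: run in the first R session ------------------------------
## Computes a(t) and serializes it over a socket as soon as it is ready,
## then moves straight on to a(t+1). Port 6011 is an arbitrary choice.
FUN1 <- function(t) t^2                    # placeholder for the real work
con <- socketConnection(port = 6011, server = TRUE,
                        blocking = TRUE, open = "a+b")
for (t in 1:10) serialize(FUN1(t), con)
close(con)

## --- Consumer: run in a second R session -------------------------------
## Reads each a(t) and applies FUN2 while the producer is already
## computing a(t+1), overlapping the two stages.
FUN2 <- function(a) sqrt(a)                # placeholder
con <- socketConnection("localhost", port = 6011,
                        blocking = TRUE, open = "a+b")
for (t in 1:10) print(FUN2(unserialize(con)))
close(con)
```

With one stage per process this only helps when FUN1 and FUN2 cost roughly the same; otherwise the faster stage sits blocked on the socket.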

Martin