I felt this must be an FAQ but I don't see it anywhere: my apologies if I've missed it. I have what I believe is known as an "embarrassingly parallel" problem comprising a large number of repetitions of a single (lengthy) calculation that generates a boolean result and I am simply interested in the final proportion of true to false runs. This obviously lends itself to parallel computation and I'd appreciate Mac-specific pointers to both simple distributed and multiple-processor options here. I am working on an iMac G5 and could access (at home - i.e. not on a LAN) another G5 and a G4. I've come across the R/MPI package but would appreciate advice as to how easy this is to set up (would it actually be simpler to divide the job "manually"?). Alternatively I have an option to acquire a MacPro for this work and would appreciate guidance as to whether it's possible to leverage multiple processors? I'm aware R itself is not currently multithreaded (whilst having only a lay understanding of what that means).
FAQ? Mac distributed/multiple processor solutions?
8 messages · Rob Forsyth, Richard Pearson, Kasper Daniel Hansen +4 more
Hi Rob, You might want to look at the snow package. This can be used either with Rmpi or without (using socket connections). I've successfully used this for speeding things up on multi-node clusters, and also on a single multi-core Mac. I've included some brief instructions on getting things working in chapter 6 of the user guide for my (Bioconductor) package puma. Hope this helps! Richard.
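As a rough illustration of the socket-based approach Richard describes, here is a minimal sketch. It uses the parallel package that ships with R 2.14.0 and later, which took over makeCluster(), parSapply() and stopCluster() from snow; with snow itself the cluster would be created with makeCluster(2, type = "SOCK"). The function one_run() is a hypothetical stand-in for the lengthy calculation, and the worker count of 2 and the seed are arbitrary assumptions.

```r
library(parallel)  # ships with R >= 2.14.0; snow provides makeCluster()/parSapply() too

# Hypothetical stand-in for the lengthy calculation: returns TRUE or FALSE.
one_run <- function(i) {
  x <- stats::rnorm(50)
  mean(x) > 0
}

cl <- makeCluster(2)                 # two local worker processes over sockets
clusterSetRNGStream(cl, iseed = 42)  # give each worker its own random stream
results <- parSapply(cl, 1:1000, one_run)
stopCluster(cl)                      # always shut the workers down

prop_true <- mean(results)           # proportion of TRUE runs
```

On a single multi-core machine the number of workers would normally match the number of cores; for several machines, makeCluster() also accepts a vector of host names, which then need to be reachable over ssh.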
_______________________________________________ R-SIG-Mac mailing list R-SIG-Mac at stat.math.ethz.ch https://stat.ethz.ch/mailman/listinfo/r-sig-mac
On Dec 4, 2007, at 12:52 AM, Rob Forsyth wrote:
Alternatively I have an option to acquire a MacPro for this work and would appreciate guidance as to whether it's possible to leverage multiple processors? I'm aware R itself is not currently multithreaded (whilst having only a lay understanding of what that means).
One important thing here: while R is not multithreaded, on Mac OS X, R uses a special BLAS which is multithreaded. So anything involving linear algebra (which for some problems is a major part of the computational load) will benefit from having multiple CPUs in the same machine. Depending on your problem, this may indeed speed things up. Kasper
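To see whether linear algebra is where the time goes, one can time a BLAS-backed operation directly. The sketch below is only an illustration: the matrix size is an arbitrary assumption, and whether more than one core is actually used depends on the BLAS that the R build is linked against.

```r
n <- 500
m <- matrix(rnorm(n * n), n, n)

# crossprod(m) computes t(m) %*% m and is handed to the BLAS library,
# so a multithreaded BLAS can spread this work across cores.
timing <- system.time(xtx <- crossprod(m))
print(timing)
```

If the "user" time reported is noticeably larger than the "elapsed" time, more than one core was probably busy during the call.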
On Tue, 4 Dec 2007, Rob Forsyth wrote:
options here. I am working on an iMac G5 and could access (at home - i.e. not on a LAN) another G5 and a G4. I've come across the R/MPI package but would appreciate advice as to how easy this is to set up (would it actually be simpler to divide the job "manually"?).
With only three computers it would be easiest to divide the job manually. -thomas
For a one-time run this is true, but if you find yourself doing this (or similar things) often, it can be a nuisance to break it up every time. The snow package is fairly painless to install and works great. I found, however, that if you have things (e.g. R, LAM/MPI, ...) installed outside of the default Mac OS X path, they won't run when connecting with ssh unless you add (or uncomment?) "PermitUserEnvironment yes" in /private/etc/sshd_config so that the modified path (set in .bash_profile) is used on the other OS X machines. Perhaps there is a better way, but that is how I got it working. I think this is the default setting on Leopard, though, so it would only be an issue for earlier versions. Best, Randy
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Randall C Johnson Bioinformatics Analyst SAIC-Frederick, Inc (Contractor) Laboratory of Genomic Diversity NCI-Frederick, P.O. Box B Bldg 560, Rm 11-85 Frederick, MD 21702 Phone: (301) 846-1304 Fax: (301) 846-1686 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 day later
On 05/12/2007, at 5:57 AM, Thomas Lumley wrote:
With only three computers it would be easiest to divide the job manually.
Much easier, and if you have access to a machine with multiple processors, simply duplicate the R process to have the same number as the number of processors, and then run them simultaneously. Not as elegant and maybe not as efficient as other methods, but effective. Ken
On Thu, 6 Dec 2007, Ken Beath wrote:
Much easier, and if you have access to a machine with multiple processors, simply duplicate the R process to have the same number as the number of processors, and then run them simultaneously. Not as elegant and maybe not as efficient as other methods, but effective.
But those processes need to do different things (and record the results in different files), which is what Thomas means by 'divide the job manually'. Incidentally, I find it useful to run slightly more R processes than the number of processors, to ensure full CPU usage when one of the processes is in an I/O wait or hits a swapping trap. (This assumes you have ample RAM; otherwise you will get additional swapping.) Even with many processors it may be easiest to do this manually. Our geneticists do simulation-based inference by running separate simulation runs on up to 100s of processors simultaneously: the scheduler works better with independent jobs.
Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
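The manual division described above, with each process doing a different part of the work and writing to its own file, might be sketched as follows. Everything named here is hypothetical: one_run() stands in for the lengthy calculation, and the chunk count, repetition count, file names and seeds are assumptions made for illustration. Giving each chunk a different seed is a simple way to keep the chunks from repeating each other; it is not a formal guarantee of independent random streams.

```r
# Hypothetical stand-in for the lengthy calculation.
one_run <- function() mean(rnorm(50)) > 0

# Run one chunk and write its results to a file named after the chunk.
run_chunk <- function(chunk_id, n_reps, out_dir) {
  set.seed(1000 + chunk_id)  # a different seed per chunk so the runs differ
  res <- replicate(n_reps, one_run())
  saveRDS(res, file.path(out_dir, sprintf("chunk_%02d.rds", chunk_id)))
}

# Pool all chunk files and return the overall proportion of TRUE runs.
combine_chunks <- function(out_dir) {
  files <- list.files(out_dir, pattern = "^chunk_[0-9]+\\.rds$",
                      full.names = TRUE)
  mean(unlist(lapply(files, readRDS)))
}

# In practice each R process (or machine) would call run_chunk() with its
# own chunk_id, e.g. read from commandArgs(trailingOnly = TRUE).
# Here the three chunks simply run one after another.
out_dir <- file.path(tempdir(), "chunks")
dir.create(out_dir, showWarnings = FALSE)
for (id in 1:3) run_chunk(id, n_reps = 200, out_dir = out_dir)
prop_true <- combine_chunks(out_dir)
```

Because the combining step pools the individual results rather than averaging per-chunk proportions, it stays correct even if the chunks are given different numbers of repetitions, for example to give a slower G4 less work.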
On 06/12/2007, at 9:13 PM, Prof Brian Ripley wrote:
But those processes need to do different things (and record the results in different files), which is what Thomas means by 'divide the job manually'. Incidentally, I find it useful to run slightly more R processes than the number of processors, to ensure full CPU usage when one of the processes is in an I/O wait or hits a swapping trap. (This assumes you have ample RAM; otherwise you will get additional swapping.) Even with many processors it may be easiest to do this manually. Our geneticists do simulation-based inference by running separate simulation runs on up to 100s of processors simultaneously: the scheduler works better with independent jobs.
I meant duplicate the R application, using the Finder. I thought it was unnecessary to mention that it will require a different set of commands to be run in each copy of R. Ken