distributed R on EC2, designing the software stack
On 29 April 2009 at 12:06, Stephen J. Barr wrote:
| 1) R 2.9.0 + OpenMPI + Rmpi + Snowfall/sfCluster
| - will Amazon's network work with OpenMPI? Perhaps it would be
| better to use PVM or something that is more tolerant of a non-optimal
| network.

If you can use standard snow rather than snowfall/sfCluster, then (I believe)
you are done. As per some emails on the Open MPI list from last fall or
summer, you get Debian / Ubuntu instances where all this is just an
'apt-get install' or two away, given the set of packages I maintain for
Debian. Plus you get slurm to control it.

| 2) R 2.9.0 + "socket based communication" + Snowfall/sfCluster
| - is this scalable?

Likewise, snow and sockets work as is on Debian / Ubuntu.

| 3) R 2.9.0 + twisted + NetWorkSpaces
| - not sure if Amazon's network supports broadcast mode, which is
| required by twisted.

That should also work out of the box via the r-cran-nws and
python-nwsserver packages I maintain.

| 4) Biocep-R
| - this looks like it has the functionality to do what I want, but a
| lot of other stuff as well.

Yep, but I haven't had a chance to look more closely.

| 5) RHIPE
| - Hadoop is well supported by EC2. Perhaps this is the way to go.
| Seems like a very new package :)

Yes, and there is more Hadoop stuff cooking on R-Forge.

| What are people's thoughts on what would be a good software stack with
| the constraint that it should be simple and run on EC2?

I use the computers hanging around the house. If you have a desktop and a
laptop, you are ready to go. Or, if you have enough RAM, you can try virtual
approaches as well. Last time I tried (for my HPC tutorials), the networking
was not fully 'see-through' yet, though I hear that VirtualBox has improved
there.

Let us know what you come up with.

Dirk