Deathstar is a collection of init scripts and a few simple zmq binaries
that allow for distributed computing using EC2 directly from your own
workstation.
You can check out your own copy here: https://github.com/armstrtw/deathstar.
I've rolled it up into a public AMI since I haven't had the time to
put out a deb or rpm yet. The AMI is a clone of the just-released Ubuntu
11.10 with a smattering of R packages installed, as well as libzmq and
the startup scripts from deathstar.
Anyone with a valid EC2 account should be able to run this example
(just substitute in your own key name in the startCluster command)
assuming you are already set up to use ec2 commands from a shell.
This example uses only two of the 8-core boxes Amazon provides (c1.xlarge).
You'll need the most recent versions of rzmq and AWS.tools installed
on your local machine to run this demo. You can obtain the packages
from github. I'm not sure the CRAN versions are up to date.
https://github.com/armstrtw/AWS.tools
https://github.com/armstrtw/rzmq
These packages are very new, so please email me if you encounter any
bugs (or post an issue on the respective github site).
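Before running the demo, a quick check that both packages are visible to your local R can save a failed start. This is only a minimal sketch using base R's requireNamespace; the variable names are my own, and it says nothing about whether the installed versions are recent enough.

```r
# Packages the demo script loads with library().
needed <- c("rzmq", "AWS.tools")

# requireNamespace() returns TRUE/FALSE instead of stopping with an error.
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
not.installed <- needed[!installed]

if (length(not.installed) > 0) {
    message("missing packages: ", paste(not.installed, collapse = ", "))
} else {
    message("rzmq and AWS.tools are both installed.")
}
```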
I think these packages significantly lower the bar for anyone who
wants to use R and AWS for distributed computing. Another key
advantage is that you can fire up any AMI you want, so please feel free
to clone my AMI and install your own custom packages (that's what I
do for my own sims).
Feedback welcome.
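If you want to sanity-check the worker function before starting any instances, the Monte Carlo step can be run serially on your own workstation. The following is a minimal sketch that needs only base R (no rzmq or AWS.tools). It repeats the estimatePi function from the sim and swaps zmq.cluster.lapply for plain lapply; the 20-seed run and the local.* names are illustrative choices of mine, not part of the original run.

```r
# Same worker function as in the cluster script: draw points uniformly in a
# square of side 2r and count the fraction falling inside the circle of radius r.
estimatePi <- function(seed) {
    set.seed(seed)
    numDraws <- 1e6
    r <- .5
    x <- runif(numDraws, min=-r, max=r)
    y <- runif(numDraws, min=-r, max=r)
    inCircle <- ifelse((x^2 + y^2)^.5 < r, 1, 0)
    sum(inCircle) / length(inCircle) * 4
}

# Serial stand-in for zmq.cluster.lapply, using 20 seeds instead of 1000.
local.ans <- lapply(as.list(1:20), estimatePi)
local.est <- mean(unlist(local.ans))
print(local.est)
```

Because each task sets its own seed, the same seed gives the same estimate whether it runs locally or on a remote worker, which makes the two runs easy to compare.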
This is the sim I ran (again shamelessly stealing JD Long's estimatePi example).
whit@spartan:~$ cat zmq.aws.lapply.test.r
library(AWS.tools)
library(rzmq)
estimatePi <- function(seed) {
    set.seed(seed)
    numDraws <- 1e6
    r <- .5
    x <- runif(numDraws, min=-r, max=r)
    y <- runif(numDraws, min=-r, max=r)
    inCircle <- ifelse( (x^2 + y^2)^.5 < r , 1, 0)
    sum(inCircle) / length(inCircle) * 4
}
cl <- startCluster(ami="ami-9d5f93f4",key="maher-ave",instance.count=2,instance.type="c1.xlarge")
print("starting sim.")
run.time <- system.time(ans <-
    zmq.cluster.lapply(cluster=cl$instances[,"dnsName"],as.list(1:1e3),estimatePi))
print("sim completed.")
res <- terminateCluster(cl)
print(mean(unlist(ans)))
print(run.time)
print(attr(ans,"execution.report"))
pi.est <- mean(unlist(ans))
print("result:")
print(pi.est)
whit@spartan:~$
And the output from my run: