High performance computing with R

1 message · Zachary Mayer

Hello,

I'm not 100% sure how to respond to an individual message from the
daily digest, so I apologize if I am violating protocol here.

Ben, I suggest you do a little research into the foreach package for
R, as well as the various foreach backends, which include doMC, doSMP,
doSNOW, doMPI and doRedis. foreach is a general framework for
parallelizing for loops in R. The backends provide that parallelism
using different technologies: doMC uses the "fork" system call on
Linux, doSNOW uses a snow cluster, and doRedis uses a Redis server.
Each backend has its pros and cons. As stated before, doMC (and its
sister package multicore) is probably the best solution for a single
machine: multicore's 'mclapply' function can replace the vanilla
'lapply' and give you instant parallelism with almost no extra work,
but neither package works on Windows or with RStudio. doRedis is my
current favorite solution for clusters of multiple machines on Amazon
EC2, but it takes a small amount of extra work to set up a Redis
server.
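
To make the single-machine case concrete, here is a rough sketch of the
'lapply' to 'mclapply' swap and of the same work run through foreach
with the doMC backend. It makes a few assumptions: a Unix-like system,
'mclapply' taken from the parallel package that ships with current R
(rather than from multicore itself), a made-up 'slow_square' function
standing in for your real work, and an arbitrary choice of 2 cores.

```r
library(parallel)  # assumption: parallel's mclapply instead of multicore's
library(foreach)
library(doMC)      # Unix-like systems only

# Hypothetical stand-in for an expensive computation
slow_square <- function(i) {
  Sys.sleep(0.1)
  i^2
}

# Serial baseline
res_serial <- lapply(1:8, slow_square)

# Same call shape, but the work is spread over forked worker processes
res_forked <- mclapply(1:8, slow_square, mc.cores = 2)

# Same work through foreach: register the doMC backend, then use %dopar%
registerDoMC(cores = 2)
res_foreach <- foreach(i = 1:8) %dopar% slow_square(i)
```

All three calls return a list in input order, so the parallel versions
should be usable wherever the 'lapply' result was.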

The answer to your question really depends on your operating system,
how many machines you have, and what technologies you are comfortable
with.  Do some research before you commit to hardware, and rewrite
your code to make use of the 'foreach' looping structure.
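
As a rough idea of what that rewrite looks like, here is a minimal
sketch that turns an ordinary for loop into a foreach loop. The input
vector and the 'sqrt' call are placeholders I made up for illustration.
It uses '%do%', which runs sequentially and needs no backend, so it
should run on any operating system.

```r
library(foreach)

x <- c(2, 4, 6)  # placeholder input

# Ordinary for loop that fills a preallocated result vector
out_for <- numeric(length(x))
for (i in seq_along(x)) {
  out_for[i] <- sqrt(x[i])
}

# Same computation as a foreach loop: the body returns a value for each
# element, and .combine = c collects those values into a vector
out_foreach <- foreach(xi = x, .combine = c) %do% {
  sqrt(xi)
}
```

Once the loop is in this form, running it in parallel should mostly be
a matter of registering a backend and changing '%do%' to '%dopar%'.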

Good luck!

-Zach