Hi,

I am a newbie to parallel computing, let alone Rmpi, and would appreciate any advice you could give me on parallel computing in R in general.

I am a database marketing analyst, and my boss has asked me to research whether we could set up a parallel computing system on our current server at work. Some of our clients have asked us to run statistical analyses on a large data set (about 30 million records * 12 columns): a multinomial logit model and some neural network analysis. Our server spec is 2 x Xeon 5630, 16 GB memory, 2 x 500 GB, and we are planning to purchase a second server of the same spec for small-scale cluster computing.

While reading http://www.bioconductor.org/help/bioconductor-cloud-ami/ the following questions occurred to me:

1. Could you recommend websites, books, or journals that would deepen my understanding of how MPICH2 and R work together on Linux?

2. With our current server system, is this command correct?
"mpirun -np 1 --hostfile /usr/local/Rmpi/hostfile R --no-save -f /usr/local/Rmpi/xxxx.R --args 8"

3. If I use Amazon EC2 and start 4 extra large instances, are these commands correct?
"mpiutil -a accessid -s keyid -w 16 -n clustername -t m1.large -v volumeid"
"mpirun -np 1 --hostfile /usr/local/Rmpi/hostfile R --no-save -f /usr/local/Rmpi/xxxx.R --args 16"

Sorry if these questions are basic or perhaps invalid. Many thanks in advance.

Taka
Newbie question to Rmpi
1 message · Takatsugu Kobayashi