Plain: Problem with Rmpi
6 messages · sebastian.rohrer at basf.com, Stephen Weston, Hao Yu, Ross Boylan
Why are you calling mpi.close.Rslaves? I believe that you should only do that if you've started the slaves via mpi.spawn.Rslaves. I'm not sure if that has any bearing on your problem, however.

But I wouldn't draw too many conclusions from not seeing messages to stdout from the slaves right before quitting, especially messages that are produced on different machines. To improve the chances of it working, I suggest that you add a call to flush(stdout()) after the cat. But you might want to do something else to prove whether the slaves are really running.

- Steve
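Stephen's suggestion amounts to something like the following sketch (the flush call is the only addition to the original test script; per his first point, mpi.close.Rslaves is dropped since the slaves were not started with mpi.spawn.Rslaves):

# Dirk Eddelbuettel's Rmpi test script, with the suggested flush added
# after the cat so each rank's message is pushed out before the
# process quits.
require(Rmpi)

rk   <- mpi.comm.rank(0)
sz   <- mpi.comm.size(0)
name <- mpi.get.processor.name()

cat("Hello, rank", rk, "size", sz, "on", name, "\n")
flush(stdout())   # flush before quitting; slave output may otherwise be lost

mpi.quit()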
On Mon, Jan 18, 2010 at 8:42 AM, <sebastian.rohrer at basf.com> wrote:
Dear List,

I have the following problem with R (2.10.0) Rmpi using OpenMPI 1.3.3. I use Dirk Eddelbuettel's test script for checking whether Rmpi works (http://dirk.eddelbuettel.com/papers/ismNov2009introHPCwithR.pdf, the example is on slide 96). The script looks as follows:

require(Rmpi)
rk <- mpi.comm.rank(0)
sz <- mpi.comm.size(0)
name <- mpi.get.processor.name()
cat("Hello, rank", rk, "size", sz, "on", name, "\n")
mpi.close.Rslaves()
mpi.quit()

According to Dirk's slides, the output should look something like (note the interleaved lines where ranks wrote concurrently):

Hello, rank 4 size 8 on ron
Hello, rank 0 size 8 on ron
Hello, rank 3 size 8 on mccoy
Hello, rank 7 size 8 on mccoy
Hello, rank Hello, rank 21 size 8 on joe size 8 on tony
Hello, rank 6 size 8 on tony
Hello, rank 5 size 8 on joe

I call the script:

/programs/openmpi-1.3.3/bin/orterun -host node02,node03 -n 4 /programs/R/R-2.10.0/bin/Rscript mpiTest_03.R

The output looks like:

master (rank 0, comm 1) of size 4 is running on: node02
slave1 (rank 1, comm 1) of size 4 is running on: node03
slave2 (rank 2, comm 1) of size 4 is running on: node02
slave3 (rank 3, comm 1) of size 4 is running on: node03
Hello, rank 0 size 4 on node02

So, in my understanding, a master and 3 slaves are spawned, but the call for the processor name is executed only on the master. Right? Any suggestions on what could be the problem?

Thanks a lot!
Sebastian
_______________________________________________ R-sig-hpc mailing list R-sig-hpc at r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
mpi.spawn.Rslaves is not needed, since orterun is used to create 1 master and 3 slaves. I assume the Rprofile shipped with Rmpi is in use. In that case all slaves sit in an infinite loop waiting for instructions from the master; they will not execute R scripts passed in BATCH mode (only the master processes them). You may use mpi.remote.exec to get what you want, or modify slave.hostinfo.

Hao
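Hao's mpi.remote.exec suggestion would look roughly like this (a sketch, assuming the Rmpi Rprofile has already put the slaves into their worker loop, so the master can hand them expressions to evaluate):

# Ask each slave to report its rank and host name. With the Rmpi
# Rprofile active, the slaves are waiting for exactly this kind of
# instruction from the master, rather than running the script themselves.
require(Rmpi)
print(mpi.remote.exec(paste("rank", mpi.comm.rank(),
                            "on", mpi.get.processor.name())))
mpi.close.Rslaves()
mpi.quit()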
--
Department of Statistics & Actuarial Sciences
The University of Western Ontario
London, Ontario N6A 5B7
Office Phone: (519) 661-3622   Fax: (519) 661-3813
http://www.stats.uwo.ca/faculty/yu
From looking over the Rmpi Rprofile, it appears that it is used to put the cluster workers into a "worker loop" or "task loop", allowing you to use the higher-level functions in Rmpi when spawn isn't supported, or if you choose to start your workers with orterun, perhaps for performance reasons. From your message, I see that you are starting the workers with orterun, since you specified "-n 4".

Rmpi's Rprofile definitely isn't compatible with the doMPI package, since it puts all of the workers into an Rmpi worker loop, ready to execute "Rmpi tasks", thus preventing them from executing "doMPI tasks". In doMPI, the startMPIcluster function plays a very similar role to .Rprofile in Rmpi: it puts all of the workers into a doMPI worker loop, allowing them to execute tasks sent to them from foreach/%dopar%.

Note that doMPI and snow only use the "low-level" functions in Rmpi, and never make use of the Rprofile. They use Rmpi for communication, not execution.

I hope that explains a bit of what is going on.

- Steve
On Tue, Jan 19, 2010 at 8:58 AM, <sebastian.rohrer at basf.com> wrote:
Hao and Stephen, thanks a lot. Your comments took me a good step further. But I must confess, I am a bit confused about the significance of the .Rprofile provided with Rmpi. I'm quite sure this is because I am a total noob concerning HPC, but maybe you can help me understand the issue.

After Stephen's response I changed my script to:

require(Rmpi)
print(mpi.remote.exec(rnorm(5)))
mpi.close.Rslaves()
mpi.quit()

and called it with

/programs/openmpi-1.3.3/bin/orterun -host node02,node03 -n 4 Rscript mpiTest_03.R

which ran without error and produced the following output:

master (rank 0, comm 1) of size 4 is running on: node02
slave1 (rank 1, comm 1) of size 4 is running on: node03
slave2 (rank 2, comm 1) of size 4 is running on: node02
slave3 (rank 3, comm 1) of size 4 is running on: node03
          X1         X2           X3
1  0.6422312 -1.4176550 -0.864957823
2 -0.9049865  1.3221402  0.322550244
3  1.1318463 -0.3170188 -0.001224240
4  0.8153995  1.4860591 -1.507712241
5 -0.1545055  0.3834336 -0.104543321
[1] 1

So everything seems OK: three slaves are running, and therefore mpi.remote.exec(rnorm(5)) is executed on three slaves. BTW: if I leave out mpi.close.Rslaves(), the program will produce the same output, but then hang indefinitely.

After Hao's response, I removed the Rmpi .Rprofile from the working directory. The sample script above wouldn't run anymore, but this is expected if I understood Hao's comment the right way. What really strikes me is that a test script I prepared for testing Stephen's doMPI package (this is my ultimate goal: foreach, iterators and doMPI) now ran smoothly:

library(doMPI)
cl <- startMPIcluster()
registerDoMPI(cl)
foreach(i = 1:3) %dopar% sqrt(i)
closeCluster(cl)
mpi.quit()

/programs/openmpi-1.3.3/bin/orterun -host node02,node03 -n 4 Rscript mpiTest_04.R

Output:

[[1]]
[1] 1

[[2]]
[1] 1.414214

[[3]]
[1] 1.732051

So this seems to work all right. The same is true for the bootMPI.R example provided with doMPI.
Both, however, don't work if I restore the Rmpi .Rprofile. I'm sorry to bother you with these newbie questions, but I really would like to understand what is going on here. Thanks a lot for your support, and thanks to both of you, Hao and Stephen, for providing these great packages!

Cheers,
Sebastian
On Tue, 2010-01-19 at 14:58 +0100, sebastian.rohrer at basf.com wrote:
But I must confess, I am a bit confused about the significance of the .Rprofile provided with Rmpi. [...] and called it with /programs/openmpi-1.3.3/bin/orterun -host node02,node03 -n 4 Rscript mpiTest_03.R
You are not necessarily getting the Rmpi .Rprofile. To use it, you need to copy it to your startup directory, or link to it from there; alternatively, you can set an environment variable or use an R command-line option. On Debian Lenny, r-cran-rmpi (thanks to Dirk for the packaging) has the file at /usr/lib/R/site-library/Rmpi/Rprofile.

As others have said, you don't necessarily want to use that profile; it depends on how you want to use Rmpi.

Ross Boylan
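Ross's two options can be sketched as shell commands (an illustration, not verbatim from the thread; the path is the Debian one Ross quotes, and R_PROFILE is the environment variable R consults for a site-wide startup profile):

# Option 1: copy (or symlink) the Rmpi profile into the directory from
# which the script is launched, so R picks it up as .Rprofile at startup.
cp /usr/lib/R/site-library/Rmpi/Rprofile .Rprofile

# Option 2: point R at the profile explicitly via an environment
# variable, without touching the working directory.
export R_PROFILE=/usr/lib/R/site-library/Rmpi/Rprofile
orterun -host node02,node03 -n 4 Rscript mpiTest_03.R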