Rmpi and cpu usage on slaves
As Dirk said, this is a feature of Open MPI; LAM-MPI doesn't have this issue. I don't think there is a solution on the slave side, since mpi.bcast is a blocking call. It might be possible to use nonblocking point-to-point calls such as mpi.irecv together with Sys.sleep, but then the whole slave communication would have to be rewritten. If Dirk is correct, a future release of Open MPI will remove this feature. This is why I did not try to work out a solution, at least on the slave side. In real computation, all slaves are supposed to use up all assigned CPU cycles anyway.

The same issue applies to the master as well if any of the parallel apply functions are used. In Rmpi 0.4-7 several nonblocking parallel apply functions have been added so that the master does not consume 100% CPU while waiting.

So far LAM-MPI is still the best environment for programming, debugging and testing.

Hao
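The polling idea mentioned above (a nonblocking point-to-point call combined with Sys.sleep) could look roughly like the sketch below. It is only an illustration under assumptions, not how Rmpi's spawned slaves actually work: the helper name sleepy_recv and its interval argument are hypothetical, and it uses mpi.iprobe followed by mpi.recv.Robj rather than mpi.irecv, since probing avoids having to pre-allocate a receive buffer. It does not change the slaves started by mpi.spawn.Rslaves(), which wait in mpi.bcast; the slave loop would have to be rewritten around something like this.

```r
# Hypothetical helper: wait for a message by polling instead of blocking
# inside MPI, so the process sleeps between checks rather than spinning.
# The probe and recv arguments default to the Rmpi functions; they are
# parameters only so the loop can be exercised without an MPI runtime.
sleepy_recv <- function(source = 0, tag = Rmpi::mpi.any.tag(), comm = 1,
                        interval = 0.1,
                        probe = Rmpi::mpi.iprobe,
                        recv = Rmpi::mpi.recv.Robj) {
  # mpi.iprobe returns immediately: TRUE if a matching message is pending.
  while (!probe(source, tag, comm)) {
    Sys.sleep(interval)  # give the CPU back while nothing has arrived
  }
  # A message is pending, so this receive should return without a long wait.
  recv(source, tag, comm)
}
```

On a slave, a call such as sleepy_recv(source = 0, tag = 1) would then wait for an object that the master sent with mpi.send.Robj(obj, dest, tag = 1). The price is added latency of up to one interval per message, so the interval is a trade-off between idle CPU usage and responsiveness.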
Dirk Eddelbuettel wrote:
On 21 April 2009 at 16:40, Sean Davis wrote:
| I am running sge6.2, openmpi 1.3.1, and Rmpi 0.5.7 on openSUSE linux. I can
| start up an arbitrarily-sized cluster using sge, see the appropriate
| universe.size using Rmpi, and start a cluster using mpi.spawn.Rslaves().
| However, it appears that all the slaves then run at 100% cpu on all nodes.
| Even using Rmpi under openmpi with a simple hostfile produces the same
| result. Any suggestions to figure out what is going on on the slaves?

There is a known issue with Open MPI and blocking which you may be hitting
here. Upstream Open MPI considers it a feature. But as this has come up a few
times on their mailing list as well, I believe the last word was that it will
go away in a future release.

Hth, Dirk

| Thanks,
| Sean
|
|
| > library(Rmpi)
| library(Rmpi)
| > mpi.universe.size()
| mpi.universe.size()
| [1] 24
| > mpi.spawn.Rslaves()
| mpi.spawn.Rslaves()
| 24 slaves are spawned successfully. 0 failed.
| master (rank 0 , comm 1) of size 25 is running on: Mahfouz
| slave1 (rank 1 , comm 1) of size 25 is running on: Mahfouz
| slave2 (rank 2 , comm 1) of size 25 is running on: Mahfouz
| slave3 (rank 3 , comm 1) of size 25 is running on: Mahfouz
| slave4 (rank 4 , comm 1) of size 25 is running on: Mahfouz
| slave5 (rank 5 , comm 1) of size 25 is running on: Mahfouz
| slave6 (rank 6 , comm 1) of size 25 is running on: Mahfouz
| slave7 (rank 7 , comm 1) of size 25 is running on: Mahfouz
| slave8 (rank 8 , comm 1) of size 25 is running on: Grass
| slave9 (rank 9 , comm 1) of size 25 is running on: Grass
| slave10 (rank 10, comm 1) of size 25 is running on: Grass
| slave11 (rank 11, comm 1) of size 25 is running on: Grass
| slave12 (rank 12, comm 1) of size 25 is running on: Grass
| slave13 (rank 13, comm 1) of size 25 is running on: Grass
| slave14 (rank 14, comm 1) of size 25 is running on: Grass
| slave15 (rank 15, comm 1) of size 25 is running on: Grass
| slave16 (rank 16, comm 1) of size 25 is running on: shakespeare
| slave17 (rank 17, comm 1) of size 25 is running on: shakespeare
| slave18 (rank 18, comm 1) of size 25 is running on: shakespeare
| slave19 (rank 19, comm 1) of size 25 is running on: shakespeare
| slave20 (rank 20, comm 1) of size 25 is running on: shakespeare
| slave21 (rank 21, comm 1) of size 25 is running on: shakespeare
| slave22 (rank 22, comm 1) of size 25 is running on: shakespeare
| slave23 (rank 23, comm 1) of size 25 is running on: shakespeare
| slave24 (rank 24, comm 1) of size 25 is running on: Mahfouz
| > mpi.close.Rslaves()
| mpi.close.Rslaves()
| [1] 1
|
| > sessionInfo() # on the master
| R version 2.9.0 Under development (unstable) (2009-02-21 r47969)
| x86_64-unknown-linux-gnu
|
| locale:
| LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
|
| attached base packages:
| [1] stats graphics grDevices utils datasets methods base
|
| other attached packages:
| [1] Rmpi_0.5-7
|
| [[alternative HTML version deleted]]
|
| _______________________________________________
| R-sig-hpc mailing list
| R-sig-hpc at r-project.org
| https://stat.ethz.ch/mailman/listinfo/r-sig-hpc

--
Three out of two people have difficulties with fractions.
Department of Statistics & Actuarial Sciences    Fax Phone#:(519)-661-3813
The University of Western Ontario                Office Phone#:(519)-661-3622
London, Ontario N6A 5B7                          http://www.stats.uwo.ca/faculty/yu