difficulty spawning Rslaves
Hummm... this is already beyond my very limited understanding of LAM/MPI. I guess that booting the LAM universe (with at least, say, 2 slaves, i.e., setting that in the config file) and then starting R and directly doing

    library(Rmpi)
    mpi.spawn.Rslaves(nslaves = 2)

also fails?

Best, R.
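For reference, "booting the LAM universe" with more than one slave slot usually means listing the nodes (or CPU counts) in a boot schema file before running lamboot. A minimal sketch; the hostnames and CPU counts below are placeholders, not from this thread:

```shell
# Hypothetical LAM boot schema ("bhost") -- substitute your own
# cluster's hostnames and CPU counts.
cat > bhost <<'EOF'
node0 cpu=2
node1 cpu=2
EOF

lamboot -v bhost   # boot the LAM universe across the listed nodes
lamnodes           # list the nodes that joined, to verify the boot
```

After a successful lamboot, an R session started on the master node should be able to spawn slaves onto those slots.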
On Wed, Dec 30, 2009 at 5:33 PM, Allan Strand <stranda at cofc.edu> wrote:
Hi Ramon,

Still having the problem. LAM definitely seems to be working (step 2 below succeeds). I've also recompiled/reinstalled Rmpi (several to many times). As for the mpi.comm.free usage, the truth is that when things are working I only use the snow interface, so I really have little facility with Rmpi. I will use mpi.close.Rslaves() from now on.

This error seems to be the problem: I stepped manually through mpi.spawn.Rslaves() and everything succeeds until the call to mpi.intercomm.merge:

    Error in mpi.intercomm.merge(intercomm, 0, comm) :
        MPI_Error_string: process in local group is dead

Still looking, cheers, a.

On 12/30/2009 06:02 AM, Ramon Diaz-Uriarte wrote:
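For readers following along: stepping through mpi.spawn.Rslaves() by hand amounts to roughly the two low-level calls below. This is a simplified sketch, not Rmpi's exact internals (the real function also passes slave startup arguments); the argument values are illustrative:

```r
library(Rmpi)

# Spawn the slave R processes; Rmpi launches each slave via a shell
# script shipped with the package, connected back to the master
# through an intercommunicator (slot 2 here).
mpi.comm.spawn(slave = system.file("Rslaves.sh", package = "Rmpi"),
               nslaves = 2,
               intercomm = 2)

# Merge the master/slave intercommunicator into an ordinary
# intracommunicator (comm 1).  This is the call that fails above
# with "process in local group is dead".
mpi.intercomm.merge(intercomm = 2, high = 0, comm = 1)
```

That the spawn itself reports success while the merge fails suggests the slave processes die between the two calls, which is why the temporary slave log files mentioned below are worth checking.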
This might not provide any useful info, but just in case: when running Rmpi, a bunch of log files are temporarily created in the current working directory. Sometimes they contain a little more info than "process in local group is dead".

And a couple of other checks:

1. After installing release 7.1.2, you of course recompiled Rmpi against the new version?

2. Before running R, do your usual lamboot routine and then:
   2.1. lamexec C hostname
   2.2. tping C N -c 2 (or any other number after -c)

3. Inside Rmpi, why do you use mpi.comm.free instead of just mpi.close.Rslaves? For me, for instance, the following works reliably:

    library(Rmpi)
    mpi.spawn.Rslaves(nslaves = 1)
    mpi.close.Rslaves()
    mpi.spawn.Rslaves(nslaves = 4)

Best, R.

On Tue, Dec 29, 2009 at 3:37 PM, Allan Strand <stranda at cofc.edu> wrote:
Thanks Dirk and Ramon. I tried LAM 7.1.2 and am still seeing the same type of behavior. Still searching for a solution, and will report back.

cheers, a.

On 12/28/2009 12:28 PM, Ramon Diaz-Uriarte wrote:
More along the lines of Dirk's comments: we currently have two clusters using LAM, both Debian systems, one using release 7.1.2 of LAM and the other 7.1.1. On a current Ubuntu-based laptop, things are working with release 7.1.2.

Best, R.

On Mon, Dec 28, 2009 at 5:14 PM, Dirk Eddelbuettel <edd at debian.org> wrote:
Allan,

On 23 December 2009 at 16:05, Allan Strand wrote:
| My setup is on a cluster running 64bit FC. I have recently broken my
| Rmpi install (and hence snow) by upgrading some very old versions of R,
| lam/mpi, Rmpi, and snow (currently installed versions listed at the
| bottom of this email). No doubt this is a problem with my Rmpi install,
| but I'm having trouble seeing it.
|
| I cannot seem to spawn more than a single slave (which is spawned on the
| master node), e.g.:
|
| > mpi.spawn.Rslaves(comm=1,nslaves=1)
|         1 slaves are spawned successfully. 0 failed.
| master (rank 0, comm 1) of size 2 is running on: node0
| slave1 (rank 1, comm 1) of size 2 is running on: node0
|
| > mpi.comm.free(comm=1)
| [1] 1
|
| > mpi.spawn.Rslaves(comm=1,nslaves=2)
|         2 slaves are spawned successfully. 0 failed.
| Error in mpi.intercomm.merge(intercomm, 0, comm) :
|     MPI_Error_string: process in local group is dead
|
| No doubt the answer is contained in the MPI_Error string, but I'm not
| sure how to interpret it.
|
| Thanks,
| Allan
| ===================================
| Versions (all installed locally in my account with directory-appropriate
| ./configure settings)
|
| R 2.10.1
| LAM 7.1.4/MPI 2 C++/ROMIO - Indiana University
  ^^^^^^^^^^^^^^^^^^^^^^^^^

For what it is worth, a looong time ago (two years? longer?) when I was helping Manuel to get the OpenMPI packages into Debian and when I was transitioning off LAM, I had concluded that the very latest 7.1.x releases of LAM were broken for me. The system was a then-current Ubuntu system with the LAM and OpenMPI packages compiled from Debian sources. Provided I 'froze' LAM at 7.1.2, things would work; the newer ones would not.

So I'd recommend either downgrading to the last LAM that worked for you, or rather taking the plunge and jumping to Open MPI. The 1.3.* series is already pretty solid, and 1.4.0 is just around the corner.

Just my $0.02. The problem may of course be entirely different.
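Should anyone take Dirk's suggestion of switching MPI implementations, Rmpi has to be rebuilt against the new one. A sketch of such a rebuild; the tarball name and paths are placeholders for your own installation, not values from this thread:

```shell
# Hypothetical rebuild of Rmpi against Open MPI instead of LAM.
# Replace the tarball version and the include/lib paths with those
# of your local Open MPI installation.
R CMD INSTALL Rmpi_x.y-z.tar.gz \
    --configure-args="--with-Rmpi-type=OPENMPI \
                      --with-Rmpi-include=/usr/include/openmpi \
                      --with-Rmpi-libpath=/usr/lib/openmpi/lib"
```

The same --with-Rmpi-type mechanism (with value LAM) is what ties an existing Rmpi build to a particular LAM installation, which is why Rmpi must be recompiled after any LAM upgrade or downgrade.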
Dirk

--
Three out of two people have difficulties with fractions.
_______________________________________________
R-sig-hpc mailing list
R-sig-hpc at r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
--
Allan Strand, Biology      http://linum.cofc.edu
College of Charleston      Ph. (843) 953-9189
Charleston, SC 29424       Fax (843) 953-9199
--
Ramon Diaz-Uriarte
Structural Biology and Biocomputing Programme
Spanish National Cancer Centre (CNIO)
http://ligarto.org/rdiaz
Phone: +34-91-732-8000 ext. 3019