Hi,

In the meantime, I tried the Rmpi hello world as suggested by Mario; I took exactly the one given on http://math.acadiau.ca/ACMMaC/Rmpi/sample.html and submitted it to the batch system via:

bsub -n 4 -R "select[model==Opteron8380]" mpirun R --no-save -q -f Rmpi_hello_world.R

Below is the output. Does anyone know how to interpret the error (and possibly how to fix it :-) hopefully that also helps solve the doSNOW problem)?

Cheers,
Marius

## ==== snippet start ====
Sender: LSF System <lsfadmin at a6211>
Subject: Job 938942: <mpirun R --no-save -q -f Rmpi_hello_world.R> Done

Job <mpirun R --no-save -q -f Rmpi_hello_world.R> was submitted from host <brutus2> by user <hofertj> in cluster <brutus>.
Job was executed on host(s) <4*a6211>, in queue <pub.1h>, as user <hofertj> in cluster <brutus>.
</cluster/home/math/hofertj> was used as the home directory.
</cluster/home/math/hofertj> was used as the working directory.
Started at Fri Dec 17 11:35:40 2010
Results reported at Fri Dec 17 11:35:49 2010

Your job looked like:

------------------------------------------------------------
# LSBATCH: User input
mpirun R --no-save -q -f Rmpi_hello_world.R
------------------------------------------------------------

Successfully completed.

Resource usage summary:

    CPU time      :  4.21 sec.
    Max Memory    :     3 MB
    Max Swap      :    29 MB
    Max Processes :     1
    Max Threads   :     1

The output (if any) follows:

master (rank 0, comm 1) of size 4 is running on: a6211
slave1 (rank 1, comm 1) of size 4 is running on: a6211
slave2 (rank 2, comm 1) of size 4 is running on: a6211
slave3 (rank 3, comm 1) of size 4 is running on: a6211
## from http://math.acadiau.ca/ACMMaC/Rmpi/sample.html
# Load the R MPI package if it is not already loaded.
if (!is.loaded("mpi_initialize")) {
+ library("Rmpi")
+ }
# Spawn as many slaves as possible
mpi.spawn.Rslaves()
Error in mpi.spawn.Rslaves() : It seems there are some slaves running on comm 1
# In case R exits unexpectedly, have it automatically clean up
# resources taken up by Rmpi (slaves, memory, etc...)
.Last <- function(){
+ if (is.loaded("mpi_initialize")){
+ if (mpi.comm.size(1) > 0){
+ print("Please use mpi.close.Rslaves() to close slaves.")
+ mpi.close.Rslaves()
+ }
+ print("Please use mpi.quit() to quit R")
+ .Call("mpi_finalize")
+ }
+ }
# Tell all slaves to return a message identifying themselves
mpi.remote.exec(paste("I am",mpi.comm.rank(),"of",mpi.comm.size()))
$slave1
[1] "I am 1 of 4"

$slave2
[1] "I am 2 of 4"

$slave3
[1] "I am 3 of 4"
# Tell all slaves to close down, and exit the program
mpi.close.Rslaves()
[1] 1
mpi.quit()
## ==== snippet end ====
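My current reading of the LSF report above, in case it helps: mpirun already started four R processes, and Rmpi attached ranks 1-3 as slaves on comm 1 before the script even ran (hence the "master (rank 0, comm 1)" and "slave1 (rank 1, comm 1)" lines), so mpi.spawn.Rslaves() refuses to spawn a second set. If that reading is right, the sketch below might avoid the error; it is only a guess, untested on brutus, and the mpi.comm.size(1) test is borrowed from the .Last handler above. Alternatively (also untested, and assuming our mpirun accepts "-n"), one could submit with "mpirun -n 1 R --no-save -q -f Rmpi_hello_world.R" so that only one R process starts and Rmpi does the spawning itself.

## ==== sketch start ====
# Load the R MPI package if it is not already loaded (as in the Acadia example).
if (!is.loaded("mpi_initialize")) {
    library("Rmpi")
}
# Spawn slaves only if mpirun has not already attached some to comm 1;
# mpi.comm.size(1) is 0 when no slaves exist yet (cf. the .Last check above).
if (mpi.comm.size(1) == 0) {
    mpi.spawn.Rslaves()
}
# Same identification round trip as in the example above.
mpi.remote.exec(paste("I am", mpi.comm.rank(), "of", mpi.comm.size()))
# Close the slaves down and quit R cleanly.
mpi.close.Rslaves()
mpi.quit()
## ==== sketch end ====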
On 2010-12-17, at 08:19, Mario Valle wrote:
Good morning, Marius!
I'll try to draw parallels to my experience with snowfall, where I hit several problems: one was the number of jobs, another was hanging R processes around the cluster, and another was due to the mpich2 configuration on the cluster (it does not allow MPI to spawn processes).
Have you tried the Rmpi examples on their own?
Have you tried the classical MPI "Hello world" application, just to rule out problems with MPI itself?
Unfortunately, I will not be back in my office until Monday. I tried your example remotely, but it seems something has changed with MPI and now it hangs in makeCluster.
I'll try again on Monday.
Ciao!
mario
On 17-Dec-10 07:41, Marius Hofert wrote:
Dear Mario,

I tried "-n 4" and got the same error :-(

Cheers,
Marius
--
Ing. Mario Valle
Data Analysis and Visualization Group            | http://www.cscs.ch/~mvalle
Swiss National Supercomputing Centre (CSCS)      | Tel: +41 (91) 610.82.60
v. Cantonale Galleria 2, 6928 Manno, Switzerland | Fax: +41 (91) 610.82.82