Hi Ben,

What machines are listed when you execute:

    cat $PBS_NODEFILE

in your batch script? Is it definitely four different nodes?

- Steve

On Mon, Apr 9, 2012 at 3:28 PM, Ben Weinstein
<bweinste at life.bio.sunysb.edu> wrote:
Hi Stephen,
I've tried to follow your answer, but I'm still getting the same results.
The heart of my qsub script looks like:
mpirun -hostfile $PBS_NODEFILE -np 1 R --slave -f /nfs/user08/bw4sz/Files/Seawulf.R
Before I run the foreach statement, I ask which node I am on:
[1] "Original Node wulfie121"
I make sure the Open MPI library is there:
[1] "/usr/local/pkg/openmpi-1.4.4/lib/"
I make the cluster and ask how many slaves were spawned:
4 slaves are spawned successfully. 0 failed.
Then I ask for the nodename of each of my slaves. I believe that if
this is working correctly, each of the nodenames should be different, since
I specified #PBS -l nodes=4:ppn=1.
However, all the slaves still spawn on that one node.
[[1]]
   nodename     machine
"wulfie121"    "x86_64"
[[2]]
   nodename     machine
"wulfie121"    "x86_64"
[[3]]
   nodename     machine
"wulfie121"    "x86_64"
[[4]]
   nodename     machine
"wulfie121"    "x86_64"
Finally, I'm timing the process to see whether I'm actually
getting parallelization.
[1] 4
   user  system elapsed
 17.650  39.990 159.632
Again, the heart of the code looks like:
cl <- makeCluster(4, type = "MPI")
print(clusterCall(cl,function() Sys.info()[c("nodename","machine")]))
registerDoSNOW(cl)
print(getDoParWorkers())
system.time(five.ten <- rbind.fill(foreach(j=1:times ) %dopar%
drop.shuffle(j,iterations)))
stopCluster(cl)
I am about to change over to a different parallel backend as suggested, but
I doubt that is the root of the problem in this case.
I appreciate the continued help,
Ben Weinstein
On Thu, Mar 29, 2012 at 2:56 PM, Stephen Weston <stephen.b.weston at gmail.com>
wrote:
Hi Ben,

You have to run R via mpirun, otherwise all of the workers start on the one node.

> I have tried using mpirun -np 4 in front of the R call, but this just fails without message.

You have to use '-np 1', otherwise your script will be executed by mpirun four times, each trying to spawn four workers. I'm not sure if that explains failing without a message, however. Try something like this:

#!/bin/bash
#PBS -o 'qsub.out'
#PBS -e 'qsub.err'
#PBS -l nodes=4:ppn=1
#PBS -m bea
cat $PBS_NODEFILE
hostname
cd $PBS_O_WORKDIR
# Run an R script
mpirun -hostfile $PBS_NODEFILE -np 1 R --slave -f /nfs/user08/bw4sz/Files/Seawulf.R

You may not need to use '-hostfile $PBS_NODEFILE', depending on how your Open MPI was built, but I don't think it ever hurts, and it may be required for your installation.

- Steve
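
The node check asked about in this thread ("Is it definitely four different nodes?") could be scripted along the following lines. This is only a minimal sketch: count_unique_nodes is a hypothetical helper name, not part of PBS or Open MPI, and it assumes the node file lists one hostname per line, so that a request of nodes=4:ppn=1 would be expected to give four distinct entries.

```shell
#!/bin/bash
# Hypothetical helper: count the distinct hostnames in a PBS node file.
# Assumes one hostname per line (repeated when ppn > 1).
count_unique_nodes() {
    # sort -u drops duplicate hostnames; grep -c . counts non-empty lines
    sort -u "$1" | grep -c .
}

# Inside a batch job PBS sets PBS_NODEFILE; outside a job this is skipped.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    echo "distinct nodes: $(count_unique_nodes "$PBS_NODEFILE")"
    # Show how many slots were allocated on each node
    sort "$PBS_NODEFILE" | uniq -c
fi
```

If this reports a single distinct node, the allocation itself is the problem rather than mpirun or the R cluster setup; if it reports four, the workers are being placed on one node despite a correct allocation.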
--
Ben Weinstein
Graduate Student
Ecology and Evolution
Stony Brook University
http://life.bio.sunysb.edu/~bweinste/index.html