Snow Not Distributing
On Fri, Jan 20, 2012 at 3:53 PM, Jeff Allen <lists at jdadesign.net> wrote:
I have been able to successfully set up snow (0.3-5) and Rmpi (0.5-9) on my RedHat 5 cluster, and have it working perfectly for jobs that don't span multiple nodes. We're using Torque for resource management, so I start a job with access to multiple nodes and load Snow. Unfortunately, no matter what size cluster I try to make, all of the workers end up running on the same host -- leaving the other hosts idle.
Have you solved the problem yet? If not, I can help. I have exactly your setup and I have been through EXACTLY the same problems you are seeing. I've been developing a collection of Rmpi programs that actually work, some with Snow, some with parallel.

This is the cluster main page

http://web.ku.edu/~quant/cgi-bin/mw1/index.php?title=Cluster:Main

and about 2/3 of the way down, you will see a link to my collection of working programs. That is an SVN repo that has http access

http://winstat.quant.ku.edu/svn/hpcexample/trunk

In case you are impatient, here is what I suggest. This should be your submission script. I mean, this works for us.

#!/bin/sh
#
#This is an example script example.sh
#
#These commands set up the Grid Environment for your job:
#PBS -N SnowHelloWorld
#PBS -l nodes=11:ppn=1
#PBS -l walltime=00:50:00
#PBS -M pauljohn at ku.edu
#PBS -m bea

cd $PBS_O_WORKDIR

### This RUNS, and because I give it a machine list, it uses them.
orterun --hostfile $PBS_NODEFILE -n 1 R --no-save --vanilla -f snow-hello.R
###############################

Note that in the orterun command (same as mpirun) I am ONLY REQUESTING one process (-n 1). We let R do the spawning of the jobs. The PBS command asks for 11 nodes. Then the job for snow-hello.R creates the cluster.

Why am I pasting this in? I'm crazy. Just go look here for the sub script, the program, an explanation, and example output.

http://winstat.quant.ku.edu/svn/hpcexample/trunk/Ex60-HelloWorldSnow/
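If the workers still pile up on one host, it may help to first confirm what Torque actually handed to the job. The following is only a rough sketch, not part of the example repository: it assumes the node file contains one hostname per allocated slot, and summarize_nodefile is a made-up helper name. Dropped into the submission script just before the orterun line, it prints each host and how many slots it contributes.

```shell
#!/bin/sh
# Sketch: summarize the Torque node file, one output line per host.
# Assumption: the node file lists one hostname per allocated slot.

# summarize_nodefile is a hypothetical helper, not a Torque or Open MPI command.
# $1: path to a node file
summarize_nodefile() {
    sort "$1" | uniq -c | awk '{ print $2 ": " $1 " slot(s)" }'
}

# PBS_NODEFILE is set by Torque inside a job; outside a job, print nothing.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    echo "Hosts allocated to this job:"
    summarize_nodefile "$PBS_NODEFILE"
    echo "Total slots: $(wc -l < "$PBS_NODEFILE")"
fi
```

If that list shows only a single host, the problem is in the resource request rather than in snow or Rmpi. If it shows several hosts but the workers still land on one, the host list is probably not reaching orterun.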
I'm no expert with MPI or snow, so I'm really not sure how to approach debugging this. Any input would be much appreciated! Jeff
_______________________________________________
R-sig-hpc mailing list
R-sig-hpc at r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas