mpirun and R
So you wanted 12 CPUs on a single node, but the 12 spawned R processes were all scheduled by your OS on a single CPU rather than across multiple CPUs/cores on that node? If so, that suggests that the CPU affinity has somehow been set. We've seen this type of problem when using GotoBLAS2/OpenBLAS. Has your R installation been built with either of them?

- Steve
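One way to test the affinity hypothesis on a Linux node is to read the allowed-CPU list of a process from /proc. The following is only a sketch and is not from the thread: show_affinity is a hypothetical helper name, and it assumes /proc/<pid>/status exposes a Cpus_allowed_list line, as current Linux kernels do.

```shell
#!/bin/sh
# Print the CPUs a process is allowed to run on (Linux only).
# show_affinity is a hypothetical helper, not part of R, Rmpi or Open MPI.
show_affinity() {
    pid="${1:-$$}"
    # /proc/<pid>/status has a line like "Cpus_allowed_list:  0-11".
    awk -v pid="$pid" '/^Cpus_allowed_list:/ { print pid ": " $2 }' "/proc/$pid/status"
}

# Affinity of the current shell; processes started from it inherit this mask.
show_affinity "$$"
```

If the R workers report a single CPU (for example 0) rather than a range such as 0-11, the mask has been narrowed somewhere. Assuming the workers show up under the process name R, something like `for pid in $(pgrep -x R); do show_affinity "$pid"; done` lists them all, and `taskset -cp 0-11 <pid>` from util-linux can widen the mask of one process.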
On Fri, Jun 1, 2012 at 11:52 AM, Jonathan Greenberg <jgrn at illinois.edu> wrote:
R-sig-hpc'ers:

Our system (running openmpi) allows for an interactive session to be created with N number of CPUs allotted to it (12 in my case). Here's the qsub command to get the interactive node running:

qsub -X -I -q [mygroup] -l nodes=1:ppn=12,walltime=48:00:00

If I boot R and then try some HPC R commands e.g.:

require(raster)
# Note this is just a wrapper for a snow call:
beginCluster()

I get:
> beginCluster()
Loading required package: snow
12 cores detected
cluster type: MPI
Loading required package: Rmpi
        12 slaves are spawned successfully. 0 failed.

If I "top" I see that I have 12 (13?) R spawns running. The problem is, they are all running on a SINGLE cpu, not distributed amongst all 12 cpus (even though it detected it). My first question is: why is this? Is there a way to fix this from a standard "R" launch?

Now, I can SOMEWHAT fix this by:

`which mpirun` -n 1 -machinefile $PBS_NODEFILE R --vanilla

When I run the same commands, they distribute properly to all 12 cpus BUT ANY error I make in typing will cause the entire system to "die":
> require(raster)
Loading required package: raster
Loading required package: sp
raster 1.9-92 (1-May-2012)
> beginCluster()
Loading required package: snow
12 cores detected
cluster type: MPI
Loading required package: Rmpi
        12 slaves are spawned successfully. 0 failed.
> abc
Error: object 'abc' not found
Execution halted
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 28932 on
node [mynode] exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------

Is there a way to allow me a "safer" mpirun launch that won't die if I make a small typo? This makes it REALLY hard to troubleshoot code if any little error causes the quit.

--j

--
Jonathan A. Greenberg, PhD
Assistant Professor
Department of Geography and Geographic Information Science
University of Illinois at Urbana-Champaign
607 South Mathews Avenue, MC 150
Urbana, IL 61801
Phone: 415-763-5476
AIM: jgrn307, MSN: jgrn307 at hotmail.com, Gchat: jgrn307, Skype: jgrn3007
http://www.geog.illinois.edu/people/JonathanGreenberg.html
_______________________________________________
R-sig-hpc mailing list
R-sig-hpc at r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-hpc