mpirun and R
On Jun 2, 2012, at 12:46 PM, Jonathan Greenberg wrote:
Steve: It was built with OpenBLAS,
That is the problem - this was previously discussed here, see the archive. OpenBLAS changes the affinity of the process to use only one CPU. You have to reset the affinity either with Linux tools or using mcaffinity (available in R-devel).

Cheers,
Simon
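As an illustration of the "Linux tools" route mentioned above, here is a minimal shell sketch that inspects and widens the CPU affinity of a process. It assumes a Linux node with `taskset` (from util-linux) and a readable `/proc`. The function names `cpu_range`, `show_affinity`, and `reset_affinity` are hypothetical helpers, not part of any package discussed in this thread.

```shell
#!/bin/sh
# Sketch only: assumes Linux, /proc, and taskset (util-linux).
# All function names below are hypothetical helpers.

# cpu_range N: print the CPU list "0-(N-1)" in the format taskset -c accepts.
cpu_range() {
    n=$1
    if [ "$n" -le 1 ]; then
        echo "0"
    else
        echo "0-$((n - 1))"
    fi
}

# show_affinity PID: print the kernel's allowed-CPU list for PID,
# or "unknown" when /proc is not available.
show_affinity() {
    if [ -r "/proc/$1/status" ]; then
        awk '/^Cpus_allowed_list:/ {print $2}' "/proc/$1/status"
    else
        echo "unknown"
    fi
}

# reset_affinity PID NCPUS: allow PID to run on CPUs 0..NCPUS-1.
reset_affinity() {
    if command -v taskset >/dev/null 2>&1; then
        taskset -cp "$(cpu_range "$2")" "$1"
    else
        echo "taskset not found; cannot change affinity" >&2
        return 1
    fi
}

# Report the affinity of the current shell as a quick check.
echo "this shell may run on CPUs: $(show_affinity $$)"
```

For the 12-core session in this thread, something like `reset_affinity <pid-of-R> 12` would be run against the master R process after the BLAS has been loaded. On Linux a child process inherits its parent's affinity mask, so workers spawned afterwards should pick up the wider mask, while workers that are already running would each need the same treatment.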
but does that matter with an MPI-based function (i.e. I thought GotoBLAS was an entirely different hpc aspect that is only used for linear algebra routines) -- but yes, all the spawned R processes end up spawning on a single cpu, but if I use mpirun it functions properly. I had to "roll" OpenBLAS myself on this system, because it only has Intel MKL installed by the admins which I have yet to get to play right with R. OpenBLAS does work for LA commands tho.

# In fact, running this on the normal spawned R uses all cores:
a = matrix(rnorm(5000*5000), 5000, 5000)
b = matrix(rnorm(5000*5000), 5000, 5000)
c = a%*%b

# But then in the same instance running:
require(raster)
beginCluster()
# Only spawns on one core.

Are there "better" parameters I might pass to snow to get this working? I get the same behavior in snowfall and sfInit():

require(snowfall)
sfInit(parallel=TRUE,cpus=12)
sfStop()
# All spawns execute on a single CPU
sfInit(parallel=TRUE,cpus=12,type="MPI")
sfStop()
# All spawns execute on a single CPU

Incidentally (and I don't consider this a perfectly satisfactory answer, so please continue to give me some advice to try out), this command at least lets me run R in interactive mode and doesn't always bail when I type in an incorrect statement:

`which mpirun` -n 1 -machinefile $PBS_NODEFILE R --interactive

(note the --interactive instead of the --vanilla)

With that said if I need to kill a process with control-c (usually just returning me to an R prompt) I do get R bailing back to the bash command line.

The other reason I'd like a within-R solution to this is that I do my development within the Stat-et/Eclipse environment, and (at least right now) there is no way to modify how it launches R remotely.

--j

On Fri, Jun 1, 2012 at 3:12 PM, Stephen Weston <stephen.b.weston at gmail.com> wrote:
So you wanted 12 cpus on a single node, but the 12 spawned R processes were all scheduled by your OS on a single cpu rather than multiple cpus/cores on that node?

If so, that suggests that somehow the cpu affinity has been set. We've seen this type of problem when using GotoBLAS2/OpenBLAS. Has your R installation been built with either of them?

- Steve

On Fri, Jun 1, 2012 at 11:52 AM, Jonathan Greenberg <jgrn at illinois.edu> wrote:
R-sig-hpc'ers:

Our system (running openmpi) allows for an interactive session to be created with N number of CPUs allotted to it (12 in my case). Here's the qsub command to get the interactive node running:

qsub -X -I -q [mygroup] -l nodes=1:ppn=12,walltime=48:00:00

If I boot R and then try some HPC R commands e.g.:

require(raster)
# Note this is just a wrapper for a snow call:
beginCluster()

I get:
beginCluster()
Loading required package: snow
12 cores detected
cluster type: MPI
Loading required package: Rmpi
12 slaves are spawned successfully. 0 failed.
If I "top" I see that I have 12 (13?) R spawns running. The problem
is, they are all running on a SINGLE cpu, not distributed amongst all
12 cpus (even though it detected it). My first question is: why is
this? Is there a way to fix this from a standard "R" launch?
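The "all on a single cpu" observation from top can also be checked from the shell. The sketch below is one possible way to do it, assuming a Linux `ps` from procps, where the `psr` column reports the CPU a process last ran on. The function names `distinct_cpus` and `r_process_cpus` are hypothetical, and the command name `R` passed to `ps -C` is an assumption about how the workers appear in the process table.

```shell
#!/bin/sh
# Sketch only: assumes Linux procps ps with the psr column.
# Function names are hypothetical helpers.

# distinct_cpus: read "PID PSR" pairs on stdin and print how many
# distinct PSR (last-used CPU) values appear.
distinct_cpus() {
    awk 'NF >= 2 { seen[$2] = 1 } END { n = 0; for (c in seen) n++; print n }'
}

# r_process_cpus: list PID and last-used CPU for every process whose
# command name is R (assumed name of the R workers).
r_process_cpus() {
    ps -C R -o pid=,psr=
}

# Example with canned input: three processes, all last seen on CPU 0.
printf '101 0\n102 0\n103 0\n' | distinct_cpus
```

On a live node, `r_process_cpus | distinct_cpus` would be run a few times while the workers are busy. A count that stays at 1 with 12 workers running is consistent with the pinned-affinity behaviour described here, whereas a healthy run should show the count approaching the number of allotted cores.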
Now, I can SOMEWHAT fix this by:
`which mpirun` -n 1 -machinefile $PBS_NODEFILE R --vanilla
When I run the same commands, they distribute properly to all 12 cpus
BUT ANY error I make in typing will cause the entire system to "die":
require(raster)
require(raster)
Loading required package: raster
Loading required package: sp
raster 1.9-92 (1-May-2012)
beginCluster()
beginCluster()
Loading required package: snow
12 cores detected
cluster type: MPI
Loading required package: Rmpi
12 slaves are spawned successfully. 0 failed.
abc
Error: object 'abc' not found
Execution halted
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 28932 on node [mynode] exiting without calling "finalize". This may have caused other processes in the application to be terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------

Is there a way to allow me a "safer" mpirun launch that won't die if I make a small typo? This makes it REALLY hard to troubleshoot code if any little error causes the quit.

--j

--
Jonathan A. Greenberg, PhD
Assistant Professor
Department of Geography and Geographic Information Science
University of Illinois at Urbana-Champaign
607 South Mathews Avenue, MC 150
Urbana, IL 61801
Phone: 415-763-5476
AIM: jgrn307, MSN: jgrn307 at hotmail.com, Gchat: jgrn307, Skype: jgrn3007
http://www.geog.illinois.edu/people/JonathanGreenberg.html
_______________________________________________ R-sig-hpc mailing list R-sig-hpc at r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-hpc