mpirun and R

Steve:

It was built with OpenBLAS,
That is the problem. This was discussed here previously; see the archive. OpenBLAS changes the CPU affinity of the process so that it uses only one CPU. You have to reset the affinity, either with Linux tools or with mcaffinity (available in R-devel).

Cheers,
Simon
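
For the "Linux tools" route, a minimal sketch using taskset from util-linux might look like the following. The function names are hypothetical, and the sketch assumes the affinity is reset on the master R process before the workers are spawned, on the assumption that the workers inherit the master's affinity.

```shell
#!/bin/sh
# Sketch only: inspect and widen the CPU affinity of a running process on Linux.
# Function names are hypothetical; taskset is part of util-linux.

# Build a CPU list covering every online CPU, e.g. "0-11" on a 12-core node.
all_cpus_list() {
    n=$(getconf _NPROCESSORS_ONLN)
    echo "0-$((n - 1))"
}

# Print the current affinity list of a PID (defaults to this shell).
show_affinity() {
    taskset -cp "${1:-$$}"
}

# Allow a PID to run on every online CPU again.
reset_affinity() {
    taskset -cp "$(all_cpus_list)" "$1"
}
```

Calling reset_affinity with the PID of the R session (Sys.getpid() inside R) and then running beginCluster() or sfInit() would be one way to try this. Some OpenBLAS builds are also documented to honor the environment variable OPENBLAS_MAIN_FREE=1, set before R starts, which stops the library from changing the affinity in the first place; whether the build discussed here supports it is not known.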
but does that matter with an MPI-based
function?  I thought GotoBLAS was an entirely different HPC aspect
that is only used for linear algebra routines.  But yes, all the
spawned R processes end up on a single CPU, whereas if I use
mpirun it functions properly.  I had to "roll" OpenBLAS myself on this
system, because the admins have installed only Intel MKL, which I
have yet to get to play right with R.  OpenBLAS does work for linear
algebra commands, though.

# In fact, running this on the normal spawned R uses all cores:
a = matrix(rnorm(5000*5000), 5000, 5000)
b = matrix(rnorm(5000*5000), 5000, 5000)
c = a%*%b

# But then, in the same session, running:
require(raster)
beginCluster()
# Only spawns on one core.

Are there "better" parameters I might pass to snow to get this
working?  I get the same behavior in snowfall and sfInit():

require(snowfall)
sfInit(parallel=TRUE,cpus=12)
sfStop()
# All spawns execute on a single CPU
sfInit(parallel=TRUE,cpus=12,type="MPI")
sfStop()
# All spawns execute on a single CPU

Incidentally (and I don't consider this a perfectly satisfactory
answer, so please continue to give me some advice to try out), this
command at least lets me run R in interactive mode and doesn't always
bail when I type in an incorrect statement:

`which mpirun` -n 1 -machinefile $PBS_NODEFILE R --interactive
(note the --interactive instead of the --vanilla)

With that said, if I need to kill a process with control-c (which
usually just returns me to an R prompt), R bails out to the bash
command line.  The other reason I'd like a within-R solution to this
is that I do my development within the StatET/Eclipse environment,
and (at least right now) there is no way to modify how it launches R
remotely.

--j

On Fri, Jun 1, 2012 at 3:12 PM, Stephen Weston
<stephen.b.weston at gmail.com> wrote:
So you wanted 12 cpus on a single node, but the 12 spawned
R processes were all scheduled by your OS on a single cpu
rather than multiple cpus/cores on that node?

If so, that suggests that somehow the cpu affinity has been set.
We've seen this type of problem when using GotoBLAS2/OpenBLAS.
Has your R installation been built with either of them?

- Steve
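
One way to check whether the affinity has in fact been restricted is to read the allowed-CPU list that the Linux kernel reports for each R process. The sketch below assumes a Linux /proc filesystem and that the processes show up under the command name "R"; the function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: list the CPUs each matching process is allowed to run on.
# Reads Cpus_allowed_list from /proc/<pid>/status (Linux only).

# Print the allowed-CPU list for one PID, e.g. "0" or "0-11".
allowed_cpus() {
    awk '/^Cpus_allowed_list:/ {print $2}' "/proc/$1/status"
}

# Print "<pid> <allowed CPUs>" for every process whose name matches exactly.
report_affinity() {
    name="${1:-R}"
    for pid in $(pgrep -x "$name"); do
        echo "$pid $(allowed_cpus "$pid")"
    done
}
```

If report_affinity prints a single CPU number for every spawned R process on a 12-core node, that would be consistent with the affinity having been pinned rather than with a scheduler quirk.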

On Fri, Jun 1, 2012 at 11:52 AM, Jonathan Greenberg <jgrn at illinois.edu> wrote:
R-sig-hpc'ers:

Our system (running openmpi) allows for an interactive session to be
created with N number of CPUs allotted to it (12 in my case).  Here's
the qsub command to get the interactive node running:

qsub -X -I -q [mygroup] -l nodes=1:ppn=12,walltime=48:00:00

If I boot R and then try some HPC R commands e.g.:

require(raster)
# Note this is just a wrapper for a snow call:
beginCluster()

I get:
beginCluster()
Loading required package: snow
12 cores detected
cluster type: MPI
Loading required package: Rmpi
       12 slaves are spawned successfully. 0 failed.

If I run "top" I see that I have 12 (13?) R spawns running.  The problem
is, they are all running on a SINGLE cpu, not distributed amongst all
12 cpus (even though all 12 were detected).  My first question is: why is
this?  Is there a way to fix this from a standard "R" launch?

Now, I can SOMEWHAT fix this by:
`which mpirun` -n 1 -machinefile $PBS_NODEFILE R --vanilla

When I run the same commands, they distribute properly to all 12 cpus
BUT ANY error I make in typing will cause the entire system to "die":
require(raster)
require(raster)
Loading required package: raster
Loading required package: sp
raster 1.9-92 (1-May-2012)
beginCluster()
beginCluster()
Loading required package: snow
12 cores detected
cluster type: MPI
Loading required package: Rmpi
       12 slaves are spawned successfully. 0 failed.
abc
Error: object 'abc' not found
Execution halted
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 28932 on
node [mynode] exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------

Is there a way to allow me a "safer" mpirun launch that won't die if I
make a small typo?  This makes it REALLY hard to troubleshoot code if
any little error causes the quit.

--j

--
Jonathan A. Greenberg, PhD
Assistant Professor
Department of Geography and Geographic Information Science
University of Illinois at Urbana-Champaign
607 South Mathews Avenue, MC 150
Urbana, IL 61801
Phone: 415-763-5476
AIM: jgrn307, MSN: jgrn307 at hotmail.com, Gchat: jgrn307, Skype: jgrn3007
http://www.geog.illinois.edu/people/JonathanGreenberg.html

_______________________________________________
R-sig-hpc mailing list
R-sig-hpc at r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-hpc

