Hello,
Because Open MPI gets its node information from Torque, mpirun is necessary.
When the process is not running under mpirun, MPI_Comm_spawn starts
processes via ssh or rsh, and those processes know nothing about Torque.
Please check:
can you see libtorque?
c.f.
$ ldd /usr/lib/openmpi/lib/openmpi/mca_plm_tm.so |grep libtorque
libtorque.so.2 => /usr/lib/libtorque.so.2 (0x00007fd68e189000)
<snip>
#!/bin/bash
#PBS -N R_test
#PBS -l nodes=laicbio:ppn=32+laicbio1:ppn=12+laicbio2:ppn=12+laicbio3:ppn=12+la$
cd $PBS_O_WORKDIR
Rscript --no-save test.R
c.f.
mpirun -np 1 Rscript --no-save test.R
With the option `-np 1', only the master process is started.
<snip>
<snip>
mpi.spawn.Rslaves(nslaves=mpi.universe.size()-1)
You need to subtract one from the number of processes to leave room for the master.
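As a rough sketch of the same arithmetic on the shell side: under Torque, $PBS_NODEFILE normally lists one line per allocated slot, so the number of workers is the line count minus one for the master. The function name count_workers is made up for illustration, and the one-line-per-slot layout is an assumption about the scheduler setup.

```shell
#!/bin/bash
# Hypothetical helper: workers = allocated slots minus one (the master).
# Assumes the node file has one line per slot, as Torque's $PBS_NODEFILE does.
count_workers() {
    local nodefile="$1"
    local slots
    # Count non-empty lines; each one is treated as a slot.
    slots=$(grep -c . "$nodefile" || true)
    # Do not go negative when the file is empty.
    if [ "$slots" -gt 0 ]; then
        echo $((slots - 1))
    else
        echo 0
    fi
}

# Usage inside a job script (only meaningful when Torque has set PBS_NODEFILE):
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    echo "workers to spawn: $(count_workers "$PBS_NODEFILE")"
fi
```

Printing this number from the job script and comparing it with what mpi.universe.size()-1 reports in R is a cheap way to see whether R is really picking up the Torque allocation.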
It's giving me the following errors:
---
$ cat R_test.e98
[laicbio:67788] [[32125,0],0] ORTE_ERROR_LOG: Not found in file routed_binomial.c at line 386
[laicbio:67788] [[32125,0],0] ORTE_ERROR_LOG: A message is attempting to sent to a process whose contact information is unknown in file rml_oob_send.c at line 104
[laicbio:67788] [[32125,0],0] could not get route to [[32125,2],0]
---
And the following output:
---
$ cat R_test.o98
1 slaves are spawned successfully. 0 failed.
master (rank 0, comm 1) of size 2 is running on: laicbio
slave1 (rank 1, comm 1) of size 2 is running on: laicbio
$slave1
[1] "I am 1 of 2"
[1] 1
---
If I add mpiexec before Rscript in the PBS script, the job keeps running
(it never finishes) and I get lots of empty logs with names like
laicbio3.9740+1.10076.log; laicbio3 is one of the worker nodes.
Could you suggest a way to test this and track the problem down?
Thanks again.
Alejandro
2014-11-08 10:59 GMT-06:00 Dirk Eddelbuettel <edd at debian.org>:
On 6 November 2014 at 12:21, Alejandro Gonzalez wrote:
| Hello List, this is my first message, but I've been using your help for a
| while, thank you.
|
| I have a cluster of Ubuntu 14.04 machines with OpenMPI and I'm not
| able to install Rmpi.
What happens when you try
sudo apt-get install r-cran-rmpi
as in most cases the pre-built binary will be just fine.
| Here are some more specs of my system:
| - I installed from sources Torque 4.2.9 and Maui 3.3.1
| - OpenMPI version is 1.8.2 (I installed this one from source too)
| - R version is 3.0.2 (This was installed with apt-get install)
|
| When I try to install Rmpi:
| $ sudo R CMD INSTALL Rmpi_0.6-3.tar.gz --configure-args="--with-mpi=/opt/openmpi"
|
| I get the following:
| ---
| * installing to library '/usr/local/lib/R/site-library'
| * installing *source* package 'Rmpi' ...
| checking for gcc... gcc -std=gnu99
| checking whether the C compiler works... yes
| checking for C compiler default output file name... a.out
| checking for suffix of executables...
| checking whether we are cross compiling... no
| checking for suffix of object files... o
| checking whether we are using the GNU C compiler... yes
| checking whether gcc -std=gnu99 accepts -g... yes
| checking for gcc -std=gnu99 option to accept ISO C89... none needed
| Trying to find mpi.h ...
| Found in /opt/openmpi/include
| Trying to find libmpi.so or libmpich.a ...
| Found libmpi in /opt/openmpi/lib
| checking for orted... no
| configure: error: Cannot find orted. Rmpi needs orted to run.
Given that we have an existing Debian (and Ubuntu) package which has
built for years, "all" you need to do is to ensure that you too have what
is called the 'Build-Depends' needed to build the package. Each Debian
package writes these down in its configuration, and here they are (I
wrapped the lines for the email):
Build-Depends: debhelper (>= 7.0.0), cdbs, \
r-base-dev (>= 3.1.0), \
mpi-default-dev, mpi-default-bin
where line one just deals with Debian packaging internals, line two ensures
R is present (doh !!) and line three ensures that you have both the binaries
and the headers / libraries for the default MPI implementation on your
architecture -- which is OpenMPI on most of them (and MPICH on some less
common architectures).
I do not think this has anything to do with Torque (though I could be
overlooking something, Ei-ji usually knows very very well what he is
talking about).
But as I said: there is generally no reason to build this from source.
Dirk
| ERROR: configuration failed for package 'Rmpi'
| * removing '/usr/local/lib/R/site-library/Rmpi'
| ---
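One thing that may be worth checking, offered as a guess rather than a diagnosis: configure finds the headers and the library under /opt/openmpi but not orted, which would be consistent with orted being looked up on PATH while /opt/openmpi/bin is not on it (sudo commonly resets PATH). The sketch below searches a PATH-style list for an executable. find_in_path is a made-up helper name, and /opt/openmpi/bin is only assumed from the --with-mpi prefix quoted above.

```shell
#!/bin/bash
# Hypothetical helper: print the first directory in a colon-separated
# PATH-style list that contains an executable file with the given name.
find_in_path() {
    local name="$1" pathlist="$2" dir
    local IFS=':'
    for dir in $pathlist; do
        # An empty PATH entry means the current directory.
        [ -z "$dir" ] && dir="."
        if [ -x "$dir/$name" ] && [ ! -d "$dir/$name" ]; then
            echo "$dir"
            return 0
        fi
    done
    return 1
}

if dir=$(find_in_path orted "$PATH"); then
    echo "orted found in $dir"
else
    echo "orted is not on PATH; if it lives in /opt/openmpi/bin, try:"
    echo '  export PATH=/opt/openmpi/bin:$PATH'
fi
```

Because sudo may start from a different PATH than the login shell, it makes sense to run the check in the same environment that runs R CMD INSTALL.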
|
| I've read the Rmpi news,
|
| then tried to install Rmpi using a new build of OpenMPI, that I configured
| this way:
| $ ./configure --with-tm=/opt/torque --prefix=/opt/openmpi_disable_dlopen --disable-dlopen
| But I got the same error (configure: error: Cannot find orted. Rmpi
| needs orted to run.).
|
| Am I doing something wrong? Do you have a clue on how I can install Rmpi?
| I'd also like to understand more about what --disable-dlopen does, why
| it's necessary for Rmpi, and what happens when I run other MPI software when
| OpenMPI has been configured with --disable-dlopen. May you share me