stopCluster hangs instead of exits

2 messages · Bennet Fauber, Sajesh Singh

#
Sajesh,

I have to hang my head in some shame for not completely following the
whole trail of documentation.  It turned out that the answer was on Luke
Tierney's web site at

    http://homepage.divms.uiowa.edu/~luke/R/cluster/cluster.html

and I hadn't read the whole thing.  What is worse, it looks like it's
been there since at least 2016.  Many apologies to Prof Tierney.

We have been limping along using

    $ mpirun -np 1 R CMD BATCH mpi.R

and then inside the R script itself

    > library(Rmpi)
    > library(parallel)
    > library(snow)
    >
    > cl <- makeMPIcluster(N)

or similar, following on an example from long ago.

There is a script in the `snow` installation directory, `RMPISNOW`, that
can be used instead, and it solves several problems at once.

Our cluster is running Slurm, and I have OpenMPI versions 3.1.4 and 4.0.2
installed, along with R 3.6.1 and Rmpi-0.6-9, all compiled with GCC
8.2.0 on CentOS 7.

Adding the $R_LIBS_SITE/snow directory to the PATH provides `RMPISNOW`, and this

    mpirun RMPISNOW CMD BATCH /sw/examples/R/snow/snow-nuke.R

works beautifully with both versions of OpenMPI.
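
For anyone wiring this into a job script, here is a minimal sketch of the PATH step. The helper name `add_rmpisnow_to_path` is made up for illustration and is not part of snow. It assumes the directory passed in is a single library path with snow installed beneath it; if R_LIBS_SITE holds a colon-separated list, pass the one entry that contains snow.

```shell
# Hypothetical helper, not part of snow: prepend <libdir>/snow to PATH,
# but only if an executable RMPISNOW is actually there.
add_rmpisnow_to_path() {
    snow_dir="$1/snow"
    if [ ! -x "$snow_dir/RMPISNOW" ]; then
        echo "RMPISNOW not found or not executable in $snow_dir" >&2
        return 1
    fi
    PATH="$snow_dir:$PATH"
    export PATH
}

# Intended use in a job script (commented out so the sketch has no side effects):
#   add_rmpisnow_to_path "$R_LIBS_SITE" &&
#       mpirun RMPISNOW CMD BATCH /sw/examples/R/snow/snow-nuke.R
```

Failing early when the wrapper is missing seems preferable to letting mpirun report a less obvious "command not found" from every rank.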

In case it is helpful to someone else, the script is as follows.

snow-nuke.R
-----------
# Example taken from the snow examples at
# http://homepage.divms.uiowa.edu/~luke/R/cluster/cluster.html

library(boot)
#  In this example we show the use of boot in a prediction from
#  regression based on the nuclear data.  This example is taken
#  from Example 6.8 of Davison and Hinkley (1997).  Notice also
#  that two extra arguments to statistic are passed through boot.
data(nuclear)
nuke <- nuclear[,c(1,2,5,7,8,10,11)]
nuke.lm <- glm(log(cost)~date+log(cap)+ne+ ct+log(cum.n)+pt, data=nuke)
nuke.diag <- glm.diag(nuke.lm)
nuke.res <- nuke.diag$res*nuke.diag$sd
nuke.res <- nuke.res-mean(nuke.res)

#  We set up a new dataframe with the data, the standardized
#  residuals and the fitted values for use in the bootstrap.
nuke.data <- data.frame(nuke,resid=nuke.res,fit=fitted(nuke.lm))

#  Now we want a prediction of plant number 32 but at date 73.00
new.data <- data.frame(cost=1, date=73.00, cap=886, ne=0,
                       ct=0, cum.n=11, pt=1)
new.fit <- predict(nuke.lm, new.data)

nuke.fun <- function(dat, inds, i.pred, fit.pred, x.pred) {
     assign(".inds", inds, envir=.GlobalEnv)
     lm.b <- glm(fit+resid[.inds] ~date+log(cap)+ne+ct+
                 log(cum.n)+pt, data=dat)
     pred.b <- predict(lm.b,x.pred)
     remove(".inds", envir=.GlobalEnv)
     c(coef(lm.b), pred.b-(fit.pred+dat$resid[i.pred]))
}

# Run this once on just the master process
system.time(nuke.boot <-
            boot(nuke.data, nuke.fun, R=999, m=1,
                 fit.pred=new.fit, x.pred=new.data))

# Run this once on all four workers
#### makeCluster() checks whether a cluster has already been created
#### and, if so, attaches to it
cl <- makeCluster()

clusterCall(cl, function() paste("I am on node", Sys.info()[["nodename"]]))

#### Send instructions to the workers to load the boot library
clusterEvalQ(cl, library(boot))

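#### Run this again using the cluster evaluation mechanism.  clusterCall()
#### runs boot() on every worker, so each worker does its own 500
#### replicates and the result is a list with one boot object per worker.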
system.time(cl.nuke.boot <-
            clusterCall(cl,boot,nuke.data, nuke.fun, R=500, m=1,
                        fit.pred=new.fit, x.pred=new.data))
-----------
On Sat, Nov 16, 2019 at 1:17 PM Bennet Fauber <bennet at umich.edu> wrote:
#
Happy to hear you were able to resolve.

Sajesh