
Snow: ClusterApply fails 2nd parallelization on NSF Stampede

By the way, here are the basic commands I've been calling in my loop
(which work fine with small data sets, but not with larger ones). I've
also added variables earlier to store RNG seeds and log files for each
loop, so each loop is logged and uses a replicable RNG sequence, and
each loop's output is saved as a unique object. One difference from yours
(I think?) is that I'm stopping the cluster after each replicate and then
setting it up again. (When I didn't do that earlier, I had memory issues
when exporting certain objects across the cluster.)
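For context, the bookkeeping variables referenced in the loop below (BRNG.seeds, RNG.seeds, files, Bseq.start, Bseq.end) aren't shown in the post; here is a hedged sketch of one way they could be set up — the names match the loop, but the values are purely illustrative assumptions:

## Sketch (not from the original post): per-batch seeds, output names,
## and row ranges for splitting all.data into 6 batches.
n.batch    <- 6
set.seed(123)                             # master seed (illustrative)
BRNG.seeds <- sample.int(1e6, n.batch)    # one set.seed() value per batch
RNG.seeds  <- sample.int(1e6, n.batch)    # seeds for sfClusterSetupRNG()
files      <- paste("fits.batch", LETTERS[1:n.batch], sep="")
breaks     <- floor(seq(0, nrow(all.data), length.out=n.batch + 1))
Bseq.start <- breaks[1:n.batch] + 1       # first row of each batch
Bseq.end   <- breaks[2:(n.batch + 1)]     # last row of each batch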

# Loop: Subset large data set into smaller batches
for (b in 1:6) {
     data <- all.data[Bseq.start[b]:Bseq.end[b], ]   # Subset data
     nr <- nrow(data)
     set.seed(BRNG.seeds[b])           # Seed the RNG

     # Set up computer cluster
     cpus <- 382                       # Number of CPUs to cluster together
     sfSetMaxCPUs(cpus)                # Needed if planning more than 32 CPUs
     sfInit(parallel=TRUE, cpus=cpus,
            slaveOutfile=paste("initfile", LETTERS[b], sep=""),
            type="MPI")                # Initialize cluster
     stopifnot(sfCpus() == cpus)       # Confirm CPUs set up properly
     stopifnot(sfParallel() == TRUE)   # Confirm now running in parallel
     sfExport(list=c("data", "fit.models"))   # Export necessary objects,
                                              # variables, and functions
     sfLibrary(MASS)                   # Export libraries

     # Calculate model fits across cluster and stop cluster
     sfClusterSetupRNG(seed=RNG.seeds[b])     # Ensures repeatability
     nsam <- seq(nr)
     assign(files[b], sfClusterApplyLB(x=nsam, fun=fit.models, data=data,
            ppcc.nsim=ppcc.nsim, ks.nsim=ks.nsim))   # Load-balanced version
     temp <- get(files[b])
     save(temp, file=files[b])
     sfStop()
}
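For what it's worth, if the sfStop()/sfInit() cycle on every batch is only there to work around memory buildup on the slaves, a possible alternative (a sketch under that assumption, not something tested on Stampede) is to initialize the MPI cluster once and clear the exported objects between batches with snowfall's sfRemoveAll():

## Sketch: one cluster for all batches; clear slave-side objects per batch.
sfSetMaxCPUs(cpus)
sfInit(parallel=TRUE, cpus=cpus, type="MPI")
for (b in 1:6) {
     data <- all.data[Bseq.start[b]:Bseq.end[b], ]
     sfExport(list=c("data", "fit.models"))
     ## ... seed the cluster RNG, run sfClusterApplyLB(), save as above ...
     sfRemoveAll(except=NULL)    # remove all exported objects from the slaves
}
sfStop()

Whether this avoids the export-related memory issues you saw would need testing; restarting the cluster per batch, as in your loop, is the more conservative option.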
On 2/4/2014 11:54 AM, Novack-Gottshall, Philip M. wrote: