time to process

I think your primary problem is that the tasks are individually
quite small.  The master process is probably the bottleneck,
spending all of its time sending out tasks and retrieving
the results.  But since you have a lot of tasks, you can group
them together so they can be executed more efficiently.

The doMPI package has a backend-specific option, named
"chunkSize", that can do that kind of grouping, or chunking,
automatically.  You specify chunkSize using a list that you
pass to foreach via the ".options.mpi" argument.  Here's
an example:

suppressMessages(library(doMPI))

# Create and register an MPI cluster
cl <- startMPIcluster()
registerDoMPI(cl)

# Initialize variables
n <- 10000
opts <- list(chunkSize=100)

# Perform simulations in parallel
r <- foreach(i=1:n, .combine='c', .options.mpi=opts) %dopar% {
  x <- ts(arima.sim(list(order=c(1,0,0), ar=-0.9), n=360), start=1975, freq=12)
  y <- aggregate(x, nfreq=4, sum)
  arima(y, order=c(1,0,0))$model$phi
}

# Print a summary of the resulting vector
print(summary(r))

# Shutdown the cluster and quit
closeCluster(cl)
mpi.quit()


I picked a chunkSize of 100 because that should increase the
amount of work per task by a factor of 100, while still leaving
100 tasks.  That shouldn't work out too badly even with 58 workers.
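As a rough sketch of the arithmetic (the numbers here just restate the
example above; the worker count of 58 comes from your setup):

```r
# With n iterations and a given chunkSize, foreach dispatches
# roughly ceiling(n / chunkSize) tasks to the workers.
n <- 10000
chunkSize <- 100
ntasks <- ceiling(n / chunkSize)
cat("tasks dispatched:", ntasks, "\n")

# With 58 workers, each worker receives at most a couple of chunks,
# so the master spends far less time on per-task communication.
workers <- 58
cat("max chunks per worker:", ceiling(ntasks / workers), "\n")
```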

On my two core machine, this runs in about 100 seconds.

But I'm curious about how you're running the program, and what kind
of computers/network you're running on.  There might be something
else going wrong to explain such bad performance compared to
what I'm seeing.

As for user vs. elapsed time, they both have useful information,
but most of the time you only really care about elapsed time,
since that is what ultimately matters to most people.
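For instance, system.time makes the distinction visible: a call that
mostly waits (as the master does while workers compute) accrues elapsed
wall-clock time without much user CPU time.  A small illustration:

```r
# Sleeping consumes essentially no CPU, so user time stays near
# zero while elapsed (wall-clock) time reflects the full second.
st <- system.time(Sys.sleep(1))
print(st)
```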

- Steve
On Wed, Jan 6, 2010 at 3:00 AM, Hodgess, Erin <HodgessE at uhd.edu> wrote: