The foreach, iterators and doMC packages

Mark,

I've been playing with your example a bit more, and thought
that I should mention that splitting up matrices (or data frames)
as you're doing can be a bit tricky.  Mostly it has to do with
handling special cases properly, like when the number of rows
is less than the number of cores, and other things that don't
happen often, but are confusing when they do.

My plan has been to provide tools in the iterators package to
help with those kinds of tasks.  There isn't a lot yet, but the
idiv function can actually be used to help provide a different
solution to your problem.  I'm using the idiv function to help
implement a function that returns a "block row" iterator which
works on both matrices and data frames:

iblkrow <- function(a, ...) {
  i <- 1
  it <- idiv(nrow(a), ...)

  nextEl <- function() {
    n <- nextElem(it)
    r <- seq(i, length=n)
    i <<- i + n
    a[r,, drop=FALSE]
  }

  obj <- list(nextElem=nextEl)
  class(obj) <- c('abstractiter', 'iter')
  obj
}

This can be used with foreach for your example as follows:

foreach(x=iblkrow(x.mat, chunks=Ncore), y=iblkrow(y.mat, chunks=Ncore),
              .combine='rbind') %dopar%
    do.par.test.called.func(x, y)

This creates two iterators, one for each matrix, and completely handles
the indexing in the iterator itself.  This approach is also more efficient
if you're using a distributed memory parallel backend, because less
data is being sent over the network, but that isn't an issue when using
doMC.

Also note that idiv takes either a "chunks" or "chunkSize" argument,
which is passed along to iblkrow.  That allows you to specify either
the number of pieces to split the matrix into (using chunks, as in
this example), or the number of rows per task (using chunkSize,
which is sometimes useful).

I hope this doesn't seem overwhelming.  You don't have to use
iterators with foreach, but they are powerful and can be very helpful
at times. You might also want to take a look at David Smith's blog,
which is on iterators today.

- Steve

The foreach, iterators and doMC packages

Thread (7 messages)