
parallel random numbers: set.seed(i), rsprng, rlecuyer, ??

I have a particular variant of this question, and also some additional
information related to the original one.  See also the discussion at
http://rwiki.sciviews.org/doku.php?id=packages:cran:rsprng.

The question is how to generate reproducible streams of parallel random
numbers in a way that is insensitive to the number of nodes used.  If I
can run 10 jobs one time, and 60 the next, I want to get the same random
numbers.

Scenarios
A) Each job generates its own random numbers.
B) A specialized subset of jobs generates random numbers; the subset
expands as the total number of jobs expands.
C) A fixed subset of jobs, e.g., 5, is responsible for generating the
random numbers.

C) seems the most likely to be achievable.  rsprng, which we use,
initializes streams with a call that includes the stream number and the
total number of streams.  I don't know whether the stream number alone
determines the sequence, or whether the total number of streams, and
possibly messaging between processes, also comes into play.  Even if
rsprng came with some guarantees, other parallel generators (e.g.,
rlecuyer) might not.

There are also calls for spawning streams and manipulating stream state.
Perhaps these could be jury-rigged into a solution.

Any thoughts?

There's one point in the original post I'd like to correct.  See below.
On Thu, 2010-06-17 at 11:29 +0200, Martin Maechler wrote:
My reading of the code is that rsprng, at least, does hook into the
underlying generators that most of the R-level functions call.  It does
not hook into the calls for setting the seed or for saving and restoring
the stream state; those must be done through rsprng-specific calls.
It's probably not reasonable to expect that a single-stream seed setting
call would work with a parallel random number generator.

Ross

P.S. We're using rsprng because it is part of the base Debian
distribution.  Packaged versions of rlecuyer are available in other
repositories.