Quick rsprng questions

Thu, Jul 30, 2009 2:20 PM

I'm far from an authority, but I'll try to answer.

On Thu, 2009-07-30 at 16:35 -0400, Thomas Hampton wrote:

I would not expect the behavior you described above.  The only
misbehavior that seems likely is that each node/process in the cluster
gets the same, or at least not fully independent, streams of random
numbers.

There are two issues: first, you need to initialize rsprng properly;
second, you need to be able to access the numbers it generates.

Unless something else initializes rsprng (e.g., snow provides
clusterSetupSPRNG), you need to do it yourself by calling init.sprng
with appropriate parameters (which include the total number of processes
and the rank of the process executing the initialization).  This will
create independent streams.

The second issue is getting access to these random numbers.  The uniform
random number generator and anything derived from it should work.  I'm
not sure if the normal random number generator will use SPRNG or not; I
suspect it will.

If you're trying to access the random number stream from C code, it's
tricky.  There are more details on the web page I announced:
http://wiki.r-project.org/rwiki/doku.php?id=packages:cran:rsprng.

In the simplest case, you might get the same random number stream in
each parallel process.  This means the extra runs are pointless and, if
you use them naively, you will think you have a much bigger sample than
you really do.

A more complex problem is that the random number streams could be
dependent, but in a more subtle way.

A simple strategy is to generate a list of random integers to serve as
seeds, ship a different seed to each process, and then set the seed in
each process.  This works with non-parallel RNGs and is probably good
enough in most cases (it's a popular move in the biostat dept here).  I
suspect there are some issues with it, though, because otherwise there'd
be no need for explicit parallel random number generators like SPRNG.
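
A minimal sketch of that seed-shipping strategy in base R, without
rsprng.  The helper name draw_with_seed, the worker count, and the
master seed are all made up for illustration, and lapply stands in for
the step that would actually ship work to each process.  It also shows
the "simplest case" failure for contrast, where every process ends up
with the same stream.

```r
# Hypothetical helper: seed the local RNG, then draw n uniforms.
draw_with_seed <- function(seed, n = 5) {
  set.seed(seed)
  runif(n)
}

n_workers <- 4        # assumed number of parallel processes
set.seed(20090730)    # arbitrary master seed, for reproducibility

# One distinct integer seed per worker (sampled without replacement).
seeds <- sample.int(.Machine$integer.max, n_workers)

# Stand-in for the parallel step: each element plays one process.
draws <- lapply(seeds, draw_with_seed)

# Failure mode for contrast: every process gets the same seed,
# so every process produces the identical stream.
same <- lapply(rep(seeds[1], n_workers), draw_with_seed)
```

On a real cluster the lapply call would be replaced by whatever ships
work to the nodes (e.g., snow's clusterApply).  Distinct seeds only
make the streams different; they do not guarantee the streams are
independent, which is the gap SPRNG is meant to close.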

Ross
