portable parallel seeds project: request for critiques
On Fri, Feb 17, 2012 at 02:57:26PM -0600, Paul Johnson wrote:
> I've got another edition of my simulation replication framework. I'm
> attaching 2 R files and pasting in the readme. I would especially like to
> know if I'm doing anything that breaks .Random.seed or other things that
> R's parallel uses in the environment.
>
> In case you don't want to wrestle with attachments, the same files are
> online in our SVN
>
> http://winstat.quant.ku.edu/svn/hpcexample/trunk/Ex66-ParallelSeedPrototype/
Hi.

In the description of your project in the file

  http://winstat.quant.ku.edu/svn/hpcexample/trunk/Ex66-ParallelSeedPrototype/README

you argue as follows

  Question: Why is this better than the simple old approach of
  setting the seeds within each run with a formula like

  set.seed(2345 + 10 * run)

  Answer: That does allow replication, but it does not assure
  that each run uses non-overlapping random number streams. It
  offers absolutely no assurance whatsoever that the runs are
  actually non-redundant.

The following demonstrates that set.seed() with the default generator can
indeed produce correlated streams.

  step <- function(x)
  {
      x[x < 0] <- x[x < 0] + 2^32
      x <- (69069 * x + 1) %% 2^32
      x[x > 2^31] <- x[x > 2^31] - 2^32
      x
  }

  n <- 1000
  seed1 <- 124370417 # any seed
  seed2 <- step(seed1)

  set.seed(seed1)
  x <- runif(n)
  set.seed(seed2)
  y <- runif(n)

  rbind(seed1, seed2)
  table(x[-1] == y[-n])

The output is

             [,1]
  seed1 124370417
  seed2 205739774

  FALSE  TRUE
      5   994

This means that if the streams x and y are generated from the two seeds
above, then y is almost exactly equal to x shifted by one position: 994 of
the 999 compared pairs are identical.

What is the current state of your project?

Petr Savicky.
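
The comment "any seed" in the code above can be checked with a minimal sketch like the one below. It reuses step() unchanged; the helper name shifted_match and the extra seeds 1 and 42 are illustrative choices, not taken from the README or the attached files. The generator kind is set explicitly on the assumption that "default generator" means Mersenne-Twister.

```r
# Same step() as in the message above: one step of the recurrence
# x -> (69069 * x + 1) mod 2^32, mapped back to a signed 32-bit value.
step <- function(x) {
    x[x < 0] <- x[x < 0] + 2^32
    x <- (69069 * x + 1) %% 2^32
    x[x > 2^31] <- x[x > 2^31] - 2^32
    x
}

# Hypothetical helper: fraction of positions at which the stream seeded
# with step(seed) equals the stream seeded with seed, shifted by one draw.
shifted_match <- function(seed, n = 1000) {
    # Assumption: the "default generator" is Mersenne-Twister.
    set.seed(seed, kind = "Mersenne-Twister")
    x <- runif(n)
    set.seed(step(seed), kind = "Mersenne-Twister")
    y <- runif(n)
    mean(x[-1] == y[-n])
}

seeds <- c(1, 42, 124370417)  # arbitrary illustrative seeds
res <- sapply(seeds, shifted_match)
print(data.frame(seed = seeds, match = res))
```

A match fraction close to 1 for each seed would indicate that the shift is not specific to the pair 124370417 / 205739774. Note that the helper resets the global RNG state as a side effect.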