[R-pkg-devel] Save and restoring random number seed in a package function

6 messages · Noah Greifer, James Pustejovsky, Henrik Bengtsson +1 more

#
Hello fellow developers,

I am attempting to solve the problem of saving the state of the random
generator so that the state can be recovered in a future call. Essentially,
my function generates random numbers, performs an operation on them (saving
the result), and throws them out (saving them would require too much
memory). A second function is meant to take the output of the first
function, generate the same random numbers, and perform a different
operation on them.

This is exactly what happens in the *boot* package: the boot() function
saves the random seed (extracted from .Random.seed), and the boot.array()
function extracts the saved seed from the boot() output, sets the seed to
that value, re-generates the same set of random numbers, and then
re-sets the seed to what it was before boot.array() was called. This has
the following benefits: 1) it allows the same random numbers to be drawn;
2) the random numbers don't need to be saved, which is good because they
would take up a lot of memory and boot.array() is an optional function (it
is used in boot.ci() with type = "bca" for those curious); and 3) the seed
carries on from where it left off before boot.array() was called instead of
being set to what it was after boot() was called.

This is implemented in boot in the following way (code abbreviated):

boot <- function(...) {
  # Record the RNG state before any random numbers are drawn
  seed <- .Random.seed
  # Random numbers generated
  out <- list(seed = seed
              # Other stuff is in this list
              )
  out
}

boot.array <- function(boot.out) {
  #Save current random seed in `temp`
  if (exists(".Random.seed", envir = .GlobalEnv, inherits = FALSE))
    temp <- get(".Random.seed", envir = .GlobalEnv, inherits = FALSE)
  else temp <- NULL

  #Assign saved seed from boot.out
  assign(".Random.seed", boot.out$seed, envir = .GlobalEnv)

  #Generate same random numbers from boot() call

  #Restore random seed to what it was before boot.array() call
  if (!is.null(temp))
    assign(".Random.seed", temp, envir = .GlobalEnv)
  else rm(.Random.seed, pos = 1)
}

This seems to work as intended. However, it violates the CRAN policy
against modifying the global environment. When I used this exact code in a
package I submitted, the package was rejected for it. The message I received was

I'm curious what you think the best course of action might be, and what the
current policy means for the *boot* package. Thanks for your help.

Noah
#
I'm interested in this question too. Noah, is there a reason you are using
assign(".Random.seed",...) rather than set.seed()?
On Wed, Sep 14, 2022 at 9:31 AM Noah Greifer <noah.greifer at gmail.com> wrote:
#
Yes, set.seed() cannot accept .Random.seed as an input; it can only take a
single integer. As said in this answer
<https://stackoverflow.com/a/13997608/6348551>, there is a one-way
relationship between set.seed() and .Random.seed. My understanding is that
the recommended way to restore the seed is to assign the saved seed to
.Random.seed in the global environment, though this is exactly the method
that the CRAN policy does not allow. Unfortunately, assigning it in the
environment of the inner function is not sufficient.
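
To make the one-way relationship concrete, here is a minimal sketch run at
top level (toy runif() draws, nothing taken from boot): set.seed() turns one
integer into a full state vector, and replaying the stream means assigning
that whole vector back.

```r
# set.seed() maps a single integer to a full RNG state vector.
set.seed(42)
saved <- get(".Random.seed", envir = globalenv(), inherits = FALSE)
length(saved)   # a long integer vector, not a single value

first <- runif(3)   # advances the generator

# Replaying the stream requires putting the whole saved vector back
# into the global environment; set.seed() cannot take it as input.
assign(".Random.seed", saved, envir = globalenv())
second <- runif(3)

identical(first, second)   # TRUE: the same three numbers are drawn again
```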

One potential inconsistency with CRAN's policy is that generating a random
number itself changes the global environment by changing the value of
.Random.seed. The boot.array() code just does it manually using assign().
Indeed, the boot.array() code does less damage to the global environment in
that it resets the seed to what it would have been had boot.array() not
been run.
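
A quick check of that point, as a sketch (the variable names are just for
illustration): an ordinary draw rewrites .Random.seed in the global
environment without any call to assign().

```r
genv <- globalenv()

set.seed(1)                  # make sure .Random.seed exists
before <- genv$.Random.seed

x <- rnorm(1)                # an ordinary draw, no assign() involved

after <- genv$.Random.seed
identical(before, after)     # FALSE: the draw itself changed .Random.seed
```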

Noah

On Wed, Sep 14, 2022 at 10:39 AM James Pustejovsky <jepusto at gmail.com>
wrote:
#
Great to see you're handling the corner case when .Random.seed is not
set.  Note that your current implementation of boot.array() still
risks leaving .Random.seed modified in case an error or an interrupt
occurs while evaluating boot.array().  If that happens, one certainly
can argue that it modifies the .GlobalEnv environment.  A more robust
solution is:

boot.array <- function(boot.out) {
  genv <- globalenv()

  # Make sure to leave '.Random.seed' as-is on exit
  old_seed <- genv$.Random.seed
  on.exit(suspendInterrupts({
    if (is.null(old_seed)) {
      rm(".Random.seed", envir = genv, inherits = FALSE)
    } else {
      assign(".Random.seed", value = old_seed, envir = genv, inherits = FALSE)
    }
  }))

  # Assign saved seed from boot.out
  assign(".Random.seed", value = boot.out$seed, envir = genv)

  # Generate same random numbers from boot() call
}

This implementation leaves .GlobalEnv unmodified when the function
exits, regardless of how.
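
For anyone who wants to try the pattern end to end, here is a self-contained
toy version. toy_boot() and toy_replay() are hypothetical stand-ins for
boot() and boot.array(), and the runif() draws stand in for the real
resampling; suspendInterrupts() needs R >= 3.5.0.

```r
# Hypothetical stand-in for boot(): records the RNG state, then draws.
toy_boot <- function(n) {
  if (!exists(".Random.seed", envir = globalenv(), inherits = FALSE)) {
    runif(1)  # force R to initialise the RNG so there is a state to save
  }
  seed <- get(".Random.seed", envir = globalenv(), inherits = FALSE)
  draws <- runif(n)
  # The draws themselves are discarded; only the seed and a summary are kept.
  list(seed = seed, n = n, stat = mean(draws))
}

# Hypothetical stand-in for boot.array(): replays the draws, then restores.
toy_replay <- function(out) {
  genv <- globalenv()
  old_seed <- genv$.Random.seed
  on.exit(suspendInterrupts({
    if (is.null(old_seed)) {
      if (exists(".Random.seed", envir = genv, inherits = FALSE)) {
        rm(".Random.seed", envir = genv, inherits = FALSE)
      }
    } else {
      assign(".Random.seed", value = old_seed, envir = genv)
    }
  }))
  assign(".Random.seed", value = out$seed, envir = genv)
  runif(out$n)
}

set.seed(123)
out <- toy_boot(5)
state_before <- globalenv()$.Random.seed
replayed <- toy_replay(out)
state_after <- globalenv()$.Random.seed

isTRUE(all.equal(mean(replayed), out$stat))  # the same draws were regenerated
identical(state_before, state_after)         # the caller's RNG stream is untouched
```

The second check is the one that matters for the policy question: after
toy_replay() returns, .Random.seed is exactly what it was before the call.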

If CRAN did not allow temporarily modifying .Random.seed this way, it
would not be possible to distribute a lot of statistical algorithms on
CRAN. I can't speak for CRAN, but I would assume they accept this type
of approach, since it should have no side effects. FWIW, this strategy
is used by several CRAN packages for generating random numbers in
"stealth mode" without affecting R's RNG state, e.g. 'future.apply'
(mine) and 'withr'. If you have a GitHub account, you can find hundreds
of other CRAN examples by searching "org:cran assign .Random.seed".
So, you're not alone in this need.  As a fallback, you could use
`withr::with_seed()`, which uses the above approach. That would put the
burden on the 'withr' maintainer to argue for this approach.
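
A rough sketch of the withr fallback, assuming the 'withr' package is
installed. Note that with_seed() takes a single integer seed rather than a
saved .Random.seed vector, so a package using it would store an integer
seed in its output instead; the value 99 below is arbitrary.

```r
# Assumes 'withr' is installed; skipped otherwise.
if (requireNamespace("withr", quietly = TRUE)) {
  set.seed(2022)
  state_before <- globalenv()$.Random.seed

  # Both calls draw from a stream seeded with the same integer.
  a <- withr::with_seed(99, runif(3))
  b <- withr::with_seed(99, runif(3))

  identical(a, b)                                    # same seed, same draws
  identical(state_before, globalenv()$.Random.seed)  # global state restored on exit
}
```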

Hope this helps,

Henrik
On Wed, Sep 14, 2022 at 8:11 AM Noah Greifer <noah.greifer at gmail.com> wrote:
#
Hi Henrik,

Thank you so much, this is extremely helpful! Glad to know my approach is
not unreasonable. Hopefully CRAN will be okay with these modifications.

Noah

On Wed, Sep 14, 2022 at 2:16 PM Henrik Bengtsson <henrik.bengtsson at gmail.com>
wrote:
#
> set.seed() cannot accept .Random.seed as an input; it can only take
> a single integer.

If I recall correctly, S-plus's set.seed() would accept a .Random.seed
value as an input.  It did some basic validation checks on it and set it as
the current .Random.seed.  I don't recall the name of the argument.  The
format of the seed depends on the generator used - I think it used the
nature of the proffered seed to get the likely generator, but there may
have also been arguments to specify the generator explicitly.

I think this would be a nice thing to add to R's set.seed.
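
As a sketch of what such an interface could look like in user code:
set_seed_state() below is a hypothetical helper, not part of R or S-plus,
and it does only a minimal sanity check before assigning the vector.

```r
# Hypothetical helper: accepts either a single number (passed to set.seed())
# or a full saved .Random.seed vector (assigned as the current state).
set_seed_state <- function(seed) {
  if (is.numeric(seed) && length(seed) == 1L) {
    set.seed(seed)
  } else if (is.integer(seed) && length(seed) > 1L) {
    # Minimal check only; no attempt is made to verify the generator kind.
    assign(".Random.seed", seed, envir = globalenv())
  } else {
    stop("'seed' must be a single number or a saved .Random.seed vector")
  }
  invisible(NULL)
}

set.seed(7)
saved <- globalenv()$.Random.seed
x <- runif(2)

set_seed_state(saved)     # restore the full state
identical(x, runif(2))    # TRUE: the stream restarts from the saved point
```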

-Bill
On Wed, Sep 14, 2022 at 8:11 AM Noah Greifer <noah.greifer at gmail.com> wrote: