Skip to content

set.seed() in a package

6 messages · Marvin Wright, Duncan Murdoch, Peter Dalgaard +1 more

#
Hi all, 

I recently found several calls of set.seed() in a CRAN package. These calls are in a plot function, which could lead to unexpected behaviour. See https://github.com/sammo3182/interplot/issues/33 <https://github.com/sammo3182/interplot/issues/33> for a description of the problem. 

I checked the CRAN repository policies and could not find anything about this. I would have expected a policy against setting fixed seeds somewhere in a package. Am I missing something? 

Best, 

Marvin
#
On 30/10/2019 3:28 a.m., Marvin Wright wrote:
set.seed() writes .Random.seed in the user's global environment, which 
violates this policy:

- Packages should not modify the global environment (user?s workspace).

However, every call to a random number generator creates or modifies 
.Random.seed as well, and most of those are expected and shouldn't be 
flagged.  And interplot() is documented to do random simulations, so it 
would be expected to change the seed:  the issue is that given the same 
inputs it always changes it to the same thing.  I think that would be 
quite hard for a test to detect.

Should it be a policy with no test?  Maybe, because I agree with you 
that interplot()'s set.seed(324) is bad practice.

Duncan Murdoch
#
We commit a similar sin in the help pages, e.g.

example(set.seed) ; runif(2)
example(set.seed) ; runif(2)

gives you the same random uniforms both times. (Of course it isn't that much of an issue, since you would rarely be running examples before any serious simulations.)

You can fairly easily work around that by saving and restoring .Random.seed. I wonder if that isn't also true of the cases using set.seed() for other reasons? 

-pd

  
    
#
On 30/10/2019 9:08 a.m., peter dalgaard wrote:
I think it's pretty common in example code, and that's justifiable.  But 
it could be avoided by using withr::with_seed() or something equivalent.

Duncan Murdoch
#
This is actually quite tedious to get correct; it requires you to
under how and when .Random.seed is set, and what are valid values on
.Random.seed.   For instance, a common mistake (me too) is to reset to
.GlobalEnv$.Random.seed <- NULL in a fresh R session but this will
produce a warning on: ".Random.seed' is not an integer vector but of
type 'NULL', so ignored".  You end up having to do things such as:

  oseed <- .GlobalEnv$.Random.seed
  on.exit({
    if (is.null(oseed)) {
      rm(list=".Random.seed", envir= .GlobalEnv)
    } else {
      assign(".Random.seed", value=oseed, envir= .GlobalEnv)
    }
  })

to avoid that warning.  So, having support functions for this in base
R would be helpful, e.g.

  oseed <- base::getRandomSeed()
  on.exit(base::setRandomSeed(oseed))

Back to Marvin's point/question.  I think it would be useful if the
CRAN Policies would explicitly say that functions must not change the
random seed to a fixed one, without undoing it, unless the user
specifies it via an argument.  If not, there's a great risk it will
mess up statistical analysis.  Also, if there would a way to test
against such practices, which I think is really hard, I would be the
first one backing it up.

On a related note; there are packages that forward the .Random.seed
when loaded.  This is also an unfortunate behavior because you will
give different RNGs depending on that package was already loaded
before you called a function that depends on it or not.  For example,
if pkgA forwards the .Random.seed when loaded, the following is *not*
reproducible:

# User might or might not have loaded pkgA already
if (runif(1) < 0.5) loadNamespace(pkgA)

set.seed(0xBEEF)
loadNamespace(pkgA)
y <- runif(1)

I ran into this problem when doing some strict testing.  I argue this
also falls into the set of "bad" practices to be avoided.

/Henrik
On Wed, Oct 30, 2019 at 6:23 AM Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
#
Forgot to say: For,

  oseed <- base::getRandomSeed()
  on.exit(base::setRandomSeed(oseed))

one could upgrade set.seed() to take this role, e.g.

  oseed <- set.seed(0xBEEF)
  on.exit(set.seed(oseed))

Current, set.seed() always return NULL.

BTW, and my memory might be bad, I think I mentioned this in the past
but was told that you cannot reset the RNG state for all types of RNG
kinds.  That might complicate things, but on the other hand, that
could be checked for at run-time by the above functions.

/Henrik

On Wed, Oct 30, 2019 at 9:50 AM Henrik Bengtsson
<henrik.bengtsson at gmail.com> wrote: