Hi all, I recently found several calls of set.seed() in a CRAN package. These calls are in a plot function, which could lead to unexpected behaviour. See https://github.com/sammo3182/interplot/issues/33 <https://github.com/sammo3182/interplot/issues/33> for a description of the problem. I checked the CRAN repository policies and could not find anything about this. I would have expected a policy against setting fixed seeds somewhere in a package. Am I missing something? Best, Marvin
set.seed() in a package
6 messages · Marvin Wright, Duncan Murdoch, Peter Dalgaard +1 more
On 30/10/2019 3:28 a.m., Marvin Wright wrote:
Hi all, I recently found several calls of set.seed() in a CRAN package. These calls are in a plot function, which could lead to unexpected behaviour. See https://github.com/sammo3182/interplot/issues/33 <https://github.com/sammo3182/interplot/issues/33> for a description of the problem. I checked the CRAN repository policies and could not find anything about this. I would have expected a policy against setting fixed seeds somewhere in a package. Am I missing something?
set.seed() writes .Random.seed in the user's global environment, which violates this policy: - Packages should not modify the global environment (user?s workspace). However, every call to a random number generator creates or modifies .Random.seed as well, and most of those are expected and shouldn't be flagged. And interplot() is documented to do random simulations, so it would be expected to change the seed: the issue is that given the same inputs it always changes it to the same thing. I think that would be quite hard for a test to detect. Should it be a policy with no test? Maybe, because I agree with you that interplot()'s set.seed(324) is bad practice. Duncan Murdoch
We commit a similar sin in the help pages, e.g. example(set.seed) ; runif(2) example(set.seed) ; runif(2) gives you the same random uniforms both times. (Of course it isn't that much of an issue, since you would rarely be running examples before any serious simulations.) You can fairly easily work around that by saving and restoring .Random.seed. I wonder if that isn't also true of the cases using set.seed() for other reasons? -pd
On 30 Oct 2019, at 13:46 , Duncan Murdoch <murdoch.duncan at gmail.com> wrote: On 30/10/2019 3:28 a.m., Marvin Wright wrote:
Hi all, I recently found several calls of set.seed() in a CRAN package. These calls are in a plot function, which could lead to unexpected behaviour. See https://github.com/sammo3182/interplot/issues/33 <https://github.com/sammo3182/interplot/issues/33> for a description of the problem. I checked the CRAN repository policies and could not find anything about this. I would have expected a policy against setting fixed seeds somewhere in a package. Am I missing something?
set.seed() writes .Random.seed in the user's global environment, which violates this policy: - Packages should not modify the global environment (user?s workspace). However, every call to a random number generator creates or modifies .Random.seed as well, and most of those are expected and shouldn't be flagged. And interplot() is documented to do random simulations, so it would be expected to change the seed: the issue is that given the same inputs it always changes it to the same thing. I think that would be quite hard for a test to detect. Should it be a policy with no test? Maybe, because I agree with you that interplot()'s set.seed(324) is bad practice. Duncan Murdoch
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Office: A 4.23 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
On 30/10/2019 9:08 a.m., peter dalgaard wrote:
We commit a similar sin in the help pages, e.g. example(set.seed) ; runif(2) example(set.seed) ; runif(2) gives you the same random uniforms both times. (Of course it isn't that much of an issue, since you would rarely be running examples before any serious simulations.)
I think it's pretty common in example code, and that's justifiable. But it could be avoided by using withr::with_seed() or something equivalent. Duncan Murdoch
You can fairly easily work around that by saving and restoring .Random.seed. I wonder if that isn't also true of the cases using set.seed() for other reasons? -pd
On 30 Oct 2019, at 13:46 , Duncan Murdoch <murdoch.duncan at gmail.com> wrote: On 30/10/2019 3:28 a.m., Marvin Wright wrote:
Hi all, I recently found several calls of set.seed() in a CRAN package. These calls are in a plot function, which could lead to unexpected behaviour. See https://github.com/sammo3182/interplot/issues/33 <https://github.com/sammo3182/interplot/issues/33> for a description of the problem. I checked the CRAN repository policies and could not find anything about this. I would have expected a policy against setting fixed seeds somewhere in a package. Am I missing something?
set.seed() writes .Random.seed in the user's global environment, which violates this policy: - Packages should not modify the global environment (user?s workspace). However, every call to a random number generator creates or modifies .Random.seed as well, and most of those are expected and shouldn't be flagged. And interplot() is documented to do random simulations, so it would be expected to change the seed: the issue is that given the same inputs it always changes it to the same thing. I think that would be quite hard for a test to detect. Should it be a policy with no test? Maybe, because I agree with you that interplot()'s set.seed(324) is bad practice. Duncan Murdoch
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
On 30/10/2019 9:08 a.m., peter dalgaard wrote:
You can fairly easily work around that by saving and restoring .Random.seed.
This is actually quite tedious to get correct; it requires you to
under how and when .Random.seed is set, and what are valid values on
.Random.seed. For instance, a common mistake (me too) is to reset to
.GlobalEnv$.Random.seed <- NULL in a fresh R session but this will
produce a warning on: ".Random.seed' is not an integer vector but of
type 'NULL', so ignored". You end up having to do things such as:
oseed <- .GlobalEnv$.Random.seed
on.exit({
if (is.null(oseed)) {
rm(list=".Random.seed", envir= .GlobalEnv)
} else {
assign(".Random.seed", value=oseed, envir= .GlobalEnv)
}
})
to avoid that warning. So, having support functions for this in base
R would be helpful, e.g.
oseed <- base::getRandomSeed()
on.exit(base::setRandomSeed(oseed))
Back to Marvin's point/question. I think it would be useful if the
CRAN Policies would explicitly say that functions must not change the
random seed to a fixed one, without undoing it, unless the user
specifies it via an argument. If not, there's a great risk it will
mess up statistical analysis. Also, if there would a way to test
against such practices, which I think is really hard, I would be the
first one backing it up.
On a related note; there are packages that forward the .Random.seed
when loaded. This is also an unfortunate behavior because you will
give different RNGs depending on that package was already loaded
before you called a function that depends on it or not. For example,
if pkgA forwards the .Random.seed when loaded, the following is *not*
reproducible:
# User might or might not have loaded pkgA already
if (runif(1) < 0.5) loadNamespace(pkgA)
set.seed(0xBEEF)
loadNamespace(pkgA)
y <- runif(1)
I ran into this problem when doing some strict testing. I argue this
also falls into the set of "bad" practices to be avoided.
/Henrik
On Wed, Oct 30, 2019 at 6:23 AM Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
On 30/10/2019 9:08 a.m., peter dalgaard wrote:
We commit a similar sin in the help pages, e.g. example(set.seed) ; runif(2) example(set.seed) ; runif(2) gives you the same random uniforms both times. (Of course it isn't that much of an issue, since you would rarely be running examples before any serious simulations.)
I think it's pretty common in example code, and that's justifiable. But it could be avoided by using withr::with_seed() or something equivalent. Duncan Murdoch
You can fairly easily work around that by saving and restoring .Random.seed. I wonder if that isn't also true of the cases using set.seed() for other reasons? -pd
On 30 Oct 2019, at 13:46 , Duncan Murdoch <murdoch.duncan at gmail.com> wrote: On 30/10/2019 3:28 a.m., Marvin Wright wrote:
Hi all, I recently found several calls of set.seed() in a CRAN package. These calls are in a plot function, which could lead to unexpected behaviour. See https://github.com/sammo3182/interplot/issues/33 <https://github.com/sammo3182/interplot/issues/33> for a description of the problem. I checked the CRAN repository policies and could not find anything about this. I would have expected a policy against setting fixed seeds somewhere in a package. Am I missing something?
set.seed() writes .Random.seed in the user's global environment, which violates this policy: - Packages should not modify the global environment (user?s workspace). However, every call to a random number generator creates or modifies .Random.seed as well, and most of those are expected and shouldn't be flagged. And interplot() is documented to do random simulations, so it would be expected to change the seed: the issue is that given the same inputs it always changes it to the same thing. I think that would be quite hard for a test to detect. Should it be a policy with no test? Maybe, because I agree with you that interplot()'s set.seed(324) is bad practice. Duncan Murdoch
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Forgot to say: For, oseed <- base::getRandomSeed() on.exit(base::setRandomSeed(oseed)) one could upgrade set.seed() to take this role, e.g. oseed <- set.seed(0xBEEF) on.exit(set.seed(oseed)) Current, set.seed() always return NULL. BTW, and my memory might be bad, I think I mentioned this in the past but was told that you cannot reset the RNG state for all types of RNG kinds. That might complicate things, but on the other hand, that could be checked for at run-time by the above functions. /Henrik On Wed, Oct 30, 2019 at 9:50 AM Henrik Bengtsson
<henrik.bengtsson at gmail.com> wrote:
On 30/10/2019 9:08 a.m., peter dalgaard wrote:
You can fairly easily work around that by saving and restoring .Random.seed.
This is actually quite tedious to get correct; it requires you to
under how and when .Random.seed is set, and what are valid values on
.Random.seed. For instance, a common mistake (me too) is to reset to
.GlobalEnv$.Random.seed <- NULL in a fresh R session but this will
produce a warning on: ".Random.seed' is not an integer vector but of
type 'NULL', so ignored". You end up having to do things such as:
oseed <- .GlobalEnv$.Random.seed
on.exit({
if (is.null(oseed)) {
rm(list=".Random.seed", envir= .GlobalEnv)
} else {
assign(".Random.seed", value=oseed, envir= .GlobalEnv)
}
})
to avoid that warning. So, having support functions for this in base
R would be helpful, e.g.
oseed <- base::getRandomSeed()
on.exit(base::setRandomSeed(oseed))
Back to Marvin's point/question. I think it would be useful if the
CRAN Policies would explicitly say that functions must not change the
random seed to a fixed one, without undoing it, unless the user
specifies it via an argument. If not, there's a great risk it will
mess up statistical analysis. Also, if there would a way to test
against such practices, which I think is really hard, I would be the
first one backing it up.
On a related note; there are packages that forward the .Random.seed
when loaded. This is also an unfortunate behavior because you will
give different RNGs depending on that package was already loaded
before you called a function that depends on it or not. For example,
if pkgA forwards the .Random.seed when loaded, the following is *not*
reproducible:
# User might or might not have loaded pkgA already
if (runif(1) < 0.5) loadNamespace(pkgA)
set.seed(0xBEEF)
loadNamespace(pkgA)
y <- runif(1)
I ran into this problem when doing some strict testing. I argue this
also falls into the set of "bad" practices to be avoided.
/Henrik
On Wed, Oct 30, 2019 at 6:23 AM Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
On 30/10/2019 9:08 a.m., peter dalgaard wrote:
We commit a similar sin in the help pages, e.g. example(set.seed) ; runif(2) example(set.seed) ; runif(2) gives you the same random uniforms both times. (Of course it isn't that much of an issue, since you would rarely be running examples before any serious simulations.)
I think it's pretty common in example code, and that's justifiable. But it could be avoided by using withr::with_seed() or something equivalent. Duncan Murdoch
You can fairly easily work around that by saving and restoring .Random.seed. I wonder if that isn't also true of the cases using set.seed() for other reasons? -pd
On 30 Oct 2019, at 13:46 , Duncan Murdoch <murdoch.duncan at gmail.com> wrote: On 30/10/2019 3:28 a.m., Marvin Wright wrote:
Hi all, I recently found several calls of set.seed() in a CRAN package. These calls are in a plot function, which could lead to unexpected behaviour. See https://github.com/sammo3182/interplot/issues/33 <https://github.com/sammo3182/interplot/issues/33> for a description of the problem. I checked the CRAN repository policies and could not find anything about this. I would have expected a policy against setting fixed seeds somewhere in a package. Am I missing something?
set.seed() writes .Random.seed in the user's global environment, which violates this policy: - Packages should not modify the global environment (user?s workspace). However, every call to a random number generator creates or modifies .Random.seed as well, and most of those are expected and shouldn't be flagged. And interplot() is documented to do random simulations, so it would be expected to change the seed: the issue is that given the same inputs it always changes it to the same thing. I think that would be quite hard for a test to detect. Should it be a policy with no test? Maybe, because I agree with you that interplot()'s set.seed(324) is bad practice. Duncan Murdoch
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel