This StackOverflow post: https://stackoverflow.com/q/69756236/2554330 points out that objects created in one vignette are available in a later vignette. I don't think this should be happening: vignettes should be self-contained.

The current answer there, https://stackoverflow.com/a/69758025/2554330, suggests that "R CMD check" will detect this. However, sometimes one vignette can replace a standard function with a custom version, and then both will work without generating an error, but the second vignette won't do the same thing if run independently.

For example, try these pure Sweave vignettes:

-------------------------
aaa3.Rnw:
-------------------------
\documentclass{article}
%\VignetteIndexEntry{Sweave aaa3}
\begin{document}
<<>>=
mean <- function(x) "I am the Sweave mean"
@
\end{document}
------------------------
aaa4.Rnw:
------------------------
\documentclass{article}
%\VignetteIndexEntry{Sweave aaa4}
\begin{document}
<<>>=
mean(1:5)
@
\end{document}

Put these in a package, build and install the package, and you'll see that the mean() function in aaa4.Rnw prints the result from the redefined mean in aaa3.Rnw.
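A minimal sketch of the mechanism, assuming (as the replies below explain) that both vignettes' chunks end up evaluated in one shared environment. The helper run_chunks() is hypothetical and only evaluates code strings in a given environment; it is not part of tools or utils, and the "shared" environment merely stands in for .GlobalEnv.

```r
# Hypothetical helper: evaluate a vignette's chunk code (given as a string)
# in `env` and return the value of the last expression.
run_chunks <- function(code, env) {
  result <- NULL
  for (expr in parse(text = code)) result <- eval(expr, envir = env)
  result
}

aaa3 <- 'mean <- function(x) "I am the Sweave mean"'
aaa4 <- "mean(1:5)"

# Both vignettes in one shared environment (stand-in for .GlobalEnv):
shared <- new.env(parent = globalenv())
run_chunks(aaa3, shared)
shared_result <- run_chunks(aaa4, shared)   # finds the redefined mean()

# aaa4 on its own, in a fresh environment, as if run independently:
fresh_result <- run_chunks(aaa4, new.env(parent = globalenv()))  # base::mean()
```

Neither run raises an error, which is why a check that only looks for failures would not notice that aaa4 depends on what aaa3 left behind.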
Bug (?) in vignette handling
5 messages · Martin Maechler, Sebastian Meyer, Duncan Murdoch
Duncan Murdoch on Thu, 28 Oct 2021 13:18:54 -0400 writes:
> This StackOverflow post: https://stackoverflow.com/q/69756236/2554330
> points out that objects created in one vignette are available in a later
> vignette. I don't think this should be happening: vignettes should be
> self-contained.

I strongly agree.

> The current answer there, https://stackoverflow.com/a/69758025/2554330,
> suggests that "R CMD check" will detect this. However, sometimes one
> vignette can replace a standard function with a custom version, and then
> both will work without generating an error, but the second vignette
> won't do the same thing if run independently.

> For example, try these pure Sweave vignettes:
> -------------------------
> aaa3.Rnw:
> -------------------------
> \documentclass{article}
> %\VignetteIndexEntry{Sweave aaa3}
> \begin{document}
> <<>>=
> mean <- function(x) "I am the Sweave mean"
> @
> \end{document}
> ------------------------
> aaa4.Rnw:
> ------------------------
> \documentclass{article}
> %\VignetteIndexEntry{Sweave aaa4}
> \begin{document}
> <<>>=
> mean(1:5)
> @
> \end{document}

> Put these in a package, build and install the package, and you'll see
> that the mean() function in aaa4.Rnw prints the result from the
> redefined mean in aaa3.Rnw.

Is it because R is *not* run with --no-save --no-restore accidentally?
Without looking, I would not expect that the vignettes are run inside the
same running R (even though that may speed things up).
1 day later
On 29/10/2021 5:52 a.m., Martin Maechler wrote:
> Is it because R is *not* run with --no-save --no-restore accidentally? Without looking, I would not expect that the vignettes are run inside the same running R (even though that may speed things up).
I think for R CMD build they are run in one process, while for R CMD
check they are in separate processes. R CMD build runs
tools::buildVignettes(), which runs code that's part of the vignette
build engine.
The Sweave engine evaluates things in .GlobalEnv, so any leftover
objects will be visible there for the next vignette. I think it's up to
the writer of each vignette engine whether there's any cleanup, but it
appears that neither Sweave nor knitr does any.
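One way to see the .GlobalEnv effect without building a package is to call utils::Sweave() on both example files, in order, in a single R session. This is a sketch: it writes the two files from the original post to a temporary directory, and the helpers rnw() and weave_in() are illustrative names, not part of R. Because Sweave assigns the redefined mean() into .GlobalEnv, the sketch removes it again at the end.

```r
# Write the two example vignettes from the original post to a temp directory.
dir <- tempfile("vigns")
dir.create(dir)
rnw <- function(title, code) c(
  "\\documentclass{article}",
  sprintf("%%\\VignetteIndexEntry{%s}", title),
  "\\begin{document}", "<<>>=", code, "@", "\\end{document}"
)
writeLines(rnw("Sweave aaa3", 'mean <- function(x) "I am the Sweave mean"'),
           file.path(dir, "aaa3.Rnw"))
writeLines(rnw("Sweave aaa4", "mean(1:5)"), file.path(dir, "aaa4.Rnw"))

# Weave both files in *this* session, one after the other. Sweave writes the
# .tex output to the working directory, so switch there temporarily.
weave_in <- function(dir, files) {
  old <- setwd(dir)
  on.exit(setwd(old))
  for (f in files) utils::Sweave(f, quiet = TRUE)
}
weave_in(dir, c("aaa3.Rnw", "aaa4.Rnw"))

# aaa4.tex should contain the output of the redefined mean().
tex4 <- readLines(file.path(dir, "aaa4.tex"))
leaked <- any(grepl("I am the Sweave mean", tex4, fixed = TRUE))
mean_in_global <- exists("mean", envir = .GlobalEnv, inherits = FALSE)

# Remove the leftover so this session sees base::mean() again.
rm("mean", envir = .GlobalEnv)
```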
One possible fix would be for buildVignettes() to make a snapshot of
what's in .GlobalEnv before processing any vignettes, and restoring it
after each one. I've tried a weaker version of this: it records the
names in .GlobalEnv at the start, and deletes anything new before
processing each vignette. So vignettes could modify or delete what's
there, but not add anything.
I think you don't want to completely clear out .GlobalEnv, because
people might choose to run buildVignettes() in an R session and expect
the vignettes to see the contents there.
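The weaker, name-based cleanup described above (not the full snapshot/restore) can be sketched outside of tools. weave_all() is a hypothetical stand-in for the loop in buildVignettes(); it works on a caller-supplied environment instead of .GlobalEnv so the sketch is safe to run, and each "vignette" is just a function of that environment. It uses the same ls()/setdiff()/rm() idea as the patch below.

```r
# Hypothetical stand-in for the buildVignettes() loop: record the names that
# exist at the start, and before each vignette delete any name added since.
weave_all <- function(vignettes, env) {
  existing <- ls(env, all.names = TRUE)
  results <- list()
  for (name in names(vignettes)) {
    rm(list = setdiff(ls(env, all.names = TRUE), existing), envir = env)
    results[[name]] <- vignettes[[name]](env)
  }
  results
}

env <- new.env()
env$user_data <- 1:3  # present before weaving, so it is never deleted

vignettes <- list(
  aaa3 = function(env) {
    assign("mean", function(x) "I am the Sweave mean", envir = env)
    env$user_data <- "modified"  # changing an existing name is NOT undone
    "defined mean"
  },
  aaa4 = function(env) {
    # Look up mean() starting from env, as chunk code evaluated there would.
    f <- get("mean", envir = env)
    f(1:5)
  }
)

out <- weave_all(vignettes, env)
```

With the cleanup, aaa4 gets base::mean() again, but the change aaa3 made to user_data survives, which is the limitation of recording names rather than values. Attached packages, options() and other session state are not touched by this approach at all.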
"make check" in R-devel doesn't complain about this change, but I'll let
R Core decide whether it's a good idea or not. A patch is below.
Duncan Murdoch
Index: src/library/tools/R/Vignettes.R
===================================================================
--- src/library/tools/R/Vignettes.R (revision 81110)
+++ src/library/tools/R/Vignettes.R (working copy)
@@ -560,7 +560,11 @@
     sourceList <- list()
     startdir <- getwd()
     fails <- character()
+    # People may build vignettes from a session and expect
+    # to see some variables, so we won't delete these
+    existingVars <- ls(.GlobalEnv, all = TRUE)
     for(i in seq_along(vigns$docs)) {
+        rm(list = setdiff(ls(.GlobalEnv, all = TRUE), existingVars), envir = .GlobalEnv)
         thisOK <- TRUE
         file <- basename(vigns$docs[i])
         enc <- vigns$encodings[i]
On 30.10.21 at 20:28, Duncan Murdoch wrote:
> I think for R CMD build they are run in one process, while for R CMD check they are in separate processes. R CMD build runs tools::buildVignettes(), which runs code that's part of the vignette build engine.
Thankfully, R CMD check has already been building vignettes in separate R processes since R 3.6.0, so it has hopefully caught most such problems by now. The corresponding env var is _R_CHECK_BUILD_VIGNETTES_SEPARATELY_. The standard (and exported!) buildVignettes() has been weaving all vignettes in the same session ever since it was added back in 2002. This approach is probably more efficient (it avoids loading packages repeatedly), but carry-over effects seem both likely and undesirable (thinking of vignettes as separate and independently reproducible manuscripts about different aspects of a package). AFAICS, it is not explicitly documented that buildVignettes() runs all vignettes in the same R session, so at least this is not an advertised feature.
> The Sweave engine evaluates things in .GlobalEnv, so any leftover objects will be visible there for the next vignette. I think it's up to the writer of each vignette engine whether there's any cleanup, but it appears that neither Sweave nor knitr does any.
I think this is by design and also useful in interactive sessions to investigate the environment after weaving.
> One possible fix would be for buildVignettes() to make a snapshot of what's in .GlobalEnv before processing any vignettes, and restoring it after each one. I've tried a weaker version of this: it records the names in .GlobalEnv at the start, and deletes anything new before processing each vignette. So vignettes could modify or delete what's there, but not add anything.
>
> I think you don't want to completely clear out .GlobalEnv, because people might choose to run buildVignettes() in an R session and expect the vignettes to see the contents there.
>
> "make check" in R-devel doesn't complain about this change, but I'll let R Core decide whether it's a good idea or not. A patch is below.
Clearing the workspace would be an improvement, but I think it would be
even better for R CMD build to produce each vignette in a clean R
session, especially with regard to loaded packages. Changing
buildVignettes() to use clean R processes by default (I'd say even if
there is only one vignette) should be considered. I'd appreciate seeing
this report in Bugzilla to investigate further (and not forget).
Best regards,
Sebastian Meyer
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
On 30/10/2021 6:11 p.m., Sebastian Meyer wrote:
> Clearing the workspace would be an improvement, but I think it would be even better for R CMD build to produce each vignette in a clean R session, especially with regard to loaded packages. Changing buildVignettes() to use clean R processes by default (I'd say even if there is only one vignette) should be considered. I'd appreciate seeing this report in Bugzilla to investigate further (and not forget).
Yes, that makes sense: currently R CMD build starts a new clean session and runs buildVignettes() there; it would make more sense for buildVignettes() to start a new session for each vignette. Users who don't want to run their vignettes in a clean session, or who want to see the leftovers, can still do so by calling Sweave or knit or whatever directly. I'll post some of this to Bugzilla.

Duncan Murdoch
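A rough sketch of the "one clean session per vignette" idea: each .Rnw file is woven by its own `Rscript --vanilla` process, so nothing one vignette defines can reach the next. This is not how tools::buildVignettes() is implemented; weave_separately() and its small runner script are hypothetical, and the sketch assumes Rscript lives in R.home("bin") and only handles Sweave files.

```r
# Hypothetical helper: weave each .Rnw file in its own fresh R process.
weave_separately <- function(files) {
  rscript <- file.path(R.home("bin"), "Rscript")
  runner <- tempfile(fileext = ".R")
  writeLines(c(
    "f <- commandArgs(trailingOnly = TRUE)[1]",
    "setwd(dirname(f))",
    "utils::Sweave(basename(f), quiet = TRUE)"
  ), runner)
  vapply(files, function(f) {
    # --vanilla: no .Rprofile, no restoring or saving of .RData
    as.integer(system2(rscript,
                       c("--vanilla", shQuote(runner), shQuote(normalizePath(f)))))
  }, integer(1))  # exit status per file; 0 means success
}

# The two example vignettes from the original post.
dir <- tempfile("vigns")
dir.create(dir)
writeLines(c("\\documentclass{article}", "%\\VignetteIndexEntry{Sweave aaa3}",
             "\\begin{document}", "<<>>=",
             'mean <- function(x) "I am the Sweave mean"',
             "@", "\\end{document}"), file.path(dir, "aaa3.Rnw"))
writeLines(c("\\documentclass{article}", "%\\VignetteIndexEntry{Sweave aaa4}",
             "\\begin{document}", "<<>>=", "mean(1:5)",
             "@", "\\end{document}"), file.path(dir, "aaa4.Rnw"))

status <- weave_separately(file.path(dir, c("aaa3.Rnw", "aaa4.Rnw")))
tex4 <- readLines(file.path(dir, "aaa4.tex"))
```

Here aaa4.tex should show the ordinary result of base::mean(). The cost is the one Sebastian mentions: every process starts R and loads packages from scratch.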