Hi,
it looks like save() saves everything in the calling environments if
the object to be saved has *not* been evaluated, although it is not
quite that simple either. After many hours of troubleshooting, I'm
still confused. Here is a reproducible example (also attached) with
its output. I'll let the code and the output speak for themselves:
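peek <- function(file, from=1, to=500) {
  # Print the file size, then the first 'to' strings found in the
  # (uncompressed) file with common control characters stripped.
  # Note: 'from' is currently unused.
  cat("--------------------------------------\n")
  cat(sprintf("%s: %d bytes\n", file, file.info(file)$size))
  bfr <- suppressWarnings(readBin(file, what="character", n=to))
  bfr <- gsub("(\001|\002|\003|\004|\005|\016|\020|\036|\a|\n|\t)", "", bfr)
  bfr <- bfr[nchar(bfr) > 0]
  cat(bfr, sep="", "\n")
}
saveCache <- function(file, y, sources=NULL, eval=FALSE) {
  # With eval=TRUE, touch 'sources' so that the promise for its
  # default value is evaluated before save() is called.
  if (eval)
    dummy <- is.null(sources)
  base::save(file=file, sources, compress=FALSE)
}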
# A large global; it does not end up in any of the files below.
aVariableNotSaved <- double(1e6)
main <- function() {
  # This 'big' variable is saved in case 2 below!
  big <- rep(letters, length.out=1e5)
  identifier <- "This string will be saved too!"
  y <- 1
  # With y, without eval
  file <- "a.RData"
  saveCache(y, file=file)
  peek(file)
  # With y, with eval
  file <- "a-eval.RData"
  saveCache(y, file=file, eval=TRUE)
  peek(file)
  # Without y, without eval
  file <- "b-noy.RData"
  saveCache(file=file)
  peek(file)
  # Without y, with eval
  file <- "b-noy-eval.RData"
  saveCache(file=file, eval=TRUE)
  peek(file)
}
# 1. Call saveCache() outside main()
eval(body(main))
# --------------------------------------
# a.RData: 238 bytes
# RDX2Xsources?filea.RData y?n $ n?$eval???n?
# --------------------------------------
# a-eval.RData: 58 bytes
# RDX2Xsources??
# --------------------------------------
# b-noy.RData: 230 bytes
# RDX2Xsources?file?b-noy.RData ?yv$ n?$eval???n?
# --------------------------------------
# b-noy-eval.RData: 58 bytes
# RDX2Xsources??
# 2. Call saveCache() from within main()
main()
# --------------------------------------
# a.RData: 900412 bytes
# RDX2Xsources?filea.RData y? a.RData ?=identifierThis
# string will be saved too!big??abcdefghijklmnopqrstuv
# wxyzabcdefghijklmnopqrstuvwxyzabcdefg
# --------------------------------------
# a-eval.RData: 58 bytes
# RDX2Xsources??
# --------------------------------------
# b-noy.RData: 230 bytes
# RDX2Xsources?file?b-noy.RData ?yv$ n?$eval???n?
# --------------------------------------
# b-noy-eval.RData: 58 bytes
# RDX2Xsources??
What is going on?
I get this on both R v2.3.0 patched (2006-04-28 r37936) and R v2.3.1
beta (2006-05-23 r38179) on my WinXP machine (with Rterm --vanilla).
save() saves extra stuff if object is not evaluated
3 messages · Henrik Bengtsson, Luke Tierney
On Thu, 25 May 2006, Henrik Bengtsson wrote:
Hi, it looks like save() is saving all contents of the calling environments if the object to be saved is *not* evaluated, although it is not that simple either.
No, it's exactly that simple. Serialization follows and writes out all reachable environments. Unevaluated promises contain the environments in which their evaluations are to occur; evaluated ones have this field set to R_NilValue to eliminate this no-longer-needed reference.
There are two environments involved: the calling environment in which saveCache is called, and the callee environment of the call to saveCache, where the body of saveCache is evaluated. Because of lexical scope, the enclosing environment of the callee environment is the closure environment of saveCache, which is .GlobalEnv.
The call to saveCache creates a promise for evaluating the default value for 'sources' _in the callee environment_. In the case with y, the callee environment includes a value of y which is a promise referencing the calling environment (either .GlobalEnv or the environment of the call to main). In the calls without y, the value of y in the callee environment is the missing value indicator, not a promise. So only with y and no eval is there a reference to the calling environment that serialization then has to write out.
Best,
luke
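A minimal sketch of this mechanism that can be run in a current R session. It uses serialize() rather than save(), because current versions of save() force promises by default (eval.promises = TRUE), which would hide the effect. The functions capture() and caller() are hypothetical, made up for this illustration: caller() plays the role of main() and capture() that of saveCache(). It also relies on one detail not spelled out above: .GlobalEnv is written as a reference rather than by value, which is why case 1 stays small.

```r
# Hypothetical helper: returns its own evaluation frame, optionally
# forcing the promise for its argument 'x' first.
capture <- function(x, force_it = FALSE) {
  if (force_it) force(x)
  environment()
}

# Hypothetical caller holding a large local variable (like main()).
caller <- function(force_it) {
  big <- double(1e6)    # about 8 MB
  y <- 1
  capture(y, force_it)  # 'x' is a promise to evaluate 'y' in this frame
}

# Unforced: the promise still references caller()'s frame, so 'big'
# is reachable and gets serialized too.
unforced <- length(serialize(caller(FALSE), NULL))

# Forced: evaluating the promise cleared its environment, so caller()'s
# frame (and 'big' with it) is no longer reachable.
forced <- length(serialize(caller(TRUE), NULL))

cat("unforced:", unforced, "bytes\n")
cat("forced:", forced, "bytes\n")

# Analogue of case 1: if this is run at top level, the promise references
# .GlobalEnv, which is serialized as a reference, so the 8 MB global below
# is not included.
aVariableNotSaved <- double(1e6)
atTop <- length(serialize(capture(aVariableNotSaved), NULL))
cat("top level:", atTop, "bytes\n")
```

The same thing happens in case 2 of the original example: there the promise for y points at the frame of main(), which holds big and identifier.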
Luke Tierney
Chair, Statistics and Actuarial Science
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone: 319-335-3386
Department of Statistics and        Fax: 319-335-3017
   Actuarial Science
241 Schaeffer Hall                  email: luke at stat.uiowa.edu
Iowa City, IA 52242                 WWW: http://www.stat.uiowa.edu
On 5/25/06, Luke Tierney <luke at stat.uiowa.edu> wrote:
Serialization follows and writes out all reachable environments. Unevaluated promises contain the environments in which their evaluations are to occur; evaluated ones have this field set to R_NilValue [...]
Thank you very much for this sharp explanation. It is now much clearer to me what is going on. Would it make sense to make save() evaluate all non-evaluated arguments, e.g. via is.null(list(...))? Or to add an argument that makes this optional, or even the default? Best wishes, Henrik
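Henrik's suggestion amounts to forcing promises before they are written. A minimal sketch of doing that on the caller's side, assuming one is free to change saveCache(): force() the argument explicitly before calling save(). The names saveCacheForced() and demo() are hypothetical. For what it's worth, current versions of save() have an eval.promises argument, TRUE by default, that forces promises among the objects being saved before writing them, so there the file stays small even without the explicit force().

```r
# Hypothetical variant of saveCache() that always forces 'sources'
# before saving, instead of the 'dummy <- is.null(sources)' trick.
# 'y' is unused, as in the original; it is kept to mirror that call.
saveCacheForced <- function(file, y, sources = NULL) {
  force(sources)  # evaluate the default now, so its promise drops the frame
  save(sources, file = file, compress = FALSE)
}

# Hypothetical driver mirroring case 2 of the original example.
demo <- function(file) {
  big <- rep(letters, length.out = 1e5)  # large local, as in main()
  y <- 1
  saveCacheForced(y, file = file)
  file.info(file)$size
}

f <- tempfile(fileext = ".RData")
size <- demo(f)
cat(basename(f), ":", size, "bytes\n")
```

force(sources) is simply a clearer spelling of dummy <- is.null(sources): both evaluate the promise, and per Luke's explanation that is what drops the reference to the callee environment.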