how to control the environment of a formula
On 13-04-18 11:39 AM, Thomas Alexander Gerds wrote:
Dear Duncan, thank you for taking the time to answer my questions! It will be quite some work to delete all the objects generated inside the function ... but if there is no other way to avoid a large environment, then this is what I will do.
It's not really that hard. Use names <- ls() in the function to get a list of all of them; remove the names of variables that might be needed in the formula (and the name of the formula itself); then use rm(list=names) to delete everything else just before returning it. Duncan Murdoch
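A minimal sketch of the ls()/rm() approach described above, assuming the only object that has to survive is the returned list `out`. The extra local variable `tmp` is made up for illustration, to stand in for the other intermediate results of a real function.

```r
test <- function(x) {
  x <- rnorm(1000000)
  tmp <- cumsum(x)          # hypothetical intermediate result
  out <- list()
  out$f <- a ~ b

  # All objects currently in the function's environment
  # ('names' itself does not exist yet when ls() is evaluated).
  names <- ls()
  # Drop from the removal list whatever must be kept; here only 'out'.
  names <- setdiff(names, "out")
  # Delete everything else just before returning.
  rm(list = names)
  out
}

v <- test(1)
ls(environment(v$f))        # only the small leftovers remain
```

If the formula refers to local variables, their names would have to be added to the setdiff() call as well, otherwise the formula loses them.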
Cheers
Thomas

Duncan Murdoch <murdoch.duncan at gmail.com> writes:
On 13-04-18 1:09 AM, Thomas Alexander Gerds wrote:
Dear List
I have noticed that objects generated with one of my packages
use a lot of space when saved to disk (object.size did not show
this!).
Some debugging revealed that formula and call objects carry the
full environment of the subroutines along, including even stuff not
needed by the formula or call. Here is a sketch of the problem:
,----
| test <- function(x){
|   x <- rnorm(1000000)
|   out <- list()
|   out$f <- a~b
|   out
| }
| v <- test(1)
| save(v,file="~/tmp/v.rda")
| system("ls -lah ~/tmp/v.rda")
| -rw-rw-r-- 1 tag tag 7,4M Apr 18 06:41 /home/tag/tmp/v.rda
`----
I tried to replace line 3 by
,----
| as.formula(a~b,env=emptyenv()) or as.formula(a~b,env=NULL)
`----
without the desired effect. Instead adding either
,----
| environment(out$f) <- emptyenv() or environment(out$f) <- NULL
`----
has the desired effect (i.e. the size of the saved object
shrinks). Unfortunately, there is a new problem:
,----
| test <- function(x){
|   x <- rnorm(1000000)
|   out <- list()
|   out$f <- a~b
|   environment(out$f) <- emptyenv()
|   out
| }
| d <- data.frame(a=1,b=1)
| v <- test(1)
| model.frame(v$f,data=d)
| Error in eval(expr, envir, enclos) : could not find function "list"
`----
Same with NULL in place of emptyenv()
Finally using .GlobalEnv in place of emptyenv() seems to remove both
problems.
But it will cause other, less obvious problems. In a formula, the symbols mean something. By setting the environment to .GlobalEnv you're changing the meaning. You'll get nonsense in certain cases when functions look up the meaning of those symbols and find the wrong thing. (I don't have an example at hand, but I imagine it would be easy to put one together with update().)
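One way this can go wrong, sketched with model.frame() rather than update(): the helper `make_formula` and the variable `w` are invented for illustration and are not from the original code. The formula refers to a local variable, and resetting its environment makes the same symbol resolve to a different object.

```r
# Hypothetical helper: the formula refers to a local variable 'w'.
make_formula <- function() {
  w <- c(10, 20, 30)
  y ~ w
}

d <- data.frame(y = 1:3)    # note: 'w' is not a column of d
f <- make_formula()

# 'w' is found in the formula's own environment (the function's frame).
w_local <- model.frame(f, data = d)$w

# Place a different 'w' in the global environment and re-point the formula.
assign("w", c(-1, -2, -3), envir = globalenv())
environment(f) <- globalenv()

# The same formula now picks up the global 'w', without any warning.
w_global <- model.frame(f, data = d)$w
```

Had there been no `w` in the global environment, the second call would have failed instead of quietly returning different data.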
My questions: 1) why does the argument env of as.formula have no effect?
Because the first argument already had an associated environment. You passed a ~ b, which is evaluated to a formula; calling as.formula on a formula does nothing. The env argument is only used when a new formula needs to be constructed. (You can see this in the source code; as.formula is a very simple function.)
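A short sketch of that behaviour. The object names `f1` and `f2` are illustrative only. Passing a formula leaves its environment untouched, while passing a character string forces a new formula to be built, so env is honoured.

```r
# 'a ~ b' is already a formula, so as.formula() returns it unchanged
# and the env argument is ignored.
f1 <- as.formula(a ~ b, env = emptyenv())
identical(environment(f1), emptyenv())

# A character string has no environment yet; a new formula is constructed
# and env is used for it.
f2 <- as.formula("a ~ b", env = emptyenv())
identical(environment(f2), emptyenv())
```

The first comparison is FALSE (the environment is the one in which a ~ b was evaluated) and the second is TRUE.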
2) is there a better way to tell formula not to copy unrelated stuff into the associated environment?
Yes, delete it. For example, you could write your function as
  test <- function(x){
    x <- rnorm(1000000)
    out <- list()
    out$f <- a~b
    rm(x)
    out
  }
3) why does object.size not show the size of the environments that formulas can carry along?
Because many objects can share the same environment. See ?object.size for more details. Duncan Murdoch
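To see the difference without writing a file, the size reported by object.size() can be compared with the length of the serialized object. This is only a sketch: serialize() is used here as a stand-in for what save() has to write, and the function name `make_big` is made up.

```r
# Hypothetical function whose frame holds a large, unrelated vector.
make_big <- function() {
  x <- rnorm(1000000)
  a ~ b
}

f <- make_big()

# Size of the formula object itself; the environment is not counted.
size_reported <- as.numeric(object.size(f))

# Number of bytes needed to serialize the formula, including the
# environment it carries along (and therefore 'x').
size_serialized <- length(serialize(f, NULL))

c(reported = size_reported, serialized = size_serialized)
```

The serialized size is dominated by the one million doubles in `x`, which object.size() never sees.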