Model object, when generated in a function, saves entire environment when saved
Thanks so much for all this. The first solution is what I'm going with, as I want the terms object to come along so that predict() still works.

On Wed, Jul 27, 2016 at 12:28 PM, William Dunlap via R-devel <r-devel at r-project.org> wrote:
Another solution is to only save the parts of the model object that
interest you. As long as they don't include the formula (which is
what drags along the environment it was created in), you will
save space. E.g.,
tfun2 <- function(subset) {
    junk <- 1:1e6
    list(subset=subset, lm(Sepal.Length ~ Sepal.Width, data=iris,
         subset=subset)$coef)
}
saveSize(tfun2(1:4))
#[1] 152
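To see why the formula is the culprit, one can inspect the environment attached to the fitted model's terms. The following is only a minimal sketch: tfun_naive is a hypothetical name for the "naive" version of the function (it is not defined elsewhere in this thread), and the check relies on environment() and terms() from base R and stats.

```r
# Hypothetical "naive" variant: lm() is called directly in the function frame.
tfun_naive <- function(subset) {
    junk <- 1:1e6   # unrelated to the model
    lm(Sepal.Length ~ Sepal.Width, data = iris, subset = subset)
}

fit <- tfun_naive(1:4)

# The terms object (a formula with extra attributes) keeps a reference
# to the environment in which the formula was created.
env <- environment(terms(fit))

ls(env)                      # names living in the function frame
identical(env, globalenv())  # FALSE: it is the function's own frame
```

Everything listed by ls(env) is reachable from the model object, so save() writes it out along with the model.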
Bill Dunlap
TIBCO Software
wdunlap tibco.com
On Wed, Jul 27, 2016 at 11:19 AM, William Dunlap <wdunlap at tibco.com>
wrote:
One way around this problem is to make a new environment whose parent environment is .GlobalEnv and which contains only what the call to lm() requires, and to compute lm() in that environment.
E.g.,
tfun1 <- function (subset)
{
    junk <- 1:1e+06
    env <- new.env(parent = globalenv())
    env$subset <- subset
    with(env, lm(Sepal.Length ~ Sepal.Width, data = iris, subset = subset))
}
Then we get
> saveSize(tfun1(1:4)) # see below for def. of saveSize
[1] 910
instead of the 2129743 bytes in the save file when using the naive
method.
saveSize <- function (object) {
    tf <- tempfile(fileext = ".RData")
    on.exit(unlink(tf))
    save(object, file = tf)
    file.size(tf)
}
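As a rough check that this approach keeps the model fully usable, the sketch below repeats tfun1 so that it runs on its own, then looks at what the formula's environment contains and calls predict() on new data. The newdata values and the names fit and p are arbitrary, chosen only for illustration.

```r
# Same idea as tfun1 above: fit the model in a small, purpose-built environment.
tfun1 <- function(subset) {
    junk <- 1:1e6                          # stays in the function frame only
    env <- new.env(parent = globalenv())   # iris is found via the search path
    env$subset <- subset
    with(env, lm(Sepal.Length ~ Sepal.Width, data = iris, subset = subset))
}

fit <- tfun1(1:4)

# The formula's environment is the small one, so 'junk' is not carried along.
ls(environment(terms(fit)))

# The terms object is intact, so predict() works as usual.
p <- predict(fit, newdata = data.frame(Sepal.Width = c(3, 3.5)))
p
```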
Bill Dunlap
TIBCO Software
wdunlap tibco.com
On Wed, Jul 27, 2016 at 10:48 AM, Kenny Bell <kmb56 at berkeley.edu> wrote:
In the example below, I fit a model inside a function, i.e. in an
environment other than .GlobalEnv, which also holds a large object
unrelated to the model. Saving the model appears to save that
irrelevant object as well. In my actual use case, I am running and
saving many models in a loop that each use a single large data.frame
(that gets collapsed into a small data.frame for estimation), so
removing it isn't an option.
In the case where the model exists in .GlobalEnv, everything is
peachy. So replicating whatever happens when saving the model that was
generated in .GlobalEnv at the return() stage of the function call
would fix this problem.
I was referred to this list from r-bugs. First time r-devel poster.
Hope this helps,
Kendon
```
tmp_fun <- function(x){
  iris_big <- lapply(1:10000, function(x) iris)
  lm(Sepal.Length ~ Sepal.Width, data = iris)
}
out <- tmp_fun(1)
object.size(out)
# 48008
save(out, file = "tmp.RData", compress = FALSE)
file.size("tmp.RData")
# 57196752 - way too big
# Works fine when in .GlobalEnv
iris_big <- lapply(1:10000, function(x) iris)
out <- lm(Sepal.Length ~ Sepal.Width, data = iris)
object.size(out)
# 48008
save(out, file = "tmp.RData", compress = FALSE)
file.size("tmp.RData")
# 16641 - good size.
```
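The difference between the two cases can be reproduced without lm() at all. The sketch below is an illustration under stated assumptions, not a description of what save() does internally beyond the fact that R serializes the global environment as a reference while other environments are written out with their contents. All names (f_global, f_local, make_local_formula, big) are hypothetical, and rnorm() merely stands in for any large object.

```r
# A formula whose environment is .GlobalEnv: only a reference is serialized.
f_global <- y ~ x
environment(f_global) <- globalenv()

# A formula created inside a function: its environment is the function frame,
# which is serialized together with everything it contains.
make_local_formula <- function() {
  big <- rnorm(1e5)   # unrelated to the formula, but lives in the same frame
  y ~ x
}
f_local <- make_local_formula()

size_global <- length(serialize(f_global, NULL))
size_local  <- length(serialize(f_local, NULL))
c(global = size_global, local = size_local)
```

The second size includes the 1e5 doubles in big, which is the same effect seen with iris_big in the function version of the example above.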
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel