[R-pkg-devel] Unused data is silently kept in the environment of a function

Dear all,

I want to build processing functions from the data in a first step,
and apply these functions to the data in a second step.
In the reprex below, proc_0 increases memory usage, whereas proc_1 is safe.

If this behavior is known, could you tell me a workaround before I try 
to guess the best one?

Best,
Samuel

``` r
# for memory tracking
library(pryr)

# a class
setClass(
  "fb",
  slots = list(d = "numeric", f = "list"),
  prototype = list(d = NULL, f = NULL)
)

# memory increased: dat is kept somewhere and linked to the returned value
proc_0 <- function(x) {
  dat = sample(x@d)
  cofactors = c(mean(dat), median(dat), IQR(dat))
  model = sapply(cofactors, function(cofactor) function(z) z / cofactor)
  x@f = list(model)
  x
}

# init data
mem_used()
#> 47 MB
a = new("fb")
a@d = sample(rnorm(1e7))
a@f = list()
mem_used()
#> 127 MB
# memory increased by 80 MB
# process
b = proc_0(a)
mem_used()
#> 207 MB
# memory increased by 80 MB again
rm(a)
mem_used()
#> 207 MB
# memory did not decrease
b@d = b@d + 1
mem_used()
#> 287 MB
# memory increased
# b@d was really pointing to a@d before the increment
sapply(1:3, function(i) ls(environment(b@f[[1]][[i]])))
#> [1] "cofactor" "cofactor" "cofactor"
sapply(1:3, function(i) get("cofactor", environment(b@f[[1]][[i]])))
#> [1] -0.0003085559  0.0001107148  1.3485980291
# environments look fine
rm(b)
mem_used()
#> 47.5 MB
# memory is released

# memory safe
proc_1 <- function(x) {
  cofactors = c(mean(x@d), median(x@d), IQR(x@d))
  model = sapply(cofactors, function(cofactor) function(z) z / cofactor)
  x@f = list(model)
  x
}

# init data
mem_used()
#> 47.5 MB
a = new("fb")
a@d = sample(rnorm(1e7))
a@f = list()
mem_used()
#> 128 MB
b = proc_1(a)
mem_used()
#> 128 MB
# memory did not increase; b@d points to a@d; the functions weigh a few KB
rm(a)
mem_used()
#> 128 MB
sapply(1:3, function(i) ls(environment(b@f[[1]][[i]])))
#> [1] "cofactor" "cofactor" "cofactor"
sapply(1:3, function(i) get("cofactor", environment(b@f[[1]][[i]])))
#> [1] -0.0003133312 -0.0002510665  1.3491459433

rm(b)
mem_used()
#> 47.5 MB

```

<sup>Created on 2022-07-08 by the [reprex package](https://reprex.tidyverse.org) (v2.0.1)</sup>
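
One likely explanation for the difference, offered as a minimal sketch rather than a definitive diagnosis: each `function(z)` closure keeps its whole chain of enclosing environments alive, not only the environment that `ls(environment(...))` lists. The parent of that environment is the frame of the `proc_*` call, so in `proc_0` the shuffled copy `dat` stays reachable for as long as the closure does, while in `proc_1` the frame holds no large object besides `x`, which shares its vector with the returned object. The names `make_scaler()`, `proc_leaky()` and `proc_lean()` below are hypothetical and not part of the reprex; the sketch uses a plain vector instead of the S4 class to stay small.

```r
# Hypothetical helper (not in the original reprex): a closure factory defined
# outside the processing function, so the closures it returns do not have the
# processing function's frame among their enclosing environments.
make_scaler <- function(cofactor) {
  force(cofactor)  # evaluate the argument now instead of keeping a promise
  function(z) z / cofactor
}

# Same pattern as proc_0: closures are created inside the frame holding `dat`
proc_leaky <- function(d) {
  dat <- sample(d)
  cofactors <- c(mean(dat), median(dat), IQR(dat))
  lapply(cofactors, function(cofactor) function(z) z / cofactor)
}

# Possible workaround: delegate closure creation to the outside factory
proc_lean <- function(d) {
  dat <- sample(d)
  cofactors <- c(mean(dat), median(dat), IQR(dat))
  lapply(cofactors, make_scaler)
}

set.seed(1)
d <- rnorm(1e5)
leaky <- proc_leaky(d)
lean <- proc_lean(d)

# The closure's own environment holds only `cofactor` ...
ls(environment(leaky[[1]]))
#> [1] "cofactor"

# ... but its parent is the frame of proc_leaky(), which still holds `dat`
ls(parent.env(environment(leaky[[1]])))
#> [1] "cofactors" "d"         "dat"

# With the factory, the parent is the environment where make_scaler() was
# defined, so the frame of proc_lean() can be garbage collected
identical(parent.env(environment(lean[[1]])), environment(make_scaler))
#> [1] TRUE
```

Calling `rm(dat)` just before returning from `proc_0` should have a similar effect on memory, although the rest of the frame would still be kept alive by the closures.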

<details style="margin-bottom:10px;">
<summary>
Session info
</summary>

``` r
sessionInfo()
#> R version 4.2.1 (2022-06-23 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19044)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=French_France.utf8 LC_CTYPE=French_France.utf8
#> [3] LC_MONETARY=French_France.utf8 LC_NUMERIC=C
#> [5] LC_TIME=French_France.utf8
#>
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base
#>
#> other attached packages:
#> [1] pryr_0.1.5
#>
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.8.3     codetools_0.2-18 digest_0.6.29    withr_2.5.0
#>  [5] magrittr_2.0.3   reprex_2.0.1     evaluate_0.15    highr_0.9
#>  [9] stringi_1.7.6    rlang_1.0.3      cli_3.3.0        rstudioapi_0.13
#> [13] fs_1.5.2         lobstr_1.1.2     rmarkdown_2.14   tools_4.2.1
#> [17] stringr_1.4.0    glue_1.6.2       xfun_0.31        yaml_2.3.5
#> [21] fastmap_1.1.0    compiler_4.2.1   htmltools_0.5.2  knitr_1.39
```

</details>