Skip to content

memory consumption of nested (un)serialize of sys.frames()

9 messages · iuke-tier@ey m@iii@g oii uiow@@edu, Andreas Kersting, Dirk Eddelbuettel

#
Hi,

please consider the following minimal reproducible example:

Create a new R package which just contains the following two (exported) objects:


crash_dumps <- new.env()

f <- function() {
  x <- runif(1e5)
  dump <- lapply(1:2, function(i) unserialize(serialize(sys.frames(), NULL)))
  assign("last.dump", dump, crash_dumps)
}


WARNING: the following will probably eat all your RAM!

Attach this package and run:

for (i in 1:100) {
  print(i)
  f()
}

You will notice that with each iteration the execution of f() slows down significantly while the memory consumption of the R process (v4.0.5 on Linux) quickly explodes.

I am having a hard time to understand what exactly is happening here. Something w.r.t. too deeply nested environments? Could someone please enlighten me? Thanks!

Regards,
Andreas


Background:
In an R package I store crash dumps on error in a parallel processes in a way similar to what I have just shown (hence the (un)serialize(), which happens as part of returning the objects to the parent process). The first 2 or 3 times I do so in a session everything is fine, but afterwards it takes very long and I soon run out of memory.

Some more observations:
- If I omit `x <- runif(1e5)`, the issues seem to be less pronounced.
- If I assign to .GlobalEnv instead of crash_dumps, there seems to be no issue - probably because .GlobalEnv is not included in sys.frames(), while crash_dumps is indirectly via the namespace of the package being the parent.env of some of the sys.frames()!?
- If I omit the lapply(...), i.e. use `dump <- unserialize(serialize(sys.frames(), NULL))` directly, there seems to be no issue. The immediate consequence is that there are less sys.frames and - in particular - there is no frame which has the base namespace as its parent.env.
- If I make crash_dumps a list and use assignInMyNamespace() to store the dump in it, there also seems to be no issue. I will probably use this as a workaround:

crash_dumps <- list()

f <- function() {
  x <- runif(1e5)
  dump <- lapply(1:2, function(i) unserialize(serialize(sys.frames(), NULL)))
  crash_dumps[["last.dump"]] <- dump
  assignInMyNamespace("crash_dumps", crash_dumps)
}
#
On Wed, 7 Apr 2021, Andreas Kersting wrote:

            
I would not expect this behavior and I don't see it when I make such a
package (in R 4.0.3 or R-devel on Ubuntu).  You will need to provide a
more complete reproducible example if you want help with what you are
trying to do; also sessionInfo() would help.

Best,

luke

  
    
#
Hi Luke,

Please see https://github.com/akersting/dumpTest for the package.

Here a session showing my issue:
R version 4.0.5 (2021-03-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 10 (buster)

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.8.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.8.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dumpTest_0.1.0

loaded via a namespace (and not attached):
[1] compiler_4.0.5
+   print(i)
+   print(system.time(f()))
+ }
[1] 1
   user  system elapsed 
  0.028   0.004   0.034 
[1] 2
   user  system elapsed 
  0.067   0.008   0.075 
[1] 3
   user  system elapsed 
  0.176   0.000   0.176 
[1] 4
   user  system elapsed 
  0.335   0.012   0.349 
[1] 5
   user  system elapsed 
  0.745   0.023   0.770 
[1] 6
   user  system elapsed 
  1.495   0.060   1.572 
[1] 7
   user  system elapsed 
  2.902   0.136   3.040 
[1] 8
   user  system elapsed 
  5.753   0.272   6.034 
[1] 9
   user  system elapsed 
 11.807   0.708  12.597 
[1] 10
^C
Timing stopped at: 6.638 0.549 7.214

I had to interrupt in iteration 10 because I was running low on RAM.

Regards,
Andreas

2021-04-07 15:28 GMT+02:00 luke-tierney at uiowa.edu:
#
On 7 April 2021 at 16:06, Andreas Kersting wrote:
| Hi Luke,
| 
| Please see https://github.com/akersting/dumpTest for the package.
| 
| Here a session showing my issue:
| 
| > library(dumpTest)
| > sessionInfo()
| R version 4.0.5 (2021-03-31)
| Platform: x86_64-pc-linux-gnu (64-bit)
| Running under: Debian GNU/Linux 10 (buster)
| 
| Matrix products: default
| BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.8.0
| LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.8.0
| 
| locale:
|  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
|  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
|  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
|  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
|  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
| [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
| 
| attached base packages:
| [1] stats     graphics  grDevices utils     datasets  methods   base     
| 
| other attached packages:
| [1] dumpTest_0.1.0
| 
| loaded via a namespace (and not attached):
| [1] compiler_4.0.5
| > for (i in 1:100) {
| +   print(i)
| +   print(system.time(f()))
| + }
| [1] 1
|    user  system elapsed 
|   0.028   0.004   0.034 
| [1] 2
|    user  system elapsed 
|   0.067   0.008   0.075 
| [1] 3
|    user  system elapsed 
|   0.176   0.000   0.176 
| [1] 4
|    user  system elapsed 
|   0.335   0.012   0.349 
| [1] 5
|    user  system elapsed 
|   0.745   0.023   0.770 
| [1] 6
|    user  system elapsed 
|   1.495   0.060   1.572 
| [1] 7
|    user  system elapsed 
|   2.902   0.136   3.040 
| [1] 8
|    user  system elapsed 
|   5.753   0.272   6.034 
| [1] 9
|    user  system elapsed 
|  11.807   0.708  12.597 
| [1] 10
| ^C
| Timing stopped at: 6.638 0.549 7.214
| 
| I had to interrupt in iteration 10 because I was running low on RAM.

No issue here.  Ubuntu 20.10, R 4.0.5 'from CRAN' i.e. Michael's PPA build
off my Debian package, hence instrumentation as in the Debian package.

edd at rob:~$ installGithub.r akersting/dumpTest
Using github PAT from envvar GITHUB_PAT
Downloading GitHub repo akersting/dumpTest at HEAD
?  checking for file ?/tmp/remotes3f9af733166ccd/akersting-dumpTest-3bed8e2/DESCRIPTION? ...
?  preparing ?dumpTest?:
?  checking DESCRIPTION meta-information ...
?  checking for LF line-endings in source and make files and shell scripts
?  checking for empty or unneeded directories
?  building ?dumpTest_0.1.0.tar.gz?
   
Installing package into ?/usr/local/lib/R/site-library?
(as ?lib? is unspecified)
* installing *source* package ?dumpTest? ...
** using staged installation
** R
** byte-compile and prepare package for lazy loading
** help
No man pages found in package  ?dumpTest? 
*** installing help indices
** building package indices
** testing if installed package can be loaded from temporary location
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (dumpTest)
edd at rob:~$ Rscript -e 'system.time({for (i in 1:100) dumpTest::f()})'
   user  system elapsed 
  0.481   0.019   0.500 
edd at rob:~$

(I also ran the variant you showed with the dual print statements, it just
consumes more screen real estate and ends on

[...]
[1] 97                                      
   user  system elapsed     
  0.004   0.000   0.005 
[1] 98                                                                                                   
   user  system elapsed 
  0.004   0.000   0.005                   
[1] 99                     
   user  system elapsed    
  0.004   0.000   0.004                                                                                  
[1] 100                                                                                                  
   user  system elapsed                                                                                  
  0.005   0.000   0.005 
edd at rob:~$ )

Dirk
#
No issues here with that either. Looks like something is different on
your end.

Best,

luke
On Wed, 7 Apr 2021, Andreas Kersting wrote:

            

  
    
#
Hi Dirk, hi Luke,

Thanks for checking!

I could narrow it down further. I have the issue only if I install --with-keep.source, i.e.

R CMD INSTALL --with-keep.source dumpTest

Since this is the default in RStudio when clicking "Install and Restart", I was always having the issue - also from base R. If I install using e.g. devtools::install_github() directly it is also fine for me.

Could you please confirm? Thanks!

Regards,
Andreas

2021-04-07 16:20 GMT+02:00 "Dirk Eddelbuettel" <edd at debian.org>:
#
Hi,

For (hopefully) full reproducibility:

docker run rocker/tidyverse:4.0.5 Rscript -e 'devtools::install_github("akersting/dumpTest", INSTALL_opts = "--with-keep.source"); library(dumpTest); for (i in 1:100) {print(i); print(system.time(f()))}'

Regards,
Andreas

2021-04-07 17:09 GMT+02:00 "Andreas Kersting" <r-devel at akersting.de>:
#
I see that now also. Not sure yet what is going on.

One work-around that may work for you is to create a fresh crash dump
in a .onLoad function; somehting like

crash_dumps <- NULL
.onLoad <- function(...) crash_dumps <<- new.env()

Best,

luke
On Wed, 7 Apr 2021, Andreas Kersting wrote:

            

  
    
#
Looks like this is an unavoidable interaction between the way source
references and lazy loading are implemented. The link back to the
crash_dumps environment comes though source references on an
unevaluated argument promise. Creating a fresh environment is
.onLoad() avoids this and is probably your best bet.

Having an option to serialize without source references might be nice
but would probably not be high enough on anyone's priority list to get
done anytime soon.

Best,

luke
On Thu, 8 Apr 2021, luke-tierney at uiowa.edu wrote: