[External] R crashes when using huge data sets with character string variables
Dear all,

Thanks a lot for your very helpful explanations and suggestions. I have increased the size of my computer's "swapfile" and this solved my problem, i.e., R no longer crashes when I work with character string variables in my large data set (probably until I work with an even larger data set).

Best wishes,
Arne

On Sun, 13 Dec 2020 at 11:17, Iñaki Ucar <iucar at fedoraproject.org> wrote:
On Sun, 13 Dec 2020 at 04:27, <luke-tierney at uiowa.edu> wrote:
If R is receiving a kill signal there is nothing it can do about it. I am guessing you are running into a memory over-commit issue in your OS.

https://en.wikipedia.org/wiki/Memory_overcommitment
https://engineering.pivotal.io/post/virtual_memory_settings_in_linux_-_the_problem_with_overcommit/
Correct. And in particular, this is most probably the earlyoom [1] service in action, which, I believe, is installed and enabled by default in Ubuntu 20.04. It is a simple daemon that monitors memory, and when some conditions are reached (e.g., the system is about to start swapping), it looks for offending processes and kills them.

[1] https://github.com/rfjakob/earlyoom

Iñaki
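To tell which of the two explanations above applies on a given machine, something along the following lines may help. This is only a sketch: the helper names oom_lines and earlyoom_state are made up for illustration, the grep pattern is a guess at typical kernel wording (which varies between kernel versions), and a systemd-based system is assumed for the service check.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of R or earlyoom.

# Filter log lines read from stdin for typical kernel OOM-killer messages.
# The pattern is an assumption; exact wording differs between kernels.
oom_lines() {
  grep -E 'Out of memory|oom-kill|Killed process' || true
}

# Print the state string that systemctl reports for the earlyoom unit
# (for example "active" or "inactive"), or "unknown" if systemctl is missing.
earlyoom_state() {
  if command -v systemctl >/dev/null 2>&1; then
    systemctl is-active earlyoom 2>/dev/null || true
  else
    echo "unknown"
  fi
}

# Possible usage right after R has been killed:
#   journalctl -k | oom_lines      # kills logged by the kernel OOM killer
#   journalctl -u earlyoom         # messages logged by earlyoom, if installed
#   earlyoom_state
```

If neither log shows anything, the kill came from somewhere else and the over-commit explanation may not be the whole story.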
If you have to run this close to your physical memory limits you might try using your shell's facility (ulimit for bash, limit for some others) to limit process memory/virtual memory use to your available physical memory. You can also try setting the R_MAX_VSIZE environment variable mentioned in ?Memory; that only affects the R heap, not malloc() done elsewhere.

Best,

luke

On Sat, 12 Dec 2020, Arne Henningsen wrote:
When working with a huge data set with character string variables, I found that various commands cause R to crash. When I run R in a Linux/bash console, R terminates with the message "Killed". When I use RStudio, I get the message "R Session Aborted. R encountered a fatal error. The session was terminated. Start New Session". If an object in the R workspace needs too much memory, I would expect that R would not crash but issue an error message "Error: cannot allocate vector of size ...".

A minimal reproducible example (at least on my computer) is:

nObs <- 1e9
date <- paste( round( runif( nObs, 1981, 2015 ) ),
               round( runif( nObs, 1, 12 ) ),
               round( runif( nObs, 1, 31 ) ), sep = "-" )

Is this a bug or a feature of R?

Some information about my R version, OS, etc:

R> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.1 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_DK.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_DK.UTF-8        LC_COLLATE=en_DK.UTF-8
 [5] LC_MONETARY=en_DK.UTF-8    LC_MESSAGES=en_DK.UTF-8
 [7] LC_PAPER=en_DK.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_DK.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_4.0.3

/Arne
--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa Phone: 319-335-3386
Department of Statistics and Fax: 319-335-3017
Actuarial Science
241 Schaeffer Hall email: luke-tierney at uiowa.edu
Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu
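The ulimit suggestion in Luke Tierney's message above might be sketched as follows for bash on Linux. The function names run_capped and phys_kib are hypothetical, reading MemTotal from /proc/meminfo is just one way to obtain the physical memory size, and the R_MAX_VSIZE value in the comment is an arbitrary placeholder (see ?Memory for the accepted units).

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating the ulimit approach; bash on Linux assumed.

# Run a command with its virtual memory capped at the given number of KiB.
# The subshell keeps the limit from changing the calling shell's own limits.
run_capped() {
  local limit_kib="$1"
  shift
  ( ulimit -v "$limit_kib" && "$@" )
}

# Physical memory in KiB as reported by the Linux kernel.
phys_kib() {
  awk '/^MemTotal:/ { print $2 }' /proc/meminfo
}

# Possible usage (assumes R is installed and on PATH):
#   run_capped "$(phys_kib)" R --no-save
#
# The environment-variable alternative, which only limits the R heap:
#   R_MAX_VSIZE=16Gb R --no-save
```

With such a cap in place, an oversized allocation should fail inside R, which can then report "cannot allocate vector of size ..." rather than being killed from outside.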
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
-- Iñaki Úcar
Arne Henningsen http://www.arne-henningsen.name