[External] Re: .Internal(quit(...)): system call failed: Cannot allocate memory
Thank you, Luke, for looking into it. Your knowledge of the GC is definitely helpful here. I have put comments inline below. Best, Jan
On Wed, Nov 25, 2020 at 10:38 PM <luke-tierney at uiowa.edu> wrote:
On Tue, 24 Nov 2020, Jan Gorecki wrote:
As for other calls to system(): I avoid calling system(). In the past I had some (to get memory stats from the OS), but they failed with exactly the same issue. So yes, if I added a call to system() before calling quit(), I believe it would fail with the same error. At the same time I think (although I am not sure) that new allocations made in R work fine. So R seems to reserve some memory and can continue to operate, while an external call like system() will fail. Maybe it is like this by design; I don't know.
Thanks for the report on quit(). We're exploring how to make the cleanup on exit more robust to low memory situations like these.
Aside from this problem, which is easy to report thanks to the warning message, I think that gc() is choking at the same time. I have tried multiple times to make a reproducible example for that but couldn't; let me try one more time. It manifests when there are 4e8+ unique characters/factors in an R session. I am able to reproduce it using data.table and dplyr (0.8.4, because 1.0.0+ fails even sooner), but reproducing with base R is not easy because of the size. I described the problem briefly in: https://github.com/h2oai/db-benchmark/issues/110
Because of the design of R's character vectors, with each element allocated separately, R is never going to be great at handling huge numbers of distinct strings. But it can do an adequate job given enough memory to work with. When I run your GitHub issue example on a machine with around 500 GB of RAM it seems to run OK; /usr/bin/time reports:

2706.89user 161.89system 37:10.65elapsed 128%CPU (0avgtext+0avgdata 92180796maxresident)k
0inputs+103450552outputs (0major+38716351minor)pagefaults 0swaps

So the memory footprint is quite large. Using gc.time() it looks like about 1/3 of the time is spent in GC. Not ideal, and maybe it could be improved on a bit, but probably not by much. The GC is basically doing an adequate job, given enough RAM.
Agreed, 1/3 is a lot but still acceptable, so strictly speaking this is not something that requires intervention. PS: I wasn't aware of gc.time(); it may be worth linking it from the See Also section of the gc() manual page.
If you run this example on a system without enough RAM, or with other programs competing for RAM, you are likely to end up fighting with your OS/hardware's virtual memory system. When I try to run it on a 16 GB system it churns for an hour or so before getting killed, and /usr/bin/time reports a huge number of page faults:

312523816inputs+0outputs (24761285major+25762068minor)pagefaults 0swaps

You are probably experiencing something similar.
Yes, this is exactly what I am experiencing. The machine is a bare-metal machine with 128 GB of memory; the csv is 50 GB and the data.frame 74 GB in size. In my case it churns for ~3h before it gets killed by a SIGINT from the parent R process, which uses 3h as a timeout for this script. This is something I would like to see addressed, because GC time is far bigger than the actual computation time. That is not really acceptable; I would prefer to raise an exception instead.
There may be opportunities for more tuning of the GC to better handle running this close to memory limits, but I doubt the payoff would be worth the effort.
If you don't have plans/time to work on that anytime soon, then I can file a Bugzilla report for this problem so it won't get lost in the mailing list.
Best, luke
It would help if gcinfo() could take FALSE/TRUE/2L, where 2L would print even more information about GC, like how much time each gc() pass took and how many objects it had to check at each level. Best regards, Jan On Tue, Nov 24, 2020 at 1:05 PM Tomas Kalibera <tomas.kalibera at gmail.com> wrote:
On 11/24/20 11:27 AM, Jan Gorecki wrote:
Thanks Bill for checking that. It was my impression that the warnings are raised from some internal system calls made when quitting R; at that point I don't have much control over checking their return status. Your suggestion looks good to me. Tomas, do you think this could help? Could this be implemented?
I think this is a good suggestion. Deleting files on Unix was changed
from system("rm") to doing that in C, and deleting the session directory
should follow.
It might also help diagnosing your problem, but I don't think it would
solve it. If the diagnostics in R are correct and the OS was so
hopelessly out of memory that it couldn't run any more external
processes, then really this is not a problem of R, but of having
exhausted the resources. And it would be a coincidence that just this
particular call to "system" at the end of the session did not work.
Anything else could break as well close to the end of the script. This
seems the most likely explanation to me.
Do you get this warning repeatedly, reproducibly at least in slightly
different scripts at the very end, with this warning always from quit()?
So that the "call" part of the warning message has .Internal(quit) like
in the case you posted? Would adding another call to "system" before the
call to "q()" work - with checking the return value? If it is always
only the last call to "system" in "q()", then it is suspicious, perhaps
an indication that some diagnostics in R is not correct. In that case, a
reproducible example would be the key - so either if you could diagnose
on your end what is the problem, or create a reproducible example that
someone else can use to reproduce and debug.
Best
Tomas
On Mon, Nov 23, 2020 at 7:10 PM Bill Dunlap <williamwdunlap at gmail.com> wrote:
The call to system() probably is an internal call used to delete the session's tempdir(). This sort of failure means that a potentially large amount of disk space is not being recovered when R is done. Perhaps R_CleanTempDir() could call R_unlink() instead of having a subprocess call 'rm -rf ...'. Then it could also issue a specific warning if it was impossible to delete all of tempdir(). (That should be very rare.)
q("no")
Breakpoint 1, R_system (command=command at entry=0x7fffffffa1e0 "rm -Rf /tmp/RtmppoKPXb") at sysutils.c:311
311 {
(gdb) where
#0 R_system (command=command at entry=0x7fffffffa1e0 "rm -Rf /tmp/RtmppoKPXb") at sysutils.c:311
#1 0x00005555557c30ec in R_CleanTempDir () at sys-std.c:1178
#2 0x00005555557c31d7 in Rstd_CleanUp (saveact=<optimized out>, status=0, runLast=<optimized out>) at sys-std.c:1243
#3 0x00005555557c593d in R_CleanUp (saveact=saveact at entry=SA_NOSAVE, status=status at entry=0, runLast=<optimized out>) at system.c:87
#4 0x00005555556cc85e in do_quit (call=<optimized out>, op=<optimized out>, args=0x555557813f90, rho=<optimized out>) at main.c:1393
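For illustration, deleting the session directory in-process could look roughly like the sketch below, which walks the tree with POSIX nftw() instead of forking a shell for 'rm -Rf'. This is not R's actual R_unlink() code, just a minimal stand-in showing why the approach sidesteps the fork() failure:

```c
#define _XOPEN_SOURCE 700   /* for nftw() */
#include <assert.h>
#include <ftw.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Callback invoked for every entry; with FTW_DEPTH the contents of a
 * directory are visited before the directory itself, so remove() always
 * sees plain files or already-empty directories. */
static int rm_entry(const char *path, const struct stat *sb,
                    int typeflag, struct FTW *ftwbuf)
{
    (void) sb; (void) typeflag; (void) ftwbuf;
    return remove(path);
}

/* Recursively delete 'dir' without spawning any subprocess, so it cannot
 * fail with ENOMEM from fork() the way system("rm -Rf ...") can. */
int remove_tree(const char *dir)
{
    return nftw(dir, rm_entry, 64, FTW_DEPTH | FTW_PHYS);
}
```

A failure here could then be reported with a specific warning naming the path that could not be removed, as suggested above.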
-Bill
On Mon, Nov 23, 2020 at 3:15 AM Tomas Kalibera <tomas.kalibera at gmail.com> wrote:
On 11/21/20 6:51 PM, Jan Gorecki wrote:
Dear R-developers,
Some of the fatter scripts (50+ GB of memory used by R) that I am
running quit with q("no", status=0) when they finish.
Quite often an extra stderr output is produced at the very end, which
looks like this:
Warning message:
In .Internal(quit(save, status, runLast)) :
system call failed: Cannot allocate memory
Is there any way to avoid this kind of warning? I am using stderr
output to detect failures in scripts, and this warning is a false
positive.
Maybe the quit function could wait a little longer, retrying the
allocation, before it raises this warning?
If you see this warning, some call to system() or system2() or similar, which executes an external program, failed to even run a shell to run that external program, because there was not enough memory. You should be able to find out where it happens by checking the exit status of system(). Tomas
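At the C level this failure mode is distinguishable: system() returns -1 with errno set (e.g. ENOMEM when fork() fails) when no child could be created at all, as opposed to the command running and exiting non-zero. A minimal sketch of that check (a hypothetical helper, not R's actual code):

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>

/* Run a shell command, separating "could not spawn a child at all"
 * (system() returns -1, e.g. fork() failing with ENOMEM) from "the
 * command ran and exited". Returns the command's exit status, or -1
 * if no child could be created or it was killed by a signal. */
int run_checked(const char *cmd)
{
    errno = 0;
    int status = system(cmd);
    if (status == -1) {
        /* The case behind "system call failed: Cannot allocate memory" */
        fprintf(stderr, "system call failed: %s\n", strerror(errno));
        return -1;
    }
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    return -1;
}
```

Calling a wrapper like this before q() would show whether external processes can still be spawned at that point in the script.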
Best regards, Jan Gorecki
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa
Department of Statistics and Actuarial Science
241 Schaeffer Hall
Iowa City, IA 52242
Phone: 319-335-3386
Fax: 319-335-3017
email: luke-tierney at uiowa.edu
WWW: http://www.stat.uiowa.edu