[R-pkg-devel] Run garbage collector when too many open files

4 messages · Jan van der Laan, luke-tierney at uiowa.edu

#
Dear Uwe,

(When replying to your message, I sent the reply to r-devel and not 
r-package-devel, as Martin Maechler had suggested that this thread would 
be a better fit for r-devel.)

Thanks. In my earlier example I called rm() explicitly, but in general 
users wouldn't do that.

One of the reasons for the large number of file handles is that 
sometimes unnamed temporary objects are created. For example:

 > library(ldat)
 > library(lvec)
 >
 > a <- lvec(10, "integer")
OPENFILE '/tmp/RtmpVqkDsw/file32145169fb06/lvec3214753f2af0'
 > b <- as_rvec(a[1:3])
OPENFILE '/tmp/RtmpVqkDsw/file32145169fb06/lvec32146a50f383'
OPENFILE '/tmp/RtmpVqkDsw/file32145169fb06/lvec3214484b652c'
 > print(b)
[1] 0 0 0
 >
 >
 > gc()
CLOSEFILE '/tmp/RtmpVqkDsw/file32145169fb06/lvec3214484b652c'
CLOSEFILE '/tmp/RtmpVqkDsw/file32145169fb06/lvec32146a50f383'
           used (Mb) gc trigger (Mb) max used (Mb)
Ncells  796936 42.6    1442291 77.1  1168576 62.5
Vcells 1519523 11.6    4356532 33.3  4740854 36.2


For debugging, I log when files are opened and closed. The call a[1:3] 
(which creates a slice of a) creates two temporary objects [1]. These 
are only deleted, and their files closed, when I explicitly call gc() or 
at some other, unpredictable moment.

I hope this illustrates the problem better.


Best,
Jan


[1] One improvement would be to create fewer temporary files; these 
often contain very little information, which would be better kept in 
memory. But that is only a partial solution.
On 07-08-18 15:24, Uwe Ligges wrote:
#
In R 3.5 and later you should not need to gc() -- that should happen
automatically within the connections code.

Nevertheless, I would recommend redesigning your approach to avoid
hanging onto open file connections as these are a scarce resource.
You can keep around your temporary files without having them open and
only open/close them on access, with the close run in an on.exit or a
tryCatch/finally clause.

Best,

luke
On Tue, 7 Aug 2018, Jan van der Laan wrote:

#
Dear Luke,


Thanks. See below.
On 07-08-18 17:07, luke-tierney at uiowa.edu wrote:
Could you elaborate on what has changed in R 3.5? As far as I can tell, 
my problem also occurs in R 3.5 (my computer is still on 3.4.4, but I 
assume the Solaris CRAN machine isn't). And what do you mean by 'the 
connections code'? Is there something I can/should do to make the 
garbage collector a bit more aggressive in cleaning up my mess?

I am afraid that this would carry a large performance penalty. The 
files in question are memory-mapped files, which the code will in most 
cases be reading from and writing to continuously. Of course, there 
will probably be objects that go unused for long periods and could be 
closed temporarily, but it will be difficult for the package to detect 
which objects those are. I would have to write my own 'garbage 
collector'.


Best,
Uwe
#
On Tue, 7 Aug 2018, Jan van der Laan wrote:

If you are not opening files through R connections then this is not relevant.

If you are opening files on your own via C or C++ level calls, then it
is a good idea to run gc if there is a failure -- that is what the
connections code does. I would put this logic at a low level, inside
lvec and such, before signaling an error. But this should be a
fall-back: you might be starving other libraries of file handles.
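
Running the garbage collector only when an open fails can be sketched 
as a small C++ helper. This is an illustration under assumptions: 
`open_with_gc_fallback` is a hypothetical name; the open and collect 
steps are passed in as callbacks so the sketch compiles without R 
headers; and the EMFILE/ENFILE check relies on POSIX fopen() setting 
errno. In an actual package the `collect` callback would presumably 
call R's C-level R_gc(), so that unreachable objects can be collected 
and (assuming they close their files in finalizers) release their 
handles.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical helper: open `path`; if the process (EMFILE) or the system
// (ENFILE) has run out of file descriptors, run `collect` once as a
// fall-back and retry before signaling an error.
std::FILE* open_with_gc_fallback(
    const std::string& path, const char* mode,
    const std::function<std::FILE*(const char*, const char*)>& opener,
    const std::function<void()>& collect) {
    assert(opener && collect);
    std::FILE* f = opener(path.c_str(), mode);
    if (f == nullptr && (errno == EMFILE || errno == ENFILE)) {
        collect();                       // e.g. trigger R's garbage collector
        f = opener(path.c_str(), mode);  // retry exactly once
    }
    if (f == nullptr)
        throw std::runtime_error("cannot open " + path);
    return f;
}
```

Other failures (a missing file, a permission problem) are reported 
immediately, so the collector only runs when descriptors are actually 
exhausted. In normal use the opener can simply be a lambda wrapping 
std::fopen.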

Have you done enough profiling to be sure this is true, in particular
for realistic usage, not small toy examples? Opening and closing files
on access would be the cleanest design. You could also maintain a small
cache of open files, but that is more work to implement.
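
A small cache of open files could look roughly like the following C++ 
sketch: a bounded least-recently-used (LRU) cache that keeps at most 
`capacity` handles open and closes the least recently used one when a 
new file is needed. `OpenFileCache` and its methods are hypothetical 
names. The sketch caches std::FILE* handles opened in "a+b" mode (an 
arbitrary choice so missing files are created); for memory-mapped files 
one would presumably cache the mappings instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <list>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical bounded cache of open files with LRU eviction.
class OpenFileCache {
public:
    explicit OpenFileCache(std::size_t capacity) : capacity_(capacity) {
        assert(capacity_ > 0);
    }
    ~OpenFileCache() {
        for (auto& entry : lru_) std::fclose(entry.second);
    }
    OpenFileCache(const OpenFileCache&) = delete;
    OpenFileCache& operator=(const OpenFileCache&) = delete;

    // Return an open handle for `path`. On a miss, close the least recently
    // used handle first if the cache is full, then open the file.
    std::FILE* get(const std::string& path) {
        auto it = index_.find(path);
        if (it != index_.end()) {
            lru_.splice(lru_.begin(), lru_, it->second);  // mark most recent
            return it->second->second;
        }
        if (lru_.size() == capacity_) {
            std::fclose(lru_.back().second);  // evict least recently used
            index_.erase(lru_.back().first);
            lru_.pop_back();
        }
        std::FILE* f = std::fopen(path.c_str(), "a+b");
        if (f == nullptr) throw std::runtime_error("cannot open " + path);
        lru_.emplace_front(path, f);
        index_[path] = lru_.begin();
        return f;
    }

    std::size_t open_count() const { return lru_.size(); }
    bool is_open(const std::string& path) const { return index_.count(path) > 0; }

private:
    using Entry = std::pair<std::string, std::FILE*>;
    std::size_t capacity_;
    std::list<Entry> lru_;  // front = most recently used
    std::unordered_map<std::string, std::list<Entry>::iterator> index_;
};
```

Files that are accessed continuously stay open and avoid the repeated 
open/close cost, while the number of descriptors held by the package 
never exceeds the capacity; the right capacity would have to come from 
profiling realistic workloads.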

Best,

luke