Thanks for your reply, Duncan - you hit the nail on the head (as usual, the problem turned out to sit between the keyboard and the chair :)). My function does return regression models that contain the input formulae together with the associated (big) environment.

Peter
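PS for the archives: the fix amounts to dropping the captured environment before returning the fits. A rough sketch of the idea (all names here are illustrative, assuming lm()-style models; model = FALSE drops the stored model frame, which holds another reference to the environment - keep it and fix up its terms attribute instead if you need it downstream):

> myFnc <- function() {
+   big <- rnorm(1e7)                   # large intermediate object
+   d <- data.frame(x = 1:100, y = rnorm(100))
+   fit <- lm(y ~ x, data = d, model = FALSE)
+   # y ~ x captured myFnc's evaluation frame, which also holds 'big';
+   # repoint the environment so the frame can be garbage collected
+   environment(fit$terms) <- globalenv()
+   fit
+ }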
On Thu, Jan 3, 2013 at 4:41 PM, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
On 13-01-03 7:01 PM, Peter Langfelder wrote:
Hello all,

I am running into a problem with garbage collection not being able to free up all memory. Unfortunately I am unable to provide a minimal self-contained example, although I can provide a self-contained example if anyone feels like wading through some 600 lines of code. I would love to isolate the relevant parts from the code, but whenever I try to run a simpler example, the problem does not appear.

I run an algorithm that repeats the same calculation (on sampled, i.e. different, data) in each iteration. Each iteration uses relatively large intermediate objects and calculations but returns a smaller result; these results are then collated and returned from the main function (call it myFnc).

The problem is that memory used by the intermediate calculations (it is difficult to say whether it's objects or memory needed for apply calls) does not seem to be freed up even after doing explicit garbage collection using gc() within the loop. Thus, a call of something like

result = myFnc(arguments)

results in some memory that does not seem allocated to any visible objects and yet is not freed up using gc(). After executing an actual call to the offending function, gc() tells me that Vcells use 538.6 Mb, but the sum of object.size() of all objects listed by ls(all.names = TRUE) is only 183.3 Mb.

The thing is that if I remove 'result' using rm(result) and do gc() again, the memory used decreases by a lot: gc() now reports 110.3 Mb used in Vcells; this roughly corresponds to the sum of the sizes of all objects returned by ls() (after removing 'result'), which is now 108.7 Mb. So used memory went down by something like 428 Mb, but the object.size of 'result' is only about 75 Mb.

Thus, it seems that the memory used by internal operations in myFnc that should be freed up upon the completion of the function call cannot be released by garbage collection until the result of the function call is also removed.

Like I said, I tried to replicate this behaviour in simple examples but could not. My question is: is this behaviour to be expected in complicated code, or is it a bug that should be reported? Is there any way around it?

Thanks in advance for any insights or pointers.
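For concreteness, the comparison I am making looks roughly like the following sketch (myFnc and arguments stand for the actual function and inputs; the numbers in the comments are the ones quoted above):

> result = myFnc(arguments)
> gc()                                  # Vcells used: ~538.6 Mb
> sum(sapply(ls(all.names = TRUE),
+            function(nm) object.size(get(nm))))   # only ~183.3 Mb
> rm(result)
> gc()                                  # Vcells used: ~110.3 Mb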
I doubt if it is a bug. Remember the warning from ?object.size: "Exactly which parts of the memory allocation should be attributed to which object is not clear-cut. This function merely provides a rough indication: it should be reasonably accurate for atomic vectors, but does not detect if elements of a list are shared, for example. (Sharing amongst elements of a character vector is taken into account, but not that between character vectors in a single object.)
If I understand correctly, sharing would inflate the sum of object.size()'s relative to the values returned by gc(), correct? The opposite is happening in my case.
The calculation is of the size of the object, and excludes the space needed to store its name in the symbol table. Associated space (e.g. the environment of a function and what the pointer in a EXTPTRSXP points to) is not included in the calculation." For a simple example:
> x <- 1:1000000
> object.size(x)
4000024 bytes
> e <- new.env()
> object.size(e)
28 bytes
> e$x <- x
> object.size(e)
28 bytes

At the end, e is an environment holding an object of 4 million bytes, but its size is just 28 bytes. You'll get environments whenever you return functions from other functions (e.g. what approxfun() does), or when you create formulas, e.g.

> f <- function() {
+   x <- 1:1000000
+   y <- rnorm(1000000)
+   y ~ x
+ }
> fla <- f()
> object.size(fla)
372 bytes

Now fla is the formula, but the data vectors x and y are part of its environment.
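You can see the retained data directly (continuing the sketch above):

> ls(environment(fla))
[1] "x" "y"
> object.size(environment(fla)$x)
4000024 bytes

Anything that keeps fla reachable, such as a returned model object that stores it, keeps x and y alive as well. If the environment is not actually needed, one way out is to replace it, e.g. environment(fla) <- globalenv().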