On Tue, Feb 26, 2013 at 6:49 AM, Jonathan Greenberg <jgrn at illinois.edu> wrote:
r-sig-geo'ers:
I always hate doing this, but the test function/dataset would be hard to
pass along to the list. Basically: I have a foreach call with no
superassignments or strange environment manipulations, but the nodes
show a slow but steady memory creep over time. I was using a parallel
backend for foreach via doParallel. Has anyone else seen this behavior
(unexplained memory creep)? Is there a good way to "flush" a node? I'm
trying to embed gc() at the top of my foreach function, but since the
process took about 24 hours to reach the memory-overuse stage (many
iterations had passed by then, i.e. the function had been called more
than once on a single node), I'm not sure yet whether this will work,
so I figured I'd ask the group. I've seen other people post about this
on various boards with no clear response/solution (gc() apparently
didn't help).
One other note: there should be no accumulated output data, because the
output is written to disk from within the foreach function (i.e. the
value returned by the function that foreach executes is NULL).
I'll see if I can work up a faster-executing example later, but wanted
to ask whether there are any general pointers for dealing with memory
leaks in a parallel system.
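For reference, a minimal self-contained sketch of the setup described above (the per-task workload here is a stand-in, not the original code, and foreach/doParallel must be installed):

```r
library(foreach)
library(doParallel)

cl <- makeCluster(2)
registerDoParallel(cl)

## Each task writes its own output to disk and returns NULL, so foreach
## itself accumulates no results on the master process.
res <- foreach(i = 1:4) %dopar% {
  gc()                                  # attempt to reclaim worker memory up front
  x <- rnorm(1e5)                       # stand-in for the real per-task workload
  saveRDS(x, file.path(tempdir(), sprintf("out_%02d.rds", i)))
  NULL
}

stopCluster(cl)
```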
Hi Jonathan,
have you tried replacing the foreach(...) with a simple for loop, to
verify that the problem really is in the parallel execution and not
simply in the R code itself?
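(With foreach, swapping %dopar% for %do% is the quickest way to run the same loop sequentially.) A base-R sketch of that diagnostic, with a stand-in workload: if memory still creeps here, the leak is in the R code rather than in the parallel backend.

```r
mem_used <- numeric(4)
for (i in 1:4) {
  x <- rnorm(1e5)                       # stand-in for the real per-iteration workload
  saveRDS(x, file.path(tempdir(), sprintf("out_%02d.rds", i)))
  rm(x)
  mem_used[i] <- sum(gc()[, "used"])    # record cells in use after each iteration
}
print(mem_used)                         # steady growth here would indicate a leak
```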
I second Simon's suggestion to pay careful attention to possible
side-effects and to objects not going out of scope when you think they
should (for example, if something somewhere still references the
environment of a function that has already completed, that environment
and all objects within it remain reachable and cannot be
garbage-collected).
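To illustrate the scoping point in base R: a closure returned from a function keeps that function's entire environment alive, including large objects the closure itself never uses, so gc() cannot free them.

```r
make_fn <- function() {
  big <- numeric(1e6)        # ~8 MB vector local to this call
  function(x) x + 1          # never touches `big`, but captures its environment
}

f <- make_fn()
## `big` is still reachable through f's environment, so it survives gc():
exists("big", envir = environment(f))
```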
Peter