Seeing some memory leak with foreach...
On Tue, Feb 26, 2013 at 6:49 AM, Jonathan Greenberg <jgrn at illinois.edu> wrote:
r-sig-geo'ers: I always hate doing this, but the test function/dataset is going to be hard to pass along to the list. Basically: I have a foreach call that has no superassignments or strange environment manipulations, but the nodes show a slow but steady memory creep over time. I am using a parallel backend for foreach via doParallel. Has anyone else seen this behavior (unexplained memory creep)? Is there a good way to "flush" a node?

I'm trying to embed gc() at the top of my foreach function, but this process took about 24 hours to reach the memory-overuse stage (many iterations would have passed by then, i.e. the function would have been called more than once on a single node), so I'm not sure whether this will work, and I figured I'd ask the group about it. I've seen other people post about this on various boards with no clear response or solution (gc() apparently didn't help).

Some other notes: there should be no resultant output data, because the output is written from within the foreach function (i.e. the function that foreach executes returns NULL). I'll see if I can work up a faster-executing example later, but wanted to ask whether there are general pointers for dealing with memory leaks on a parallel system.
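A minimal sketch of the pattern Jonathan describes (all names here are hypothetical, since the real function/dataset was not posted): each worker writes its own output file, returns NULL so no data flows back to the master, and calls gc() at the top of every iteration to encourage the node to release memory.

```r
library(foreach)
library(doParallel)

cl <- makeCluster(2)
registerDoParallel(cl)

## Hypothetical stand-in for the real per-tile computation.
process_tile <- function(i, outdir = tempdir()) {
  gc()                                  # ask this worker to collect before each task
  result <- sqrt(seq_len(1e5) + i)      # placeholder work
  saveRDS(result, file.path(outdir, paste0("tile_", i, ".rds")))
  NULL                                  # nothing returned to the master
}

out <- foreach(i = 1:8) %dopar% process_tile(i)
stopCluster(cl)
```

Note that because the loop body returns NULL, `out` is just a list of NULLs; all real output lives in the files written by the workers.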
Hi Jonathan, have you tried replacing the foreach(...) loop with a simple for loop, to verify that the problem really is in the parallel execution and not simply in the R code? I second Simon's suggestion to pay careful attention to possible side effects and to objects not going out of scope when you think they should (for example, if something somewhere references the environment of a function that has already completed, that environment, and every object within it, stays reachable and cannot be garbage-collected). Peter
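Peter's point about environments, sketched (function names hypothetical): a closure keeps its defining environment alive, so a large object created in that environment is never collected while the closure exists, even though the enclosing function returned long ago.

```r
## The closure returned here retains its defining environment, so `big`
## (~80 MB) stays reachable via environment(f) for as long as `f` exists.
make_counter <- function() {
  big <- numeric(1e7)          # large object only needed during setup
  n <- 0
  function() {
    n <<- n + 1
    n
  }
}
f <- make_counter()            # `big` is still alive here

## One fix: drop the reference before returning the closure.
make_counter2 <- function() {
  big <- numeric(1e7)
  n <- 0
  rm(big)                      # release the large object once setup is done
  function() {
    n <<- n + 1
    n
  }
}
g <- make_counter2()           # environment(g) no longer holds `big`
```

The same mechanism applies inside a foreach body: any object captured by an environment that outlives an iteration cannot be freed by gc(), which can look like a slow per-node memory creep.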