
mclapply: rm intermediate objects and returning memory

Ramon,
On Oct 15, 2012, at 10:47 AM, Ramon Diaz-Uriarte wrote:

            
That should not be the case: x goes out of scope and is not stored by mclapply (NB: each job is a simple lapply()). However, you may be running into something else. The jobs are all independent processes, so they are not aware of each other's memory usage, and one job cannot trigger garbage collection in another. What probably happens is that the faster jobs are not short of memory, so they never trigger garbage collection. Meanwhile another job may be strapped for memory; it runs its own garbage collection, but that does not free enough, and since it cannot trigger gc in the other jobs, it is stuck. By forcing gc() you make sure that all jobs run the garbage collector, so they are less likely to push each other out of memory.
The intermediate result itself is harmless: it will be collected eventually, but not necessarily right after the function returns. Depending on your machine's memory usage, though, that delay may be enough to trigger the situation described above. I cannot replicate your problem on my machine, so try just adding gc() alone. It should make sure that at least all temporary objects from previous iterations are gone, and I would hope that it solves your problem. If you are on the edge with your memory usage, you may want to run something like {x <- local({ ... }); gc(); x}, but in your case it should not be necessary since the temporary objects are small per iteration.
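
To make the {x <- local({ ... }); gc(); x} idea concrete, here is a minimal sketch of what the worker function could look like. The function name process_one, the rnorm() stand-in for the temporary object, and the core count are all placeholders I made up for illustration, not taken from your code:

```r
library(parallel)

# Hypothetical worker: builds a temporary object, keeps only a small result.
process_one <- function(i) {
  x <- local({
    tmp <- rnorm(1e5)  # stand-in for the intermediate object
    mean(tmp)          # only this value leaves the local() scope
  })
  gc()                 # tmp is unreachable here, so this child can release it
  x
}

# Forking is not available on Windows, so fall back to a single core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
res <- mclapply(1:8, process_one, mc.cores = n_cores)
```

The gc() call runs inside each child process, which is the point: every job collects on its own schedule instead of waiting until it is itself under memory pressure.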


Re. your suggestion, I agree that mclapply offers two extremes: a work size of either 1 or n/cores, with nothing in between. For some applications it may be beneficial to use other sizes - if someone would be willing to give it a shot, I could review it.
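
In the meantime, an intermediate work size can be approximated in user code by chunking the input yourself. This is only a rough sketch, not a proposed patch: chunked_mclapply and its chunk_size argument are hypothetical names, and it relies on mc.preschedule = FALSE forking one job per element of the list it is given (here, one per chunk):

```r
library(parallel)

# Hypothetical helper, not part of the parallel package.
chunked_mclapply <- function(X, FUN, ..., chunk_size = 10L, mc.cores = 2L) {
  stopifnot(chunk_size >= 1L)
  if (!length(X)) return(list())
  # Split the indices into consecutive chunks of at most chunk_size elements.
  idx <- seq_along(X)
  chunks <- split(idx, ceiling(idx / chunk_size))
  # One forked job per chunk; each job is a plain lapply() over its chunk.
  res <- mclapply(chunks, function(ii) lapply(X[ii], FUN, ...),
                  mc.preschedule = FALSE, mc.cores = mc.cores)
  # Flatten one level so the result has one element per element of X.
  out <- unlist(res, recursive = FALSE, use.names = FALSE)
  names(out) <- names(X)
  out
}

# Forking is not available on Windows, so fall back to a single core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
squares <- chunked_mclapply(1:25, function(i) i^2,
                            chunk_size = 4L, mc.cores = n_cores)
```

With chunk_size = 1 this behaves like mc.preschedule = FALSE, and with chunk_size around n/cores it is close to the prescheduled case, so the parameter covers the range in between.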

Cheers,
Simon