Memory Fragmentation in R

I did call gc(), but only in the top-level functions. Internal functions
in libraries/packages were also allocating space.

Here is how I think the problem happens. Consider code of the form
        x = as.vector(x)
        y = as.double(y)
where x is a 500MB matrix and y is 100MB.

Let's say we have 1201MB in total.
        Initially:
            x has 500MB, y has 100MB
            heap can grow by 601MB

        x = as.vector(x):
            x has 500MB, y has 100MB
            as.vector() duplicated 500MB (to be garbage collected)
            heap can grow by 101MB

        y = as.double(y):
            x has 500MB, y has 100MB
            R has 500MB to be garbage collected
            as.double() requires 100MB for duplicating y
            garbage collector is not run
                - required amount (100MB) < possible heap growth (101MB)
            allocVector() calls malloc()
                - malloc() can fail at this point
                - it cannot get a contiguous 100MB
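
The bookkeeping in the trace above can be replayed in a few lines of C. This is only a toy model of the numbers in this example, not R's actual heap accounting: the struct, the function names, and the "collect only when the request is at least as large as the remaining growth" rule are assumptions taken from the trace.

```c
#include <assert.h>

/* Toy model of the example above, sizes in MB. Hypothetical names;
   this is not R's actual heap accounting. */
struct heap {
    long limit;    /* total memory available */
    long live;     /* reachable objects (x and y) */
    long garbage;  /* unreachable but not yet collected */
};

/* Room the heap can still grow into. */
long headroom(const struct heap *h) {
    return h->limit - h->live - h->garbage;
}

/* 1 if a request for `need` MB would run the collector first,
   0 if it would go straight to malloc(). Mirrors the rule in the
   trace: no collection while need < headroom. */
int would_collect(const struct heap *h, long need) {
    assert(need >= 0);
    return need >= headroom(h);
}

/* Model a duplicating conversion such as as.vector(): the copy
   replaces the original as live data, and the original `size` MB
   becomes garbage. */
void duplicate(struct heap *h, long size) {
    assert(size >= 0);
    h->garbage += size;
}

/* Replays the example; returns 1 if the 100MB request for y skips
   the collector, which is the situation where malloc() can fail. */
int example_skips_gc(void) {
    struct heap h = { 1201, 600, 0 };   /* x = 500, y = 100 */
    assert(headroom(&h) == 601);
    duplicate(&h, 500);                 /* first conversion (x) */
    assert(headroom(&h) == 101);
    return !would_collect(&h, 100);     /* second conversion (y) */
}
```

With these numbers the request for y is 1MB under the remaining growth, so the model skips the collector even though 500MB of garbage is sitting in the heap.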

You are right, it is most likely to happen close to the trigger. But the
fix should be easy (call gc() if malloc() fails). I initially hacked at
trying to steal vectors from the free list because I thought the problem
I was seeing was due to address space fragmentation. Address space
fragmentation could still be a problem and would be harder to fix.
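
A minimal sketch of the "call gc() if malloc() fails" idea, in C. This is not the code in R's allocator. The allocator and collector are passed in as hypothetical function pointers so the sketch stands alone, and the toy_* helpers are made-up stand-ins used only to exercise it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hooks: in R these roles would be played by malloc()
   and the garbage collector. */
typedef void *(*alloc_fn)(size_t size);
typedef void (*collect_fn)(void);

/* Try to allocate; on failure run the collector once and retry.
   Returns NULL only if the second attempt also fails. */
void *alloc_with_gc_retry(size_t size, alloc_fn alloc, collect_fn collect) {
    assert(alloc != NULL && collect != NULL);
    void *p = alloc(size);
    if (p == NULL) {
        collect();        /* give garbage back to the allocator */
        p = alloc(size);  /* second and last attempt */
    }
    return p;
}

/* Toy stand-ins: 500 units are tied up as garbage until collected. */
size_t toy_free = 50;
size_t toy_garbage = 500;
int toy_collections = 0;
char toy_block;           /* dummy non-NULL address for "success" */

void *toy_alloc(size_t size) {
    if (size > toy_free)
        return NULL;      /* not enough free space right now */
    toy_free -= size;
    return &toy_block;
}

void toy_collect(void) {
    toy_free += toy_garbage;
    toy_garbage = 0;
    toy_collections++;
}
```

Calling alloc_with_gc_retry(100, toy_alloc, toy_collect) fails on the first attempt, collects once, and then succeeds. A retry like this only helps when the failure is caused by uncollected garbage. It does nothing for address space fragmentation, where the space is free but not contiguous.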

Thanks Luke and Brian!
Nawaaz