Memory management with rasterToPolygons (raster and rgeos)

6 messages · pgalpern, Lyndon Estes, Roger Bivand

#
Hello!

Not sure if this is the best place to ask what may ultimately be an 
rgeos question.

I am running the latest versions of the raster and rgeos packages under
64-bit R 2.14.1 on a Windows 2008 R2 Server with 12 GB RAM, and I am
having some challenges with memory.

I am converting rasters (approx. 500 x 1000 cells) into about 1500
SpatialPolygons, one for each feature class. rasterToPolygons(x,
dissolve=TRUE) works as it should, but the memory overhead is sky high.

For example, a single instance of this function quickly consumes 2-3 GB
and would probably consume more if other instances were not also running
simultaneously. As a result, disk swapping occurs, which slows
everything down. Interestingly, the input raster and output
SpatialPolygons objects are only megabytes in size. Running this under
32-bit R doesn't seem to help and occasionally results in memory
allocation failures.

Finally, deleting the raster and polygons objects when the function is
complete and running gc() does not seem to release the memory. Instead,
the entire R instance needs to be terminated.

Can anyone tell me if this is expected behaviour, or perhaps suggest
a more memory-efficient approach?
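
A minimal sketch reproducing the pattern described above; the raster
values are simulated stand-ins for the roughly 1500 feature classes, so
the sizes it reports are illustrative, not Paul's actual figures:

```r
# Sketch: simulate a categorical raster and polygonize it, watching how
# little of the peak memory use is visible in the finished objects.
library(raster)

r <- raster(nrow = 500, ncol = 1000)
r[] <- sample(1:1500, ncell(r), replace = TRUE)  # ~1500 feature classes

pol <- rasterToPolygons(r, dissolve = TRUE)  # peak memory far exceeds the result
print(object.size(pol), units = "MB")        # the output object itself is small

rm(r, pol)
gc()  # reported behaviour: this does not return the memory to the OS
```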

Thank you,
Paul
#
I did some further research into my own question when I twigged to the 
idea that this might be a memory leak with the GEOS library.

It seems likely that it is, and the problem was documented on this list
this past October:
https://mailman.stat.ethz.ch/pipermail/r-sig-geo/2011-October/013289.html

As of October there didn't appear to be any real resolution to the
problem, except, perhaps, to run rgeos under Linux.

Is this the status quo?

Thanks,
Paul
On 04/01/2012 8:57 PM, pgalpern wrote:

#
On Thu, 5 Jan 2012, pgalpern wrote:

The issue with rgeos/GEOS is unresolved, and there have been at least two
releases in the meantime. Using Linux does not help. It may be possible
to run with dissolve=FALSE and step through chunks of feature classes in
separate R scripts. However, it isn't just an rgeos issue, as:

library(raster)
r <- raster(nrow=500, ncol=1000)
r[] <- rpois(ncell(r), lambda=70)
pol <- rasterToPolygons(r, dissolve=FALSE)
object.size(r)
object.size(pol)

gives me:

4011736 bytes
1458003216 bytes

so pol is about 1.5 GB here, with 73 categories (I forgot to set.seed()).
Only raster and sp are loaded here. The arithmetic: a single Polygons
object is 2896 bytes, pol has 500000 features (one per cell), and
2896 * 500000 gives 1.448e+09 bytes,

so the input SpatialPolygons object is already large, and GEOS needs a 
separate copy in its format, plus working copies.

Could you work on tiles of the raster, then join those?
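
A rough sketch of this tiling idea; the 2 x 2 grid, the input file name,
and the final cross-tile dissolve via rgeos::gUnaryUnion are all
illustrative choices, not something prescribed in the thread:

```r
# Sketch: polygonize a large categorical raster tile by tile, then
# combine the pieces, releasing working memory between tiles.
library(raster)

r <- raster("classes.tif")  # illustrative input file
e <- extent(r)
xs <- seq(xmin(e), xmax(e), length.out = 3)
ys <- seq(ymin(e), ymax(e), length.out = 3)

tiles <- list()
for (i in 1:2) {
  for (j in 1:2) {
    tile <- crop(r, extent(xs[i], xs[i + 1], ys[j], ys[j + 1]))
    tiles[[length(tiles) + 1]] <- rasterToPolygons(tile, dissolve = TRUE)
    rm(tile); gc()  # release working memory between tiles
  }
}

# Combine the tiles; polygons split across tile edges still need a
# per-class dissolve afterwards, e.g. with
# rgeos::gUnaryUnion(merged, id = as.character(merged$layer)).
merged <- do.call(rbind, c(tiles, list(makeUniqueIDs = TRUE)))
```

Since each tile's polygons are built and discarded separately, the peak
memory is driven by the largest tile rather than the whole raster.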

We're still hoping that someone will help with bug-fixing in rgeos, but 
this is also a data representation question, I think.

Hope this helps,

Roger
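
One reading of the chunked, per-script workflow Roger mentions, sketched
below; the chunk size of 100 classes, the file names, and the choice to
dissolve each chunk on its own are illustrative assumptions:

```r
# Sketch: process the feature classes in chunks, writing each chunk's
# polygons to disk so the R session can be restarted in between to
# reclaim the memory that gc() does not release.
library(raster)

r <- raster("classes.tif")  # illustrative input file
classes <- sort(unique(values(r)))
chunks <- split(classes, ceiling(seq_along(classes) / 100))

for (i in seq_along(chunks)) {
  sub <- r
  sub[!(values(sub) %in% chunks[[i]])] <- NA  # mask all other classes
  pol <- rasterToPolygons(sub, dissolve = TRUE)
  saveRDS(pol, sprintf("polys_chunk_%02d.rds", i))
  rm(sub, pol); gc()
}
```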

#
Agreed, these are very large objects. I'll look at tiling as a general
solution to this problem.

For others facing the same challenge, it is worth noting that I have been
successful in running rasterToPolygons(x, dissolve=TRUE) on rasters of up
to 800000 cells, producing an object containing approx. 1500
SpatialPolygons under 64-bit Windows, by ensuring there is at least 7 GB
of overhead memory. Run time was reasonable. The R instance must be
terminated following the function call to free the memory.
On 05/01/2012 12:19 PM, Roger Bivand wrote:

#
On Thu, 5 Jan 2012, Lyndon Estes wrote:

Very interesting! Anyone want a nice weekend project of writing an R
interface to GDALPolygonize in rgdal?

Roger
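
Until such an interface exists, one workaround (assuming the GDAL
command-line utilities are installed and on the PATH) is to shell out to
gdal_polygonize.py, which wraps GDALPolygonize outside of R and so avoids
building the huge intermediate sp object; the file, layer, and field
names below are illustrative:

```shell
# Vectorize a classified raster with GDAL's polygonize utility. Each
# connected region of equal cell value becomes one polygon, with the
# value stored in the DN field; a per-class dissolve can follow in R.
gdal_polygonize.py classes.tif -f "ESRI Shapefile" classes.shp classes DN
```

The resulting shapefile can then be read back into R with
rgdal::readOGR().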