Memory management with rasterToPolygons (raster and rgeos)
On Thu, 5 Jan 2012, Lyndon Estes wrote:
In case it's of interest (although it won't solve the memory problems mentioned) I wrote a function to call gdal from R to polygonize rasters. It was quite fast in the instances I used it.
Very interesting! Anyone want a nice weekend project of writing the R interface to GDALPolygonize in rgdal? Roger
Mine is tailored to a Mac (10.5.8), so if used elsewhere this line might cause problems:

py.c <- "python2.5 /Library/Frameworks/GDAL.framework/Programs/gdal_polygonize.py"
runGdalPolygonize <- function(inraster, outshape, attname,
                              gdalformat = "ESRI Shapefile") {
  # Runs gdal_polygonize.py to create polygons from a raster (with limited
  # options at present). Requires rgdal for readOGR().
  # Args:
  #   inraster: The name of a raster object referring to a permanent,
  #     GDAL-readable raster
  #   outshape: The name of the desired output polygon file
  #   attname: The name for the column in the output attribute table
  #   gdalformat: Defaults to ESRI Shapefile at present
  # Returns:
  #   Polygon file, read back into an R object
  # Notes:
  #   This needs to be run within the directory containing the raster file
  py.c <- "python2.5 /Library/Frameworks/GDAL.framework/Programs/gdal_polygonize.py"
  # Strip the path to get the raster file name
  rast.nm <- unlist(strsplit(inraster@file@name, "/"))[
    length(unlist(strsplit(inraster@file@name, "/")))]
  full.c <- paste(py.c, rast.nm, "-f", paste("'", gdalformat, "'", sep = ""),
                  paste(outshape, ".shp", sep = ""), outshape, attname)
  system(full.c)
  shp <- readOGR(dsn = paste(outshape, ".shp", sep = ""), layer = outshape)
  return(shp)
}
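As a side note, the command string in the middle of that function could be assembled by a small pure helper (a sketch, not from the original post; shQuote() swapped in for the hand-pasted quotes), which can be checked without GDAL installed:

```r
# Sketch only: build the gdal_polygonize.py command string as a pure
# function, mirroring the paste() logic in runGdalPolygonize() above.
# shQuote() handles the format-name quoting portably.
buildPolygonizeCmd <- function(py, raster.nm, outshape, attname,
                               gdalformat = "ESRI Shapefile") {
  paste(py, raster.nm,
        "-f", shQuote(gdalformat),
        paste(outshape, ".shp", sep = ""), outshape, attname)
}
```

Keeping the string-building separate from the system() call makes it easy to print and inspect the command before running it.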
Cheers, Lyndon
On Thu, Jan 5, 2012 at 1:49 PM, pgalpern <pgalpern at gmail.com> wrote:
Agreed these are very large objects. I'll look at tiling as a general solution to this problem.

For others facing the same challenge, it is worth noting that I have been successful in running rasterToPolygons(x, dissolve=TRUE) on rasters of up to 800,000 cells, producing an object containing approx. 1500 SpatialPolygons under 64-bit Windows, by ensuring there is at least 7GB of overhead memory. Run time was reasonable. The R instance must be terminated following the function call to free up the memory.

On 05/01/2012 12:19 PM, Roger Bivand wrote:
On Thu, 5 Jan 2012, pgalpern wrote:

I did some further research into my own question when I twigged to the idea that this might be a memory leak in the GEOS library. It seems likely that it is, and it has been documented in this forum this October past: https://mailman.stat.ethz.ch/pipermail/r-sig-geo/2011-October/013289.html

As of October there didn't appear to be any real resolution to the problem, except - perhaps - to run rgeos under Linux. Is this the status quo?
The issue with rgeos/GEOS is unresolved, and has led to at least two releases in the meantime. Using Linux does not help. It may be possible to run with dissolve=FALSE, and step through chunks of feature classes in separate R scripts. However, it isn't just an rgeos issue, as:

library(raster)
r <- raster(nrow=500, ncol=1000)
r[] <- rpois(ncell(r), lambda=70)
pol <- rasterToPolygons(r, dissolve=FALSE)

gives me:

> object.size(r)
4011736 bytes
> object.size(pol)
1458003216 bytes

so pol is 1.5G here, with 73 categories (I forgot to set.seed()). Only raster and sp are present here. The arithmetic is:

> object.size(slot(pol, "polygons")[[1]])
2896 bytes
> dim(pol)
[1] 500000      1
> object.size(slot(pol, "polygons")[[1]]) * dim(pol)[1]
1.448e+09 bytes

so the input SpatialPolygons object is already large, and GEOS needs a separate copy in its format, plus working copies. Could you work on tiles of the raster, then join those?

We're still hoping that someone will help with bug-fixing in rgeos, but this is also a data representation question, I think.

Hope this helps,
Roger
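The tile idea above might be sketched as follows. This is a sketch only: the strip arithmetic is plain base R, while the crop()/extent()/rasterToPolygons() calls assume the raster package's API and the rbind() method with makeUniqueIDs assumes sp; neither is exercised here.

```r
# Sketch: polygonize a large raster in horizontal strips so that only one
# strip's worth of polygons is in memory at a time. The raster-package
# calls are illustrative assumptions; the strip arithmetic is base R.
rowChunks <- function(nrows, chunk.size) {
  # Split rows 1..nrows into consecutive strips of at most chunk.size rows.
  starts <- seq(1, nrows, by = chunk.size)
  lapply(starts, function(s) c(s, min(s + chunk.size - 1, nrows)))
}

polygonizeByStrip <- function(r, chunk.size = 100, dissolve = FALSE) {
  strips <- rowChunks(nrow(r), chunk.size)
  pieces <- lapply(strips, function(rng) {
    sub <- raster::crop(r, raster::extent(r, rng[1], rng[2], 1, ncol(r)))
    p <- raster::rasterToPolygons(sub, dissolve = dissolve)
    gc()  # encourage R to return memory between strips
    p
  })
  # sp's rbind method for Spatial*DataFrame objects; polygon IDs must be unique
  do.call(rbind, c(pieces, list(makeUniqueIDs = TRUE)))
}
```

With dissolve=TRUE a final dissolve across strip boundaries would still be needed, since features split by a strip edge come back as separate polygons.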
Thanks, Paul

On 04/01/2012 8:57 PM, pgalpern wrote:
Hello!

Not sure if this is the best place to ask what may ultimately be an rgeos question. I am running the latest versions of the raster and rgeos packages under 64-bit R 2.14.1 on Windows 2008R2 Server with 12GB RAM, and having some challenges with memory.

I am turning rasters (approx. 500 x 1000 cells) into 1500 SpatialPolygons, representing each of the feature classes. It works as it should, using rasterToPolygons(x, dissolve=T), but the memory overhead is sky high. For example, a single instance of this function quickly consumes 2-3 GB and would probably consume more if other instances were not also running simultaneously. As a result disk swapping occurs, which slows everything down. Interestingly, the input raster and output SpatialPolygons objects are only megabytes in size.

Running this under 32-bit R doesn't seem to help, and occasionally results in memory allocation failures. Finally, deleting the raster and polygons objects when the function is complete and running gc() does not seem to release the memory; instead the entire R instance needs to be terminated.

Can anyone tell me if this is expected behaviour, or perhaps suggest a more memory-efficient approach?

Thank you, Paul
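One way to get the effect of "terminate the R instance to free the memory" without losing an interactive session is to run the conversion in a throwaway child R process, so the leaked memory is reclaimed when the child exits. This is a sketch of that idea only (not from the thread); the file names are hypothetical, and the Rscript invocation assumes it is on the PATH.

```r
# Sketch: build an Rscript one-liner that does the conversion in a child
# process and saves the result to disk; memory is reclaimed on child exit.
# File names here are hypothetical illustrations.
childPolygonizeCode <- function(raster.file, out.rds) {
  sprintf(paste0("library(raster); r <- raster('%s'); ",
                 "p <- rasterToPolygons(r, dissolve = TRUE); ",
                 "saveRDS(p, '%s')"),
          raster.file, out.rds)
}

polygonizeInChild <- function(raster.file,
                              out.rds = tempfile(fileext = ".rds")) {
  status <- system2("Rscript",
                    c("-e", shQuote(childPolygonizeCode(raster.file, out.rds))))
  if (status != 0) stop("child R process failed")
  readRDS(out.rds)  # result survives; the child's working memory does not
}
```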
-- Paul Galpern, PhD Candidate Natural Resources Institute 70 Dysart Road University of Manitoba Winnipeg, Manitoba, Canada R3T 2M6 http://borealscape.ca
_______________________________________________
R-sig-Geo mailing list
R-sig-Geo at r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Roger Bivand Department of Economics, NHH Norwegian School of Economics, Helleveien 30, N-5045 Bergen, Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43 e-mail: Roger.Bivand at nhh.no