Memory management with rasterToPolygons (raster and rgeos)
On Thu, 5 Jan 2012, Lyndon Estes wrote:
In case it's of interest (although it won't solve the memory problems mentioned) I wrote a function to call gdal from R to polygonize rasters. It was quite fast in the instances I used it.
Very interesting! Anyone want a nice weekend project of writing the R interface to GDALPolygonize in rgdal? Roger
Mine is tailored to a Mac (10.5.8), so if used elsewhere this line might cause problems:

py.c <- "python2.5 /Library/Frameworks/GDAL.framework/Programs/gdal_polygonize.py"
runGdalPolygonize <- function(inraster, outshape, attname,
                              gdalformat = "ESRI Shapefile") {
  # Runs gdal_polygonize.py to create polygons from a raster (with limited
  # options at present). Requires rgdal for readOGR().
  # Args:
  #   inraster: The name of a raster object referring to a permanent,
  #     GDAL-readable raster
  #   outshape: The name of the desired output polygon file
  #   attname: The name for the column in the output attribute table
  #   gdalformat: Defaults to ESRI Shapefile at present
  # Returns:
  #   Polygon file, read back into an R object
  # Notes:
  #   This needs to be run within the directory containing the raster file
  py.c <- "python2.5 /Library/Frameworks/GDAL.framework/Programs/gdal_polygonize.py"
  # Strip the path to get the raster file name
  rast.nm <- unlist(strsplit(inraster@file@name, "/"))[
    length(unlist(strsplit(inraster@file@name, "/")))]
  full.c <- paste(py.c, rast.nm, "-f", paste("'", gdalformat, "'", sep = ""),
                  paste(outshape, ".shp", sep = ""), outshape, attname)
  system(full.c)
  shp <- readOGR(dsn = paste(outshape, ".shp", sep = ""), layer = outshape)
  return(shp)
}
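As a side note, the command string in the middle of that function could be assembled by a small pure helper (a sketch, not from the original post; shQuote() swapped in for the hand-pasted quotes), which can be checked without GDAL installed:

```r
# Sketch only: build the gdal_polygonize.py command string as a pure
# function, mirroring the paste() logic in runGdalPolygonize() above.
# shQuote() handles the format-name quoting portably.
buildPolygonizeCmd <- function(py, raster.nm, outshape, attname,
                               gdalformat = "ESRI Shapefile") {
  paste(py, raster.nm,
        "-f", shQuote(gdalformat),
        paste(outshape, ".shp", sep = ""), outshape, attname)
}
```

Keeping the string-building separate from the system() call makes it easy to print and inspect the command before running it.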
Cheers, Lyndon
On Thu, Jan 5, 2012 at 1:49 PM, pgalpern <pgalpern at gmail.com> wrote:
Agreed these are very large objects. I'll look at tiling as a general solution to this problem.

For others facing the same challenge, it is worth noting that I have been successful in running rasterToPolygons(x, dissolve=TRUE) on rasters of up to 800,000 cells, producing an object containing approx. 1500 SpatialPolygons under 64-bit Windows, by ensuring there is at least 7GB of overhead memory. Run time was reasonable. The R instance must be terminated following the function call to free up the memory.

On 05/01/2012 12:19 PM, Roger Bivand wrote:
On Thu, 5 Jan 2012, pgalpern wrote:

I did some further research into my own question when I twigged to the idea that this might be a memory leak in the GEOS library. It seems likely that it is, and it has been documented in this forum this October past: https://mailman.stat.ethz.ch/pipermail/r-sig-geo/2011-October/013289.html

As of October there didn't appear to be any real resolution to the problem, except - perhaps - to run rgeos under Linux. Is this the status quo?
The issue with rgeos/GEOS is unresolved, and has led to at least two releases in the meantime. Using Linux does not help. It may be possible to run with dissolve=FALSE, and step through chunks of feature classes in separate R scripts. However, it isn't just an rgeos issue, as:

library(raster)
r <- raster(nrow=500, ncol=1000)
r[] <- rpois(ncell(r), lambda=70)
pol <- rasterToPolygons(r, dissolve=FALSE)

gives me:

> object.size(r)
4011736 bytes
> object.size(pol)
1458003216 bytes

so pol is 1.5G here, with 73 categories (I forgot to set.seed()). Only raster and sp are present here. The arithmetic is:

> object.size(slot(pol, "polygons")[[1]])
2896 bytes
> dim(pol)
[1] 500000      1
> object.size(slot(pol, "polygons")[[1]]) * dim(pol)[1]
1.448e+09 bytes

so the input SpatialPolygons object is already large, and GEOS needs a separate copy in its format, plus working copies. Could you work on tiles of the raster, then join those?

We're still hoping that someone will help with bug-fixing in rgeos, but this is also a data representation question, I think.

Hope this helps,
Roger
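The tile idea above might be sketched as follows. This is a sketch only: the strip arithmetic is plain base R, while the crop()/extent()/rasterToPolygons() calls assume the raster package's API and the rbind() method with makeUniqueIDs assumes sp; neither is exercised here.

```r
# Sketch: polygonize a large raster in horizontal strips so that only one
# strip's worth of polygons is in memory at a time. The raster-package
# calls are illustrative assumptions; the strip arithmetic is base R.
rowChunks <- function(nrows, chunk.size) {
  # Split rows 1..nrows into consecutive strips of at most chunk.size rows.
  starts <- seq(1, nrows, by = chunk.size)
  lapply(starts, function(s) c(s, min(s + chunk.size - 1, nrows)))
}

polygonizeByStrip <- function(r, chunk.size = 100, dissolve = FALSE) {
  strips <- rowChunks(nrow(r), chunk.size)
  pieces <- lapply(strips, function(rng) {
    sub <- raster::crop(r, raster::extent(r, rng[1], rng[2], 1, ncol(r)))
    p <- raster::rasterToPolygons(sub, dissolve = dissolve)
    gc()  # encourage R to return memory between strips
    p
  })
  # sp's rbind method for Spatial*DataFrame objects; polygon IDs must be unique
  do.call(rbind, c(pieces, list(makeUniqueIDs = TRUE)))
}
```

With dissolve=TRUE a final dissolve across strip boundaries would still be needed, since features split by a strip edge come back as separate polygons.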
Thanks, Paul

On 04/01/2012 8:57 PM, pgalpern wrote:
Hello!

Not sure if this is the best place to ask what may ultimately be an rgeos question. I am running the latest versions of the raster and rgeos packages under 64-bit R 2.14.1 on Windows 2008R2 Server with 12GB RAM, and having some challenges with memory.

I am turning rasters (approx. 500 x 1000 cells) into 1500 SpatialPolygons, representing each of the feature classes. It works as it should, using rasterToPolygons(x, dissolve=T), but the memory overhead is sky high. For example, a single instance of this function quickly consumes 2-3 GB and would probably consume more if other instances were not also running simultaneously. As a result disk swapping occurs, which slows everything down. Interestingly, the input raster and output SpatialPolygons objects are only megabytes in size.

Running this under 32-bit R doesn't seem to help, and occasionally results in memory allocation failures. Finally, deleting the raster and polygons objects when the function is complete and running gc() does not seem to release the memory; instead the entire R instance needs to be terminated.

Can anyone tell me if this is expected behaviour, or perhaps suggest a more memory-efficient approach?

Thank you, Paul
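One way to get the effect of "terminate the R instance to free the memory" without losing an interactive session is to run the conversion in a throwaway child R process, so the leaked memory is reclaimed when the child exits. This is a sketch of that idea only (not from the thread); the file names are hypothetical, and the Rscript invocation assumes it is on the PATH.

```r
# Sketch: build an Rscript one-liner that does the conversion in a child
# process and saves the result to disk; memory is reclaimed on child exit.
# File names here are hypothetical illustrations.
childPolygonizeCode <- function(raster.file, out.rds) {
  sprintf(paste0("library(raster); r <- raster('%s'); ",
                 "p <- rasterToPolygons(r, dissolve = TRUE); ",
                 "saveRDS(p, '%s')"),
          raster.file, out.rds)
}

polygonizeInChild <- function(raster.file,
                              out.rds = tempfile(fileext = ".rds")) {
  status <- system2("Rscript",
                    c("-e", shQuote(childPolygonizeCode(raster.file, out.rds))))
  if (status != 0) stop("child R process failed")
  readRDS(out.rds)  # result survives; the child's working memory does not
}
```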
-- Paul Galpern, PhD Candidate Natural Resources Institute 70 Dysart Road University of Manitoba Winnipeg, Manitoba, Canada R3T 2M6 http://borealscape.ca
_______________________________________________
R-sig-Geo mailing list
R-sig-Geo at r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Roger Bivand Department of Economics, NHH Norwegian School of Economics, Helleveien 30, N-5045 Bergen, Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43 e-mail: Roger.Bivand at nhh.no