
Extract polygon values from large raster dataset

7 messages · Spencer Scheidt, Matthew Landis, Mathieu Basille +3 more

#
Hello,

I am attempting to extract land cover values from the NLCD 2006 dataset
(http://www.mrlc.gov/nlcd2006.php) using a grid that I have converted into a
SpatialPolygons* object.  However, the NLCD raster is approximately 16 GB in
size (30 m resolution), and the grid that I would like to work with contains
5520 polygons (0.5 degree resolution).

I have written a script based on previous postings that asked similar
questions and on Applied Spatial Data Analysis with R by Bivand et al., and
I have used this approach successfully for smaller SpatialPolygons* objects
with the same NLCD dataset.

I am wondering whether there is a quicker way to extract my polygon values,
either by breaking the raster and polygons into smaller chunks or by using
an entirely different approach. I realize that the extreme sizes of my two
objects may be the ultimate constraint, and if necessary I can reduce the
resolution of my grid.

Below is my working code, which uses two different approaches: rasterize
(estimated to take 10+ days to run and cross-tabulate) and extract
(estimated to take over 30 days to run):

#load appropriate libraries
library(maps)
library(rgdal)
library(raster)
library(sp)
library(maptools)

#create bounding box
us.box<-matrix(c(-124.5,-67,25,49), nrow = 2, ncol = 2, byrow = TRUE)

#create grid
getClass("GridTopology")

cs<-c(0.5, 0.5)
cc<-us.box[,1] + (cs/2)
cd<-ceiling(diff(t(us.box))/cs)
us.grid<-GridTopology(cellcentre.offset = cc, cellsize = cs, cells.dim = cd)
us.grid

#create spatial polygons
#projection is albers equal area conic
prj.string <- "+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23 +lon_0=-96 +units=m"
us.sp <- as.SpatialPolygons.GridTopology(us.grid,
                                         proj4string = CRS("+proj=longlat +ellps=WGS84"))
us.SP<-spTransform(us.sp, CRS(prj.string))

summary(us.SP)

#check to make sure ID values are appropriate - should be listed by row from top left
IDvaluesGridTopology(us.grid)
slot(us.SP, "polygons")

#load land cover data raster (16 GB)
env.data <- raster('nlcd2006_landcover_4-20-11_se5.img')
projection(env.data) <- CRS("+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23 +lon_0=-96")

#create raster of polygons
poly.raster <- rasterize(us.SP, env.data, progress = 'window')
df <- crosstab(poly.raster, env.data, progress = 'window')

#maybe another way?
ex.test <- extract(env.data, us.SP, df=TRUE, progress = 'window')
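
To illustrate the chunking idea, here is an untested sketch of what I have
in mind (a small synthetic raster and a 2 x 2 polygon grid stand in for my
actual objects): crop the raster down to each polygon's extent first, so
only a small window of the big file is read at a time:

```r
library(raster)
library(sp)

# Synthetic stand-ins for the real NLCD raster and 0.5-degree grid.
r <- raster(nrows = 100, ncols = 100, xmn = 0, xmx = 10, ymn = 0, ymx = 10)
values(r) <- sample(1:5, ncell(r), replace = TRUE)

gt   <- GridTopology(cellcentre.offset = c(2.5, 2.5),
                     cellsize = c(5, 5), cells.dim = c(2, 2))
grid <- as.SpatialPolygons.GridTopology(gt)

# Crop the raster to each polygon's extent before extracting, so only a
# small chunk is held in memory per iteration.
vals <- vector("list", length(grid))
for (i in seq_along(grid)) {
  chunk     <- crop(r, extent(grid[i]))
  vals[[i]] <- extract(chunk, grid[i])[[1]]
}
counts <- lapply(vals, table)   # per-polygon tally of cover classes
```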

Any help or advice on working with large datasets is much appreciated. 
Thank you for your time.

Spencer



--
View this message in context: http://r-sig-geo.2731867.n2.nabble.com/Extract-polygon-values-from-large-raster-dataset-tp7580250.html
Sent from the R-sig-geo mailing list archive at Nabble.com.
#
Spencer, it's been a while since I tried to use rasterize(), but when I
did, I found it very slow compared to alternative approaches, e.g. using
ArcGIS.  In the end, I discovered gdal_rasterize (
http://www.gdal.org/gdal_rasterize.html), which seems to be faster,
although I've never compared them head-to-head.  That's what I now use
on a regular basis.  Maybe it will work for you?
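
A minimal sketch of the call, assuming the grid has been written out as a
shapefile with an ID attribute (the filenames here are placeholders):

```shell
# Sketch only: filenames and the 'ID' attribute are placeholders.
# -a ID      burn the polygon attribute 'ID' into the output cells
# -tr 30 30  match the 30 m resolution of the NLCD raster
# -ot UInt16 integer type with enough range for 5520 polygon IDs
gdal_rasterize -a ID -tr 30 30 -ot UInt16 us_grid.shp us_grid_id.tif
```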

M
On 06/18/2012 07:07 PM, Spencer Scheidt wrote:

#
You might also want to consider PostGIS [1], which handles rasters by 
default in the recent 2.0 version and was designed precisely to handle 
large maps easily. Note that it uses GDAL for some raster functions, so 
it might not be faster than GDAL itself...

Mathieu.

[1] http://postgis.refractions.net/
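
For instance, loading the NLCD file as tiles could look something like
this (a sketch; the SRID, tile size, and database/table names are
placeholders to adapt):

```shell
# Sketch only: database name, table name, and SRID are placeholders.
# -t 100x100 stores the raster as small tiles (keeps queries out-of-core),
# -I adds a spatial index, -C applies the standard raster constraints.
raster2pgsql -s 5070 -I -C -t 100x100 \
    nlcd2006_landcover_4-20-11_se5.img public.nlcd2006 | psql -d gisdb
```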


On 18/06/2012 21:54, Matthew Landis wrote: