Hello, I am attempting to extract land cover values from the NLCD 2006 dataset (http://www.mrlc.gov/nlcd2006.php) using a grid that I have converted into a SpatialPolygons* object. However, the NLCD raster is approximately 16 GB in size (30 m resolution), and the grid that I would like to work with contains 5520 polygons (0.5 degree resolution). I have written a script based on previous postings that have asked a similar question and Applied Spatial Data Analysis in R by Bivand et. al, and used this approach successfully for smaller SpatialPolygons* objects with the same NLCD dataset. What I am wondering is if there is a quicker way to extract my polygon values, either by breaking the raster and polygons into smaller chunks, or using an entirely different approach. I realize that the extreme sizes of my 2 objects may be the ultimate constraint, and if necessary I can reduce the resolution of my grid. Below is my working code, which uses 2 different approaches, via rasterize (estimated to take 10+ days to run and cross tabulate) and extract (estimated to take over 30 days to run): #load appropriate libraries library(maps) library(rgdal) library(raster) library(sp) library(maptools) #create bounding box us.box<-matrix(c(-124.5,-67,25,49), nrow = 2, ncol = 2, byrow = TRUE) #create grid getClass("GridTopology") cs<-c(0.5, 0.5) cc<-us.box[,1] + (cs/2) cd<-ceiling(diff(t(us.box))/cs) us.grid<-GridTopology(cellcentre.offset = cc, cellsize = cs, cells.dim = cd) us.grid #create spatial polygons #projection is albers equal area conic prj.string <- "+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23 +lon_0=-96 +units=m" us.sp <- as.SpatialPolygons.GridTopology(us.grid, proj4string = CRS("+proj=longlat +ellps=WGS84")) us.SP<-spTransform(us.sp, CRS(prj.string)) summary(us.SP) #check to make sure ID values are appropriate - should be listed by row from top left IDvaluesGridTopology(us.grid) slot(us.SP, "polygons") #load land cover data raster (16 GB) env.data = raster('nlcd2006_landcover_4-20-11_se5.img') projection(env.data) = CRS("+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23 +lon_0=-96") #create raster of polygons poly.raster <- rasterize(us.SP, env.data, progress = 'window') df <- crosstab(poly.raster, env.data, progress = 'window') #maybe another way? ex.test <- extract(env.data, us.SP, df=TRUE, progress = 'window') Any help or advice on working with large datasets is much appreciated. Thank you for your time. Spencer -- View this message in context: http://r-sig-geo.2731867.n2.nabble.com/Extract-polygon-values-from-large-raster-dataset-tp7580250.html Sent from the R-sig-geo mailing list archive at Nabble.com.
Extract polygon values from large raster dataset
7 messages · Spencer Scheidt, Matthew Landis, Mathieu Basille +3 more
Spencer, it's been awhile since I tried to use rasterize(), but when I did, I found it very slow compared to alternative approaches, e.g. using ArcGIS. In the end, I discovered gdal_rasterize ( http://www.gdal.org/gdal_rasterize.html), which seems to be faster, although I've never compared them head to head. That's what I now use on a regular basis. Maybe it will work for you? M
On 06/18/2012 07:07 PM, Spencer Scheidt wrote:
Hello, I am attempting to extract land cover values from the NLCD 2006 dataset (http://www.mrlc.gov/nlcd2006.php) using a grid that I have converted into a SpatialPolygons* object. However, the NLCD raster is approximately 16 GB in size (30 m resolution), and the grid that I would like to work with contains 5520 polygons (0.5 degree resolution). I have written a script based on previous postings that have asked a similar question and Applied Spatial Data Analysis in R by Bivand et. al, and used this approach successfully for smaller SpatialPolygons* objects with the same NLCD dataset. What I am wondering is if there is a quicker way to extract my polygon values, either by breaking the raster and polygons into smaller chunks, or using an entirely different approach. I realize that the extreme sizes of my 2 objects may be the ultimate constraint, and if necessary I can reduce the resolution of my grid. Below is my working code, which uses 2 different approaches, via rasterize (estimated to take 10+ days to run and cross tabulate) and extract (estimated to take over 30 days to run): #load appropriate libraries library(maps) library(rgdal) library(raster) library(sp) library(maptools) #create bounding box us.box<-matrix(c(-124.5,-67,25,49), nrow = 2, ncol = 2, byrow = TRUE) #create grid getClass("GridTopology") cs<-c(0.5, 0.5) cc<-us.box[,1] + (cs/2) cd<-ceiling(diff(t(us.box))/cs) us.grid<-GridTopology(cellcentre.offset = cc, cellsize = cs, cells.dim = cd) us.grid #create spatial polygons #projection is albers equal area conic prj.string<- "+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23 +lon_0=-96 +units=m" us.sp<- as.SpatialPolygons.GridTopology(us.grid, proj4string = CRS("+proj=longlat +ellps=WGS84")) us.SP<-spTransform(us.sp, CRS(prj.string)) summary(us.SP) #check to make sure ID values are appropriate - should be listed by row from top left IDvaluesGridTopology(us.grid) slot(us.SP, "polygons") #load land cover data raster (16 GB) env.data = raster('nlcd2006_landcover_4-20-11_se5.img') projection(env.data) = CRS("+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23 +lon_0=-96") #create raster of polygons poly.raster<- rasterize(us.SP, env.data, progress = 'window') df<- crosstab(poly.raster, env.data, progress = 'window') #maybe another way? ex.test<- extract(env.data, us.SP, df=TRUE, progress = 'window') Any help or advice on working with large datasets is much appreciated. Thank you for your time. Spencer -- View this message in context: http://r-sig-geo.2731867.n2.nabble.com/Extract-polygon-values-from-large-raster-dataset-tp7580250.html Sent from the R-sig-geo mailing list archive at Nabble.com.
_______________________________________________ R-sig-Geo mailing list R-sig-Geo at r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-geo
~~~~~~~~~~~~~~~~~~~~~~~~~~ Matthew Landis, Ph.D. Research Scientist ISciences, LLC 61 Main St. Suite 200 Burlington VT 05401 802.864.2999 www.isciences.com ~~~~~~~~~~~~~~~~~~~~~~~~~~
You might also want to consider using PostGIS [1], which now handles rasters by default in the recent 2.0 version, and was precisely developed to handle large maps easily. Note that it uses GDAL for some raster functions, so that it might not be faster than GDAL itself... Mathieu. [1] http://postgis.refractions.net/ Le 18/06/2012 21:54, Matthew Landis a ?crit :
Spencer, it's been awhile since I tried to use rasterize(), but when I did, I found it very slow compared to alternative approaches, e.g. using ArcGIS. In the end, I discovered gdal_rasterize ( http://www.gdal.org/gdal_rasterize.html), which seems to be faster, although I've never compared them head to head. That's what I now use on a regular basis. Maybe it will work for you? M On 06/18/2012 07:07 PM, Spencer Scheidt wrote:
Hello, I am attempting to extract land cover values from the NLCD 2006 dataset (http://www.mrlc.gov/nlcd2006.php) using a grid that I have converted into a SpatialPolygons* object. However, the NLCD raster is approximately 16 GB in size (30 m resolution), and the grid that I would like to work with contains 5520 polygons (0.5 degree resolution). I have written a script based on previous postings that have asked a similar question and Applied Spatial Data Analysis in R by Bivand et. al, and used this approach successfully for smaller SpatialPolygons* objects with the same NLCD dataset. What I am wondering is if there is a quicker way to extract my polygon values, either by breaking the raster and polygons into smaller chunks, or using an entirely different approach. I realize that the extreme sizes of my 2 objects may be the ultimate constraint, and if necessary I can reduce the resolution of my grid. Below is my working code, which uses 2 different approaches, via rasterize (estimated to take 10+ days to run and cross tabulate) and extract (estimated to take over 30 days to run): #load appropriate libraries library(maps) library(rgdal) library(raster) library(sp) library(maptools) #create bounding box us.box<-matrix(c(-124.5,-67,25,49), nrow = 2, ncol = 2, byrow = TRUE) #create grid getClass("GridTopology") cs<-c(0.5, 0.5) cc<-us.box[,1] + (cs/2) cd<-ceiling(diff(t(us.box))/cs) us.grid<-GridTopology(cellcentre.offset = cc, cellsize = cs, cells.dim = cd) us.grid #create spatial polygons #projection is albers equal area conic prj.string<- "+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23 +lon_0=-96 +units=m" us.sp<- as.SpatialPolygons.GridTopology(us.grid, proj4string = CRS("+proj=longlat +ellps=WGS84")) us.SP<-spTransform(us.sp, CRS(prj.string)) summary(us.SP) #check to make sure ID values are appropriate - should be listed by row from top left IDvaluesGridTopology(us.grid) slot(us.SP, "polygons") #load land cover data raster (16 GB) env.data = raster('nlcd2006_landcover_4-20-11_se5.img') projection(env.data) = CRS("+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23 +lon_0=-96") #create raster of polygons poly.raster<- rasterize(us.SP, env.data, progress = 'window') df<- crosstab(poly.raster, env.data, progress = 'window') #maybe another way? ex.test<- extract(env.data, us.SP, df=TRUE, progress = 'window') Any help or advice on working with large datasets is much appreciated. Thank you for your time. Spencer -- View this message in context: http://r-sig-geo.2731867.n2.nabble.com/Extract-polygon-values-from-large-raster-dataset-tp7580250.html Sent from the R-sig-geo mailing list archive at Nabble.com.
_______________________________________________ R-sig-Geo mailing list R-sig-Geo at r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-geo
~$ whoami Mathieu Basille, Post-Doc ~$ locate Laboratoire d'?cologie Comportementale et de Conservation de la Faune + Centre d'?tude de la For?t D?partement de Biologie Universit? Laval, Qu?bec ~$ info http://ase-research.org/basille ~$ fortune ``If you can't win by reason, go for volume.'' Calvin, by Bill Watterson.
An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-sig-geo/attachments/20120619/29fad9eb/attachment.pl>
An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-sig-geo/attachments/20120619/b8c5fc35/attachment.pl>
An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-sig-geo/attachments/20120619/79923061/attachment.pl>
An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-sig-geo/attachments/20120619/3331d317/attachment.pl>