Faster way to extract raster values using multiple polygons?
Hi Ben,

Thank you so much for the suggestion. It is pretty fast, and I guess centroids are a reasonable assumption for temperature. I'll have to do the same with precipitation, though, which is far more spatially variable than temperature. In that case I think I'll have to wait out the roughly 10 hours it takes to run "extract" with a mean over all cells covering each polygon. It is a one-time process anyway, so I should not worry too much about the waiting time.

Greetings,
-- Thiago V. dos Santos
PhD student
Land and Atmospheric Science
University of Minnesota
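One possible shortcut for the area-mean case, sketched here rather than taken from the thread: since all layers share the same 1-degree grid, the expensive polygon-to-cell overlay only needs to run once, after which the per-layer means are plain matrix arithmetic. This assumes the brick fits in memory; cellFromPolygon() and getValues() are standard raster functions, but the na.rm handling and variable names are assumptions.

library(raster)

# Overlay once: list of cell numbers covered by each polygon. Note that
# polygons too small to contain any 1-degree cell centre come back NULL,
# an edge case this sketch does not handle.
cells <- cellFromPolygon(b, br_sub)

# Pull all layer values into memory in one read: ncell(b) x nlayers(b)
vals <- getValues(b)

# Unweighted mean of each polygon's cells, for every layer at once;
# result is one row per polygon, one column per month
m <- t(sapply(cells, function(cs) colMeans(vals[cs, , drop = FALSE], na.rm = TRUE)))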
On Wednesday, April 6, 2016 10:48 AM, Ben Tupper <btupper at bigelow.org> wrote:
Hi,

The resolution of your raster data (1 degree) is much coarser than what your polygons represent. Could you short-circuit the process by assuming that the temperature at the centroid of each polygon suitably represents the mean temperature across that polygon? Unless you have some much bigger polygons, I can't imagine it would be very far off. If so, then you could pretty quickly extract the values for each layer in the raster at each centroid. Perhaps like this?

cents <- coordinates(br_sub)
v <- extract(b, cents)

Is that close enough?

Cheers,
Ben
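For completeness, an untested sketch of assembling that centroid extraction into the long data frame Thiago asked for; using NAME_2 as the municipality name and getZ(b) for the date tags are assumptions based on the example below.

cents <- coordinates(br_sub)
v <- extract(b, cents)   # matrix: one row per municipality, one column per layer

# as.vector(v) unrolls column by column, so municipality varies fastest
# and the date tag slowest
df <- data.frame(municipality = rep(br_sub$NAME_2, times = ncol(v)),
                 lon = rep(cents[, 1], times = ncol(v)),
                 lat = rep(cents[, 2], times = ncol(v)),
                 date = rep(getZ(b), each = nrow(v)),
                 temp = as.vector(v))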
On Apr 6, 2016, at 8:00 AM, Thiago V. dos Santos via R-sig-Geo <r-sig-geo at r-project.org> wrote:
Dear all,
I am trying to extract a time series from a raster brick using multiple polygons.
The raster brick is a global temperature netcdf file with 1200 layers (months) and the polygons are each of the 5567 municipalities in Brazil. I intend to extract the temperature time series for each municipality.
As a final result, I would like to have a data frame containing:
- the name and coordinates of the municipality,
- the date tag from the raster, and
- the extracted temperature value.
My initial, VERY SLOW attempt was something like this:
library(raster)
# Create some sample raster data
idx <- seq(as.Date("1960/01/01"), as.Date("1990/12/01"), by = "month")
r <- raster(ncol=360, nrow=180)
b <- brick(lapply(1:length(idx), function(x) setValues(r, runif(ncell(r)))))
b <- setZ(b, idx)
# Import polygon data
br <- getData("GADM", country="Brazil", level=2) # about 7MB to download
# Subset the SMALLEST state - only 75 municipalities
br_sub <- br[br$NAME_1=="Sergipe" & !is.na(br$NAME_1),]
plot(br_sub)
# Now let's extract the data for each municipality
beginCluster()
e <- extract(b, br_sub, fun=mean, df=TRUE)
endCluster()
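(A hedged alternative to beginCluster(), not from the thread: parallelise over the polygons themselves with parallel::mclapply, so each core handles a subset of municipalities. mc.cores = 4 is only an assumption to match the hardware mentioned below, and whether this beats the cluster approach is untested.)

library(parallel)

# One worker per municipality: extract all layers for a single polygon;
# extract() returns a 1 x nlayers matrix here, so keep its first row
res <- mclapply(seq_len(length(br_sub)),
                function(i) extract(b, br_sub[i, ], fun = mean, na.rm = TRUE)[1, ],
                mc.cores = 4)
e2 <- do.call(rbind, res)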
Even using the smallest state possible, this example takes about 20 minutes to run on a dual-core 2.5 GHz MacBook Pro using four threads. For reference, there are states with over 850 municipalities. And remember, in total there are over 5500 municipalities I need to extract the data for.
I have the feeling that my code is far from optimal.
Has anybody ever dealt with this amount of data in this kind of raster operation? Is there any faster way to achieve my expected result?
Thanks in advance,
-- Thiago V. dos Santos
PhD student
Land and Atmospheric Science
University of Minnesota
Ben Tupper
Bigelow Laboratory for Ocean Sciences
60 Bigelow Drive, P.O. Box 380
East Boothbay, Maine 04544
http://www.bigelow.org