over() in sp package
Robert J. Hijmans, Kenny Bell, Edzer Pebesma

Hi,
When I call:
over(spPolyDf, spGridDf)
The function seems to be returning the spGridDf$band1 value for the first
grid cell that is contained in each polygon.
What I would like is an overlap-area-weighted average (or the result of
some other provided function, like sum) of all the grid cells contained,
or partially contained, in each polygon of spPolyDf.
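That is, for each polygon j the target is sum_i(a_ij * v_i) / sum_i(a_ij),
where v_i is the value of grid cell i and a_ij is the area of overlap
between grid cell i and polygon j.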
I have code that does this by eventually using joinPolys() from PBSmapping,
but it's slow, and an sp-only solution would be much cleaner. Is
there any way to do this? Some example code is below:
library(sp)
library(rgdal)
# Download a USA shapefile
tmpFile <- tempfile()
download.file("
http://www2.census.gov/geo/tiger/GENZ2013/cb_2013_us_state_20m.zip",
destfile = tmpFile)
unzipped <- unzip(zipfile = tmpFile, exdir = getwd())
# This is the actual reading stage
states <- readOGR(getwd(), layer="cb_2013_us_state_20m")
# Remove all the files on disk
file.remove(tmpFile)
file.remove(unzipped)
tmpFile <- tempfile()
url <- paste0("http://www.prism.oregonstate.edu/fetchData.php?type",
"=bil&kind=recent&elem=tmax&range=daily&temporal=",
"20140416")
# Download the file to tmpFile
download.file(url = url, mode = "wb", destfile = tmpFile)
# unzip the file to the working directory.
unzippedFiles <- unzip(zipfile = tmpFile, overwrite = TRUE)
# read into R
tMaxGrid <- readGDAL(fname = unzippedFiles[1])
# delete the file
file.remove(tmpFile)
file.remove(unzippedFiles)
# This doesn't work correctly
states$tmax20140416 <- over(states, tMaxGrid)
Kenny,
Here is an alternative route:
library(raster)
states <- shapefile("cb_2013_us_state_20m.shp")
tMaxGrid <- raster(unzippedFiles[1])
x <- extract(tMaxGrid, states)
This returns a list with one element per feature of states; each element
is a vector of the grid-cell values.
For a mean, do:
xs <- extract(tMaxGrid, states, fun=mean)
See ?extract for weights.
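For the area-weighted mean specifically, something along these lines should
work (a sketch; with weights=TRUE, extract() returns for each polygon a
matrix whose first column holds cell values and whose second holds the
fraction of each cell covered):

xw <- extract(tMaxGrid, states, weights = TRUE)
# weighted mean per state; NULL means a polygon overlaps no cells
xwm <- sapply(xw, function(m)
    if (is.null(m)) NA else weighted.mean(m[, 1], m[, 2], na.rm = TRUE))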
Robert
On Mon, Apr 13, 2015 at 4:27 PM, Kenny Bell <kmb56@berkeley.edu> wrote:
> [original message quoted in full; snipped]
On 04/14/2015 01:27 AM, Kenny Bell wrote:
> Hi, When I call: over(spPolyDf, spGridDf) the function seems to be
> returning the spGridDf$band1 value for the first grid cell that is
> contained in each polygon.
This is documented; you can use returnList = TRUE to get the list of
values for each polygon. Since you want to aggregate these anyway, you
might as well use

aggregate(spGridDf, spPolyDf, mean)

This gives a simple average of the pixels whose centers fall inside each
polygon. Do you need a weighted average, weighted by how much of each
pixel overlaps the polygon?
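With the objects from your example, that would be something like this (a
sketch, assuming both layers share a coordinate reference system):

tmaxByState = aggregate(tMaxGrid, states, mean)
states$tmax20140416 = tmaxByState$band1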
> [example code snipped]
> # This doesn't work correctly
> states$tmax20140416 <- over(states, tMaxGrid)
This would assign a data.frame to what you want to be a vector (in a
data.frame), so it would have to be:

states$tmax20140416 <- over(states, tMaxGrid)[[1]]

I'll try to modify sp so that it warns you in this case.
--
Edzer Pebesma
Institute for Geoinformatics (ifgi), University of Münster
Heisenbergstraße 2, 48149 Münster, Germany; +49 251 83 33081
Thanks so much for all this. Yes, I'm after the area-weighted averages.
The eventual application is to aggregate a 2.5 arc-minute temperature grid
up to ZIP-5 polygons, many of which overlap only 2-3 grid cells.

On Mon, Apr 13, 2015 at 10:47 PM, Edzer Pebesma <edzer.pebesma@uni-muenster.de> wrote:
> [quoted message snipped]
--
Kendon Bell <kmb56@berkeley.edu>, (510) 612-3375
Ph.D. Student, Department of Agricultural & Resource Economics,
University of California, Berkeley
Graduate Student Researcher, Energy Biosciences Institute
http://www.energybiosciencesinstitute.org/
1 day later
On 04/14/2015 06:58 PM, Kenny Bell wrote:
> Thanks so much for all this. Yes, I'm after the area-weighted averages.
> The eventual application is to aggregate a 2.5 arc-minute temperature
> grid up to ZIP-5 polygons, many of which overlap only 2-3 grid cells.
One approach would be to use spsample to resample the temperature to a
finer grid, then use aggregate.
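A rough sketch of that route, using the objects from the original post
(the sample size n is arbitrary; larger n approximates the area weighting
more closely, at more cost):

# sample the grid on a dense regular point pattern, carry the cell
# values along, then average the points falling in each polygon
pts = spsample(tMaxGrid, n = 100000, type = "regular")
ptsDf = SpatialPointsDataFrame(pts, over(pts, tMaxGrid))
tmaxByState = aggregate(ptsDf, states, mean)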
Another approach would be to convert the grid to SpatialPolygons, and
then use the function below, which should do exact area weighting. Let
me know if it works, or if you have other comments / questions.
aggregatePoly = function(x, by) {
	require(rgeos)
	# intersect each polygon of x with each polygon of by:
	i = gIntersection(x, by, byid = TRUE)
	# intersection IDs look like "x-ID by-ID"; extract the by-ID to group on:
	ids.i = sapply(i@polygons, function(x) slot(x, name = "ID"))
	grp_by = sapply(strsplit(ids.i, " "), function(x) x[2])
	# look up the attribute of x underlying each intersection piece:
	obs = over(SpatialPoints(coordinates(i)), x)
	obs$area = sapply(i@polygons, function(x)
		slot(x, name = "area"))
	# area-weighted sum per by-polygon, divided by total overlap area:
	obs$x_area = obs[[1]] * obs$area
	agg = aggregate(obs[c("x_area", "area")], list(grp_by), sum)
	agg$value = with(agg, x_area / area)
	agg = agg[match(row.names(by), agg$Group.1),]
	row.names(agg) = row.names(by)
	SpatialPolygonsDataFrame(by, agg)
}
# Example with two partly overlapping grids:
library(sp)
gt1 = SpatialGrid(GridTopology(c(0,0), c(1,1), c(4,4)))
gt2 = SpatialGrid(GridTopology(c(-1.25,-1.25), c(1,1), c(4,4)))
# convert both to polygons; give p1 attributes to aggregate
p1 = SpatialPolygonsDataFrame(as(gt1, "SpatialPolygons"),
data.frame(v = 1:16), match.ID = FALSE)
p2 = as(gt2, "SpatialPolygons")
# plot the scene:
plot(p1, xlim = c(-2,4), ylim = c(-2,4))
plot(p2, add = TRUE, border = 'red')
library(rgeos)
i = gIntersection(p1, p2, byid = TRUE)
plot(i, add=TRUE, density = 5, col = 'blue')
# plot IDs p2:
ids.p2 = sapply(p2@polygons, function(x) slot(x, name = "ID"))
text(coordinates(p2), ids.p2)
# plot IDs i:
ids.i = sapply(i@polygons, function(x) slot(x, name = "ID"))
text(coordinates(i), ids.i, cex = .8, col = 'blue')
# compute & plot area-weighted average of p1[[1]]
ret = aggregatePoly(p1, p2)
spplot(ret["value"])
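Applied to the data from the original post, it would be something like
this (a sketch, untested here; both layers must share a CRS, and the
exact intersection will be slow for a grid as large as the PRISM one):

# coerce the grid to polygons, one per cell, keeping band1 as attribute
tMaxPolys = as(as(tMaxGrid, "SpatialPixelsDataFrame"), "SpatialPolygonsDataFrame")
statesTmax = aggregatePoly(tMaxPolys, geometry(states))
spplot(statesTmax["value"])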
1 day later
Thanks so much for this. The code does work. However, the gIntersection()
stage is very time-intensive, so I will continue to use my own code that
relies on joinPolys() in PBSmapping, which is faster. It also allows you
to store the rows/weights of the grid, so that aggregating many grids to
the same polygons is feasible.

I have compared the two for aggregating a temperature grid up to an
Alabama polygon and they match, so your code was extremely helpful for
verification - thanks! The code is in an R package on my GitHub
(https://github.com/kendonB/gistools), for anyone interested. This is the
first R package I've shared, so please make comments through the issues
system on GitHub.
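The weight-storing idea is, schematically (hypothetical names, not the
actual interface of the package):

# vals: vector of grid-cell values for one date
# w: data.frame with columns cell (grid-cell index), poly (polygon id),
#    and area (overlap area), computed once and reused for every grid
aggWithWeights <- function(vals, w) {
  num <- tapply(vals[w$cell] * w$area, w$poly, sum, na.rm = TRUE)
  den <- tapply(w$area * !is.na(vals[w$cell]), w$poly, sum)
  num / den
}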
On Wed, Apr 15, 2015 at 3:19 PM, Edzer Pebesma <edzer.pebesma@uni-muenster.de> wrote:
> [quoted message snipped]
--
Kendon Bell