
over() in sp package

6 messages · Robert J. Hijmans, Kenny Bell, Edzer Pebesma

#
Hi,

When I call:

over(spPolyDf, spGridDf)

The function seems to be returning the spGridDf$band1 value for the first
grid cell that is contained in each polygon.

What I would like is an overlap-area weighted average (or some provided
function like sum) of all the grid cells contained (or partially contained)
in each polygon element of spPolyDf.
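Concretely, the quantity per polygon is sum(value_i * overlap_area_i) / sum(overlap_area_i). A minimal base-R sketch with made-up numbers (the values and areas below are illustrative only):

```r
# Toy numbers: three grid cells overlapping one polygon
vals  <- c(10, 12, 15)       # grid cell values (e.g. tmax)
areas <- c(0.5, 1.0, 0.25)   # overlap area of each cell with the polygon
wmean <- sum(vals * areas) / sum(areas)
wmean  # equivalently: weighted.mean(vals, areas)
```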

I have code that does this by eventually using joinPolys() from PBSmapping,
but it's slow, and an sp-only solution would be much cleaner. Is there any
way to do this? Some example code is below:

library(sp)
library(rgdal)

# Download a USA shapefile
tmpFile <- tempfile()
download.file("http://www2.census.gov/geo/tiger/GENZ2013/cb_2013_us_state_20m.zip",
              destfile = tmpFile)
unzipped <- unzip(zipfile = tmpFile, exdir = getwd())

# This is the actual reading stage
states <- readOGR(getwd(), layer="cb_2013_us_state_20m")

# Remove all the files on disk
file.remove(tmpFile)
file.remove(unzipped)

tmpFile <- tempfile()

url <- paste0("http://www.prism.oregonstate.edu/fetchData.php?type",
              "=bil&kind=recent&elem=tmax&range=daily&temporal=",
              "20140416")

# Download the file to fileName
download.file(url = url, mode = "wb", destfile = tmpFile)

# unzip the file to the working directory.
unzippedFiles <- unzip(zipfile = tmpFile, overwrite = TRUE)

# read into R
tMaxGrid <- readGDAL(fname = unzippedFiles[1])

# delete the file
file.remove(tmpFile)
file.remove(unzippedFiles)

# This doesn't work correctly
states$tmax20140416 <- over(states, tMaxGrid)
#
Kenny,

Here is an alternative route

library(raster)
states <- shapefile("cb_2013_us_state_20m.shp")
tMaxGrid <- raster(unzippedFiles[1])
x <- extract(tMaxGrid, states)

This returns a list with one element per polygon in states. Each element
is a vector of the grid cell values for that polygon.

For a mean do

xs <- extract(tMaxGrid, states, fun=mean)

See ?extract for weights
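With weights = TRUE (and no fun), extract returns one matrix per polygon with
value and weight columns, which you can fold into weighted means yourself. A
sketch of just that folding step, with made-up matrices standing in for the
extract output:

```r
# Stand-ins for the output of extract(tMaxGrid, states, weights = TRUE):
# one matrix per polygon, columns "value" and "weight"
x <- list(cbind(value = c(10, 12),    weight = c(0.5, 1.0)),
          cbind(value = c(15, 11, 9), weight = c(0.25, 1.0, 1.0)))
# Area-weighted mean per polygon
wmeans <- sapply(x, function(m) weighted.mean(m[, "value"], m[, "weight"]))
```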

Robert
On Mon, Apr 13, 2015 at 4:27 PM, Kenny Bell <kmb56 at berkeley.edu> wrote:
#
On 04/14/2015 01:27 AM, Kenny Bell wrote:
This is documented; you can use returnList = TRUE to get the list of
values for each polygon. Since you want to aggregate these anyway, you
might as well use

aggregate(spGridDf, spPolyDf, mean)

this gives a simple average of pixels whose centers fall inside the
polygons. Do you need a weighted average, weighted by how much of the
pixel overlaps with each polygon?
this would assign a data.frame to what you want to be a vector (in a
data.frame), so it would have to be:

states$tmax20140416 <- over(states, tMaxGrid)[[1]]

I'll try to modify sp so that it'd warn you in this case.
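The gotcha can be seen in plain base R: assigning a one-column data.frame
with $<- makes the column itself a data.frame, while [[1]] extracts the
plain vector first (toy data frame, illustrative only):

```r
df <- data.frame(id = 1:3)
df$x <- data.frame(v = c(10, 20, 30))       # the column is itself a data.frame
df$y <- data.frame(v = c(10, 20, 30))[[1]]  # [[1]] extracts the plain vector
is.data.frame(df$x)  # TRUE
is.data.frame(df$y)  # FALSE
```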
#
Thanks so much for all this. Yes, I'm after the area weighted averages. The
eventual application is to aggregate a 2.5 arcminute temperature grid up to
a zip-5 polygon, many of which overlap only 2-3 grid cells.

On Mon, Apr 13, 2015 at 10:47 PM, Edzer Pebesma <
edzer.pebesma at uni-muenster.de> wrote:

1 day later
#
On 04/14/2015 06:58 PM, Kenny Bell wrote:
One approach would be to use spsample to resample the temperature to a
finer grid, then use aggregate.

Another approach would be to convert the grid to SpatialPolygons, and
then use the function below, which should do exact area weighting. Let
me know if it works, or if you have other comments / questions.


aggregatePoly = function(x, by) {
	require(rgeos)
	i = gIntersection(x, by, byid = TRUE)
	ids.i = sapply(i@polygons, function(x) slot(x, name = "ID"))
	grp_by = sapply(strsplit(ids.i, " "), function(x) x[2])

	obs = over(SpatialPoints(coordinates(i)), x)
	obs$area = sapply(i@polygons, function(x)
		slot(x, name = "area"))
	obs$x_area = obs[[1]] * obs$area
	agg = aggregate(obs[c("x_area", "area")], list(grp_by), sum)
	agg$value = with(agg, x_area / area)
	agg = agg[match(row.names(by), agg$Group.1),]
	row.names(agg) = row.names(by)
	SpatialPolygonsDataFrame(by, agg)
}

# Example with two partly overlapping grids:
library(sp)

gt1 = SpatialGrid(GridTopology(c(0,0), c(1,1), c(4,4)))
gt2 = SpatialGrid(GridTopology(c(-1.25,-1.25), c(1,1), c(4,4)))

# convert both to polygons; give p1 attributes to aggregate
p1 = SpatialPolygonsDataFrame(as(gt1, "SpatialPolygons"),
		data.frame(v = 1:16), match.ID = FALSE)
p2 = as(gt2, "SpatialPolygons")

# plot the scene:
plot(p1, xlim = c(-2,4), ylim = c(-2,4))
plot(p2, add = TRUE, border = 'red')
library(rgeos)
i = gIntersection(p1, p2, byid = TRUE)
plot(i, add=TRUE, density = 5, col = 'blue')
# plot IDs p2:
ids.p2 = sapply(p2@polygons, function(x) slot(x, name = "ID"))
text(coordinates(p2), ids.p2)
# plot IDs i:
ids.i = sapply(i@polygons, function(x) slot(x, name = "ID"))
text(coordinates(i), ids.i, cex = .8, col = 'blue')

# compute & plot area-weighted average of p1[[1]]
ret = aggregatePoly(p1, p2)
spplot(ret["value"])

1 day later
#
Thanks so much for this.

The code does work. However, the gIntersection() stage is very
time-intensive, so I will continue to use my own code, which relies on
joinPolys() from PBSmapping and is faster.

It also lets me store the grid rows/weights once, so that aggregating
many grids to the same polygons is feasible.
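The reuse idea can be sketched in base R: compute the polygon-by-cell
overlap weight matrix once (e.g. from the gIntersection or joinPolys step),
after which aggregating any grid is a single matrix product. The matrix and
values below are made-up stand-ins:

```r
# W[i, j] = overlap area of polygon i with grid cell j, computed once
W <- rbind(poly1 = c(0.5, 1.0, 0.0),
           poly2 = c(0.0, 0.25, 1.0))
v1 <- c(10, 12, 15)                          # one grid's cell values
wmean <- as.vector((W %*% v1) / rowSums(W))  # area-weighted mean per polygon
# A new grid reuses W: just swap in the new value vector
```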

I have compared the two for aggregating a temperature grid up to an Alabama
polygon and they match, so your code was extremely helpful for verification
- Thanks!

The code is in an R package on my Github
<https://github.com/kendonB/gistools>, for anyone interested. This is the
first R package I've shared, so please make comments through the issues
system on Github.

On Wed, Apr 15, 2015 at 3:19 PM, Edzer Pebesma <
edzer.pebesma at uni-muenster.de> wrote: