Spatial overlay vs. extract and aggregation methods
P.S I found a solution which involves overlay and is faster than raster and
extract functions. I think it may be useful in situations when you try to
aggregate (many) small units over boundaries of some larger units,
conditioned on smaller units being completely nested within larger ones
(i.e there are no cases where a small unit falls in more than one larger
unit). I used this to recalculate boundaries between 2001 and 2011 UK
census, using 2001 Output Areas as spdf1 and electoral boundaries 2011 as
spdf2.
1) calculate centroids for spdf1, overlay with spdf2
nc.sids <- readShapeSpatial(system.file("shapes/sids.shp",
package="maptools")[1],
IDvar="FIPSNO", proj4string=CRS("+proj=longlat
+ellps=clrk66"))
nc.sids <- nc.sids[c("FIPSNO", "NAME" ,"BIR74")]
centroids = gCentroid(nc.sids,byid=TRUE)
overlay<-over(centroids, nc.sids)
sapply(over(nc.sids, geometry(centroids), returnList = TRUE), length)
vect<-(overlay[[1]])
for (i in 1:length(vect)){
nc.sids$overlay[i]<-vect[i]
}
aggregated<-tapply(nc.sids at data$BIR74, nc.sids at data$overlay, sum)
cor(nc.sids at data$BIR74, aggregated)
On Tue, Apr 28, 2015 at 1:47 PM, Edzer Pebesma <
edzer.pebesma at uni-muenster.de> wrote:
On 04/23/2015 07:04 PM, Juta Kawalerowicz wrote:
Hi, I have a spatial polygon data frame (spdf1) and try to aggregate information from it over polygons in a larger spatial polygon data frame(spdf2). I wanted to run some simple test to see that my code is
sound
(on nc.sids from maptools), but run into a problem which I don't
understand. As far as I can see, there are 2 ways to do this:
1) rasterise the smaller spdf1, extract and calculate over larger spdf2
library(maptools)
library(raster)
nc.sids <- readShapeSpatial(system.file("shapes/sids.shp",
package="maptools")[1],
IDvar="FIPSNO",
proj4string=CRS("+proj=longlat
+ellps=clrk66"))
#plot(nc.sids)
r<-raster(ncol=180, nrow=180)
extent(r)<-extent(nc.sids)
rp<-rasterize(nc.sids, r, 'BIR74')
plot(rp)
plot(nc.sids, add=TRUE)
#this will take time on larger files...
v <- extract(rp, nc.sids, weights=TRUE)
res<-sapply(v, function(x) if (!is.null(x)) {sum(apply(x, 1, prod),
na.rm=TRUE) / sum(x[,2])} else NA )
cor(res, nc.sids$BIR74)
#seems to be working ok.
2) use over function from sp package
overlay<-over(nc.sids, nc.sids)
#this should return the same indices of nc.sids which fall inside
nc.sids -
basically should be falling within itself for each polygon, right? But
this
is not what I get.
no, it uses gIntersects and picks the first intersecting polygon, which might well be a neighbour. (Actually, over(nc.sids, geometry(nc.sids)) should return indexes and this call the corresponding attribute entries; this is a bug, now fixed in rgeos on r-forge). https://stat.ethz.ch/pipermail/r-sig-geo/2015-April/022636.html has some pointers to the weighted aggregation problem of non-matching polygons; sp 1.1-0 has the rgeos implementation mentioned there, as option with sp::aggregate; see the last example of example(aggregate).
Any hints would be greatly appreciated.
Cheers,
Juta
[[alternative HTML version deleted]]
_______________________________________________ R-sig-Geo mailing list R-sig-Geo at r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-geo
-- Edzer Pebesma Institute for Geoinformatics (ifgi), University of M?nster, Heisenbergstra?e 2, 48149 M?nster, Germany; +49 251 83 33081 Journal of Statistical Software: http://www.jstatsoft.org/ Computers & Geosciences: http://elsevier.com/locate/cageo/ Spatial Statistics Society http://www.spatialstatistics.info
_______________________________________________ R-sig-Geo mailing list R-sig-Geo at r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-geo