How to extract coordinates values from a shapefile?
On 06/09/2010 11:01 PM, Dylan Beaudette wrote:
On Wednesday 09 June 2010, Hadley Wickham wrote:
On Wed, Jun 9, 2010 at 3:24 PM, Edzer Pebesma
<edzer.pebesma at uni-muenster.de> wrote:
The example provided by Matt assumes that each polygon consists of a
single ring, and doesn't have islands, lakes etc. The function below
pasts all coordinates to a single 2-column matrix. For clarity's sake, I
avoided nested sapply's.
library(maptools)
xx<- readShapePoly(system.file("shapes/sids.shp",
package="maptools")[1], IDvar="FIPSNO", proj4string=CRS("+proj=longlat
+ellps=clrk66")) allcoordinates = function(x) {
ret = NULL
polys = x at polygons
for(i in 1:length(polys)) {
pp = polys[[i]]@Polygons
for (j in 1:length(pp))
ret = rbind(ret, coordinates(pp[[j]]))
}
ret
}
q = allcoordinates(xx)
# check:
plot(xx)
lines(q, col='blue')
And that's basically what fortify does, except it covers a few more cases.
Hadley
Not to be nit-picky, but wouldn't it be safer to pre-allocated "ret" instead of dynamically appending? This would allow the suggested function to scale to very large geometries.
In addition to my previous suggestion, I tried using lapply + do.call
instead of the for loop based implementation of edzer:
library(maptools)
xx<- readShapePoly(system.file("shapes/sids.shp",
package="maptools")[1], IDvar="FIPSNO", proj4string=CRS("+proj=longlat
+ellps=clrk66"))
allcoordinates = function(x) {
ret = NULL
polys = x at polygons
for(i in 1:length(polys)) {
pp = polys[[i]]@Polygons
for (j in 1:length(pp))
ret = rbind(ret, coordinates(pp[[j]]))
}
ret
}
allcoordinates_lapply = function(x) {
polys = x at polygons
return(do.call("rbind", lapply(polys, function(pp) {
do.call("rbind", lapply(pp at Polygons, coordinates))
})))
}
q = allcoordinates(xx)
z = allcoordinates_lapply(xx)
all.equal(q,z)
So far it produces the same output. And now comes the nice part, doing a
test with a large polygon set and timing the performance of both functions:
> system.time(x <- allcoordinates(nuts))
user system elapsed
22.781 8.572 31.422
> system.time(y <- allcoordinates_lapply(nuts))
user system elapsed
0.248 0.004 0.250
> all.equal(x,y)
[1] TRUE
So apart from the imo nicer looking function in terms of syntax, it is
also several orders of magnitude faster. This effect will only become
stronger if the polygon set becomes larger. So, lapply + do.call is your
friend :).
cheers,
Paul
> sessionInfo()
R version 2.11.0 (2010-04-22)
i486-pc-linux-gnu
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] maptools_0.7-34 lattice_0.18-5 sp_0.9-62 foreign_0.8-40
loaded via a namespace (and not attached):
[1] grid_2.11.0
Cheers, Dylan
Drs. Paul Hiemstra Department of Physical Geography Faculty of Geosciences University of Utrecht Heidelberglaan 2 P.O. Box 80.115 3508 TC Utrecht Phone: +3130 274 3113 Mon-Tue Phone: +3130 253 5773 Wed-Fri http://intamap.geo.uu.nl/~paul http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770