
How to extract coordinates values from a shapefile?

16 messages · Henrique Dallazuanna, Nikhil Kaza, Thiago V. dos Santos +5 more

#
Look at this. Not sure if this is indeed what you want or if you  
actually have to unproject them.

http://www.mail-archive.com/r-sig-geo at stat.math.ethz.ch/msg04715.html

require(ggplot2)
fortify(br)
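(A minimal sketch of what that returns, assuming your ggplot2 version provides a fortify() method for sp objects; `br` here stands in for your own SpatialPolygonsDataFrame and is built from the sids sample shipped with maptools:)

```r
# Sketch only: 'br' stands in for the poster's SpatialPolygonsDataFrame;
# here we read the sids example shapefile that ships with maptools.
library(maptools)
library(ggplot2)
br <- readShapePoly(system.file("shapes/sids.shp", package = "maptools")[1])
d  <- fortify(br)                # one row per polygon vertex
head(d[, c("long", "lat")])      # the coordinate columns
```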


Nikhil Kaza
Asst. Professor,
City and Regional Planning
University of North Carolina

nikhil.list at gmail.com
On Jun 9, 2010, at 2:45 PM, Thiago Veloso wrote:

#
The example provided by Matt assumes that each polygon consists of a
single ring, and doesn't have islands, lakes etc. The function below
pastes all coordinates into a single two-column matrix. For clarity's sake,
I avoided nested sapply calls.

library(maptools)
xx <- readShapePoly(system.file("shapes/sids.shp", package="maptools")[1],
  IDvar="FIPSNO", proj4string=CRS("+proj=longlat +ellps=clrk66"))
allcoordinates = function(x) {
    ret = NULL
    polys = x@polygons
    for(i in 1:length(polys)) {
        pp = polys[[i]]@Polygons
        for (j in 1:length(pp))
            ret = rbind(ret, coordinates(pp[[j]]))
    }
    ret
}
q = allcoordinates(xx)

# check:
plot(xx)
lines(q, col='blue')
On 06/09/2010 09:58 PM, Matt Beard wrote:

#
On Wed, Jun 9, 2010 at 3:24 PM, Edzer Pebesma
<edzer.pebesma at uni-muenster.de> wrote:
And that's basically what fortify does, except it covers a few more cases.

Hadley
#
On Wednesday 09 June 2010, Hadley Wickham wrote:
Not to be nit-picky, but wouldn't it be safer to pre-allocate "ret" instead
of dynamically appending? This would allow the suggested function to scale to
very large geometries.
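(A preallocated variant might look roughly like this -- a sketch only, assuming sp's @polygons / @Polygons slot layout; a first pass counts the rows, a second pass fills them in:)

```r
# Sketch of the preallocation idea: count all vertices first,
# then fill a preallocated matrix instead of growing 'ret'.
allcoordinates_prealloc = function(x) {
    polys = x@polygons
    # first pass: total number of vertices across all rings
    n = sum(sapply(polys, function(p)
        sum(sapply(p@Polygons, function(pp) nrow(pp@coords)))))
    ret = matrix(NA_real_, nrow = n, ncol = 2)
    k = 1
    # second pass: copy each ring's coordinates into place
    for (p in polys) {
        for (pp in p@Polygons) {
            cc = coordinates(pp)
            ret[k:(k + nrow(cc) - 1), ] = cc
            k = k + nrow(cc)
        }
    }
    ret
}
```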


Cheers,
Dylan
#
On 06/09/2010 11:01 PM, Dylan Beaudette wrote:
A combination of lapply and do.call("rbind", ...) also works well in terms of
performance.

cheers,
Paul

#
On 06/09/2010 11:01 PM, Dylan Beaudette wrote:
In addition to my previous suggestion, I tried using lapply + do.call 
instead of the for loop based implementation of edzer:

library(maptools)
xx <- readShapePoly(system.file("shapes/sids.shp", package="maptools")[1],
  IDvar="FIPSNO", proj4string=CRS("+proj=longlat +ellps=clrk66"))

allcoordinates = function(x) {
     ret = NULL
     polys = x@polygons
     for(i in 1:length(polys)) {
         pp = polys[[i]]@Polygons
         for (j in 1:length(pp))
             ret = rbind(ret, coordinates(pp[[j]]))
     }
     ret
}

allcoordinates_lapply = function(x) {
     polys = x at polygons
     return(do.call("rbind", lapply(polys, function(pp) {
         do.call("rbind", lapply(pp@Polygons, coordinates))
     })))
}

q = allcoordinates(xx)
z = allcoordinates_lapply(xx)
all.equal(q,z)

So far it produces the same output. And now comes the nice part, doing a 
test with a large polygon set and timing the performance of both functions:

 > system.time(x <- allcoordinates(nuts))
    user  system elapsed
  22.781   8.572  31.422
 > system.time(y <- allcoordinates_lapply(nuts))
    user  system elapsed
   0.248   0.004   0.250
 > all.equal(x,y)
[1] TRUE

So apart from the (in my opinion) nicer-looking syntax, it is also more than
two orders of magnitude faster here. The gap will only widen as the polygon
set becomes larger. So, lapply + do.call is your friend :).

cheers,
Paul

 > sessionInfo()
R version 2.11.0 (2010-04-22)
i486-pc-linux-gnu

locale:
  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] maptools_0.7-34 lattice_0.18-5  sp_0.9-62       foreign_0.8-40

loaded via a namespace (and not attached):
[1] grid_2.11.0

1 day later
#
On 06/11/2010 11:43 AM, Paul Hiemstra wrote:
More information on why this is so much faster can be found in the R 
Inferno Chapter 2, this gives a good description of what goes wrong in 
the background when growing an R object.

http://www.burns-stat.com/pages/Tutor/R_inferno.pdf

The following basic R code also provides some info:

# Just letting the object grow
# without preallocation
meth1 = function(n) {
   vec <- numeric(0);
   for(i in 1:n) vec <- c(vec, i)
}

# preallocation
meth2 = function(n) {
   vec <- numeric(n)
   for(i in 1:n) vec[i] <- i
}

# Direct creation
# Not relevant for the
# above case
meth3 = function(n) {
   vec <- 1:n
}

# The suggestion I did
# with do.call and lapply
meth4 = function(n) {
   do.call("c", lapply(1:n, function(x) x))
}

n = 100000
# Speed test
system.time(meth1(n)) # Slooooow
system.time(meth2(n))
system.time(meth3(n))
system.time(meth4(n))

cheers,
Paul

1 day later
#
long = q[,1]  # first column: longitude
lat  = q[,2]  # second column: latitude
On 06/14/2010 05:05 AM, Thiago Veloso wrote:
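(Putting the pieces of the thread together, the whole task condenses to a short script -- a sketch using the sids sample data; substitute your own shapefile:)

```r
# End-to-end sketch based on this thread: read a shapefile, collect all
# vertices via lapply + do.call("rbind", ...), then split the columns.
library(maptools)
xx <- readShapePoly(system.file("shapes/sids.shp", package = "maptools")[1],
                    IDvar = "FIPSNO",
                    proj4string = CRS("+proj=longlat +ellps=clrk66"))
allcoordinates_lapply = function(x)
    do.call("rbind", lapply(x@polygons, function(p)
        do.call("rbind", lapply(p@Polygons, coordinates))))
q    <- allcoordinates_lapply(xx)
long <- q[, 1]
lat  <- q[, 2]
```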