An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-sig-geo/attachments/20090521/2946d74f/attachment.pl>
Finding the county shapefile polygon closest to a long-lat position
5 messages · BRWIN338 at aol.com, Pilar Tugores Ferra, Paul Hiemstra +2 more
An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-sig-geo/attachments/20090521/ff3c7763/attachment.pl>
Hi, If you have your data loaded into R as spatial objects, you can find in which polygons the points fall using the overlay() function from the sp-pacakge. See the documentation there. I'm not sure what to do if the point falls outside a polygon and you need the nearest. cheers, Paul BRWIN338 at aol.com schreef:
Greetings I have a large number of long-lat locations dispersed over the US and need to identify which US county that each point is located in (or nearest to). After reading the past posts and Roger's book, I have been able to use the overlay function to identify the appropriate counties for the set of locations with long-lats lying within or on a polygon boundary. However, due to longlat precision errors (I am assuming), some of the points lie outside all of my shapefile's county polygon boundaries. Is there an R function similar to "overlay" that I could use to find which county polygon is closest to each of my longlat points that do not lie within the shapefile's polygons? I have spent quite a bit of time searching and browsing past list discussions and can seem to find my answer. My apologies if I have missed an obvious answer. Joe **************Huge savings on HDTVs from Dell.com! (http://pr.atwola.com/promoclk/100126575x1221836042x1201399880/aol?redir=http:%2F%2Fad.doubleclick.ne t%2Fclk%3B215073686%3B37034322%3Bb) [[alternative HTML version deleted]]
_______________________________________________ R-sig-Geo mailing list R-sig-Geo at stat.math.ethz.ch https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Drs. Paul Hiemstra Department of Physical Geography Faculty of Geosciences University of Utrecht Heidelberglaan 2 P.O. Box 80.115 3508 TC Utrecht Phone: +3130 274 3113 Mon-Tue Phone: +3130 253 5773 Wed-Fri http://intamap.geo.uu.nl/~paul
On Thu, 21 May 2009, Pilar Tugores Ferra wrote:
Dear Joe, Maybe you could use a similar process that Marcelino suggested to me yesterday in order to compute the shortest distance from points to a polyline. You need to 1)convert your point data to ppp object in spatstat, 2)coerce your polygon data to a psp object in spatstat. 3)nncross between the two. You'll get "dist" - the nearest neighbour distance between the two patterns and "which" the nearest neighbour index of the second pattern. Maybe it won't be so straightforward to know which ids of the coerced polyline correspond to each polygon but I am not sure about this point.
Thanks, Pilar! The only possible weakness in this might be that the input points and polygon boundaries are in geographical, not projected (planar) coordinates, so if there are differences in distances on the plane and great circle distances, they might lead to the "outside" points being assigned to the wrong county. Depending on how many there are, you might consider two alternatives, one visual inspection in Google Earth or similar (export the "outside" points with IDs as one KML, and the county boundaries you are using with writeOGR() in rgdal (or use those built into GE)); the other would be to use the closeness to the label points of the counties of the "outside" points - coordinates() of the SpatialPolygons* object retrieves these, so looping over the "outside" points in spDistsN1() in sp with longlat=TRUE would give the great circle distances. Of course, the underlying issue is why they are outside - is this a datum shift problem (counties in NAD27 and points in NAD83/WGS84)? If you fixed that, they would match better. Hope this helps, Roger
Best regards, Pilar -----Mensaje original----- De: r-sig-geo-bounces at stat.math.ethz.ch en nombre de BRWIN338 at aol.com Enviado el: jue 21/05/2009 21:14 Para: r-sig-geo at stat.math.ethz.ch Asunto: [R-sig-Geo] Finding the county shapefile polygon closest to along-lat position Greetings I have a large number of long-lat locations dispersed over the US and need to identify which US county that each point is located in (or nearest to). After reading the past posts and Roger's book, I have been able to use the overlay function to identify the appropriate counties for the set of locations with long-lats lying within or on a polygon boundary. However, due to longlat precision errors (I am assuming), some of the points lie outside all of my shapefile's county polygon boundaries. Is there an R function similar to "overlay" that I could use to find which county polygon is closest to each of my longlat points that do not lie within the shapefile's polygons? I have spent quite a bit of time searching and browsing past list discussions and can seem to find my answer. My apologies if I have missed an obvious answer. Joe **************Huge savings on HDTVs from Dell.com! (http://pr.atwola.com/promoclk/100126575x1221836042x1201399880/aol?redir=http:%2F%2Fad.doubleclick.ne t%2Fclk%3B215073686%3B37034322%3Bb) [[alternative HTML version deleted]]
_______________________________________________ R-sig-Geo mailing list R-sig-Geo at stat.math.ethz.ch https://stat.ethz.ch/mailman/listinfo/r-sig-geo La informaci?n contenida en este e-mail y sus ficheros adjuntos es totalmente confidencial y no deber?a ser usado si no fuera usted alguno de los destinatarios. Si ha recibido este e-mail por error, por favor avise al remitente y b?rrelo de su buz?n o de cualquier otro medio de almacenamiento. This email is confidential and should not be used by anyone who is not the original intended recipient. If you have received this e-mail in error please inform the sender and delete it from your mailbox or any other storage mechanism. [[alternative HTML version deleted]]
Roger Bivand Economic Geography Section, Department of Economics, Norwegian School of Economics and Business Administration, Helleveien 30, N-5045 Bergen, Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43 e-mail: Roger.Bivand at nhh.no
Hi Joe
Like Roger says you should first look at the possible datum problem. The
example below shows one method to find the closest polygon, and to add that
to the overlay. Notice however that if you change the projection and/or
ellipsoid the closest points (end of lines) are different. Closest line in
this case is calculated by:
abs((x2-x1)*(y1-y0)-(x1-x0)*(y2-y1))/sqrt(((x2-x1)^2+(y2-y1)^2))^2)
where x0, and y0 is the point, while x1, y1, and x2, y2 are the line ends.
To find the two closest points, spDistsN1 is used.
Try to play with un-commenting polys <- spTransform(polys, CRS("+proj=moll
+ellps=WGS84"))
and changing ll.logical=FALSE/TRUE (FALSE = Euclidean, TRUE = Great Circle
distance)
One of the things you will see is that the closest points are different when
you calculate the distance in km (ll.logical=TRUE) combined with
proj4string=CRS("+proj=moll +ellps=WGS84")
I hope this gives some insight in the problem.
The code can also be downloaded from http://open.uib.no/R/mindist.R
I do not guarantee the code is bug free.
You could try to replace polys with your polygon, and ssample with your
points, and see if it makes sense.
Best wishes
Torleif
#################
library(rgdal)
#Make data
set.seed(7)
grd <- GridTopology(c(1,1), c(1,1), c(2,2))
polys <- as.SpatialPolygons.GridTopology(grd)
proj4string(polys) <- CRS(" +proj=longlat +ellps=WGS84" )
## Try uncommenting this line, and see what are the closest points in the last
figure.
## In this case the same closest polygons are found, but not the same two
closest points
#polys <- spTransform(polys, CRS("+proj=moll +ellps=sphere"))
ssample <- spsample(polys,n=10,type="random")
#displace points
slot(ssample, 'coords') <- slot(ssample, 'coords')-mean(
slot(ssample, 'coords'))*.25
#overlay
overl <- overlay(polys, ssample)
#Show overlay
overl
#Which points are outside?
wh <- which(is.na(overl))
#Make plots
plot(polys, xlim=(c(bbox(polys)[1,])-c(1, -1)),
ylim=(c(bbox(polys)[2,])-c(1, -1)))
for (i in 1:4) points(slot(slot((slot(polys, 'polygons'))[[i]], 'Polygons')
[[1]], 'coords')[,1], slot(slot((slot(polys, 'polygons'))[[i]], 'Polygons')
[[1]], 'coords')[,2], col = i)
points(ssample)
for (i in 1:length(wh)) {
points(slot(ssample, 'coords')[wh[i],1], slot(ssample, 'coords')[wh[i],2],
col=(i+1))
}
# Number of polygons to sapply over (or number of polygons).
# With more polygons you could also try to only scan the lines around
# the "4" polygons which center is closest to the outlying point(?)
nit <- dim(coordinates(polys))[1]
#Find the closest polygon, and paste the data in overl
overl.na <- sapply(1:length(wh), function (x) {
x0 <- slot(ssample, 'coords')[wh[x],1]
y0 <- slot(ssample, 'coords')[wh[x],2]
mn <- sapply(1:nit, function(y) {
cc <- slot(slot((slot(polys, 'polygons'))[[y]], 'Polygons')
[[1]], 'coords')
x1 <- cc[1:(dim(cc)[1]-1),1]
x2 <- cc[2:dim(cc)[1],1]
y1 <- cc[1:(dim(cc)[1]-1),2]
y2 <- cc[2:dim(cc)[1],2]
#calculate distance
mn <-
mean(abs((x2-x1)*(y1-y0)-(x1-x0)*(y2-y1))/sqrt(((x2-x1)^2+(y2-y1)^2)))
return(mn)
})
wm <- which.min(mn)
return(list(wm, x0, y0))
})
##Can leave out. This section is just for plotting lines
#lines. Shortest distance
lin <- sapply(2:(length(wh)+1), function (x) {
p1 <- matrix(unlist(overl.na[-1,(x-1)]),,2)
pol.pos <- unlist(overl.na[1,])
pol1 <- slot(slot((slot(polys, 'polygons'))[[pol.pos[x-1]]], 'Polygons')
[[1]], 'coords')[-1,]
dst <- spDistsN1(pol1, p1, longlat=FALSE)
wh.dst <- which.min(dst)
ln <- Line(matrix( c(p1[,1], pol1[wh.dst, 1], p1[,2], pol1[wh.dst, 2]),
2 ))
return(ln)
})
#lines, second shortest distance
lin2 <- sapply(2:(length(wh)+1), function (x) {
p1 <- matrix(unlist(overl.na[-1,(x-1)]),,2)
pol.pos <- unlist(overl.na[1,])
pol1 <- slot(slot((slot(polys, 'polygons'))[[pol.pos[x-1]]], 'Polygons')
[[1]], 'coords')[-1,]
dst <- spDistsN1(pol1, p1, longlat=FALSE)
wh.dst <- which.min(dst)
dst[wh.dst] <- NA
wh.dst2 <- which.min(dst)
ln2 <- Line(matrix( c(p1[,1], pol1[wh.dst2, 1], p1[,2], pol1[wh.dst2, 2]),
2 ))
return(ln2)
})
#Convert to spatial lines
sl1 <- Lines(lin)
spl1 <- SpatialLines(list(sl1), proj4string = CRS(proj4string(polys)))
sl2 <- Lines(lin2)
spl2 <- SpatialLines(list(sl2), proj4string = CRS(proj4string(polys)))
##
#Add the new values to the overlay
overl[wh] <- unlist(overl.na[1,])
#Show overlay
overl
#Add polygon
new.points <- SpatialPointsDataFrame(ssample, data.frame(polyid=overl))
polys <- SpatialPolygonsDataFrame(polys,
data.frame(ID=c("p1", "p2", "p3", "p4")))
mypalette <- "red"
require(lattice)
trellis.par.set(sp.theme(regions = list(col = mypalette)))
spplot(new.points,
sp.layout = list(list("sp.polygons", polys, fill =
c("grey10", "grey40", "grey60", "grey90")),
list("sp.lines", spl1),
list("sp.lines", spl2)),
xlim=(c(bbox(polys)[1,])-c(mean( slot(ssample, 'coords'))*.5, -mean(
slot(ssample, 'coords'))*.5)),
ylim=(c(bbox(polys)[2,])-c(mean( slot(ssample, 'coords'))*.5, -mean(
slot(ssample, 'coords'))*.5)))
##########
On Friday 22 May 2009 10:44:38 am Roger Bivand wrote:
On Thu, 21 May 2009, Pilar Tugores Ferra wrote:
Dear Joe, Maybe you could use a similar process that Marcelino suggested to me yesterday in order to compute the shortest distance from points to a polyline. You need to 1)convert your point data to ppp object in spatstat, 2)coerce your polygon data to a psp object in spatstat. 3)nncross between the two. You'll get "dist" - the nearest neighbour distance between the two patterns and "which" the nearest neighbour index of the second pattern. Maybe it won't be so straightforward to know which ids of the coerced polyline correspond to each polygon but I am not sure about this point.
Thanks, Pilar! The only possible weakness in this might be that the input points and polygon boundaries are in geographical, not projected (planar) coordinates, so if there are differences in distances on the plane and great circle distances, they might lead to the "outside" points being assigned to the wrong county. Depending on how many there are, you might consider two alternatives, one visual inspection in Google Earth or similar (export the "outside" points with IDs as one KML, and the county boundaries you are using with writeOGR() in rgdal (or use those built into GE)); the other would be to use the closeness to the label points of the counties of the "outside" points - coordinates() of the SpatialPolygons* object retrieves these, so looping over the "outside" points in spDistsN1() in sp with longlat=TRUE would give the great circle distances. Of course, the underlying issue is why they are outside - is this a datum shift problem (counties in NAD27 and points in NAD83/WGS84)? If you fixed that, they would match better. Hope this helps, Roger
Best regards, Pilar -----Mensaje original----- De: r-sig-geo-bounces at stat.math.ethz.ch en nombre de BRWIN338 at aol.com Enviado el: jue 21/05/2009 21:14 Para: r-sig-geo at stat.math.ethz.ch Asunto: [R-sig-Geo] Finding the county shapefile polygon closest to along-lat position Greetings I have a large number of long-lat locations dispersed over the US and need to identify which US county that each point is located in (or nearest to). After reading the past posts and Roger's book, I have been able to use the overlay function to identify the appropriate counties for the set of locations with long-lats lying within or on a polygon boundary. However, due to longlat precision errors (I am assuming), some of the points lie outside all of my shapefile's county polygon boundaries. Is there an R function similar to "overlay" that I could use to find which county polygon is closest to each of my longlat points that do not lie within the shapefile's polygons? I have spent quite a bit of time searching and browsing past list discussions and can seem to find my answer. My apologies if I have missed an obvious answer. Joe **************Huge savings on HDTVs from Dell.com! (http://pr.atwola.com/promoclk/100126575x1221836042x1201399880/aol?redir= http:%2F%2Fad.doubleclick.ne t%2Fclk%3B215073686%3B37034322%3Bb) [[alternative HTML version deleted]]
_______________________________________________ R-sig-Geo mailing list R-sig-Geo at stat.math.ethz.ch https://stat.ethz.ch/mailman/listinfo/r-sig-geo La informaci?n contenida en este e-mail y sus ficheros adjuntos es totalmente confidencial y no deber?a ser usado si no fuera usted alguno de los destinatarios. Si ha recibido este e-mail por error, por favor avise al remitente y b?rrelo de su buz?n o de cualquier otro medio de almacenamiento. This email is confidential and should not be used by anyone who is not the original intended recipient. If you have received this e-mail in error please inform the sender and delete it from your mailbox or any other storage mechanism. [[alternative HTML version deleted]]