Finding the county shapefile polygon closest to a long-lat position

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-sig-geo/attachments/20090521/2946d74f/attachment.pl>
An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-sig-geo/attachments/20090521/ff3c7763/attachment.pl>
Hi,

If you have your data loaded into R as spatial objects, you can find in 
which polygons the points fall using the overlay() function from the 
sp-pacakge. See the documentation there. I'm not sure what to do if the 
point falls outside a polygon and you need the nearest.

cheers,
Paul

BRWIN338 at aol.com schreef:
Greetings
I have a large number of long-lat locations dispersed over the  US and need 

to identify which US county that each point is located in  (or nearest to). 

After reading the past posts  and Roger's book, I have been able to use the 
overlay function  
to  identify the appropriate counties for the set of  locations 
with  long-lats lying within or on a polygon boundary.   However, due to  
longlat precision errors (I am assuming), some of the  points lie  outside 
all of 
my shapefile's county polygon  boundaries.   

Is there an R function similar to "overlay" that I could use to find  which 

county polygon is closest to each of my longlat points that do  not lie  
within the shapefile's polygons?  I have spent quite a  bit of time  
searching 
and browsing past list discussions and can seem  to find my  answer.

My apologies if I have missed an obvious  answer.

Joe          

**************Huge savings on HDTVs from Dell.com! 
(http://pr.atwola.com/promoclk/100126575x1221836042x1201399880/aol?redir=http:%2F%2Fad.doubleclick.ne
t%2Fclk%3B215073686%3B37034322%3Bb)

	[[alternative HTML version deleted]]

_______________________________________________
R-sig-Geo mailing list
R-sig-Geo at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Drs. Paul Hiemstra
Department of Physical Geography
Faculty of Geosciences
University of Utrecht
Heidelberglaan 2
P.O. Box 80.115
3508 TC Utrecht
Phone:  +3130 274 3113 Mon-Tue
Phone:  +3130 253 5773 Wed-Fri
http://intamap.geo.uu.nl/~paul

Dear Joe,
Maybe you could use a similar process that Marcelino suggested to me 
yesterday in order to compute the shortest distance from points to a 
polyline. You need to 1)convert your point data to ppp object in 
spatstat, 2)coerce your polygon data to a psp object in spatstat. 
3)nncross between the two.
You'll get "dist" - the nearest neighbour distance between the two 
patterns and "which" the nearest neighbour index of the second pattern. 
Maybe it won't be so straightforward to know which ids of the coerced 
polyline correspond to each polygon but I am not sure about this point.
Thanks, Pilar!

The only possible weakness in this might be that the input points and 
polygon boundaries are in geographical, not projected (planar) 
coordinates, so if there are differences in distances on the plane and 
great circle distances, they might lead to the "outside" points being 
assigned to the wrong county.

Depending on how many there are, you might consider two alternatives, one 
visual inspection in Google Earth or similar (export the "outside" points 
with IDs as one KML, and the county boundaries you are using with 
writeOGR() in rgdal (or use those built into GE)); the other would be to 
use the closeness to the label points of the counties of the "outside" 
points - coordinates() of the SpatialPolygons* object retrieves these, so 
looping over the "outside" points in spDistsN1() in sp with longlat=TRUE 
would give the great circle distances.

Of course, the underlying issue is why they are outside - is this a datum 
shift problem (counties in NAD27 and points in NAD83/WGS84)? If you fixed 
that, they would match better.

Hope this helps,

Roger
Best regards,
Pilar

-----Mensaje original-----
De: r-sig-geo-bounces at stat.math.ethz.ch en nombre de BRWIN338 at aol.com
Enviado el: jue 21/05/2009 21:14
Para: r-sig-geo at stat.math.ethz.ch
Asunto: [R-sig-Geo] Finding the county shapefile polygon closest to along-lat position

Greetings
I have a large number of long-lat locations dispersed over the  US and need

to identify which US county that each point is located in  (or nearest to).

After reading the past posts  and Roger's book, I have been able to use the
overlay function
to  identify the appropriate counties for the set of  locations
with  long-lats lying within or on a polygon boundary.   However, due to
longlat precision errors (I am assuming), some of the  points lie  outside
all of
my shapefile's county polygon  boundaries.

Is there an R function similar to "overlay" that I could use to find  which

county polygon is closest to each of my longlat points that do  not lie
within the shapefile's polygons?  I have spent quite a  bit of time
searching
and browsing past list discussions and can seem  to find my  answer.

My apologies if I have missed an obvious  answer.

Joe

**************Huge savings on HDTVs from Dell.com!
(http://pr.atwola.com/promoclk/100126575x1221836042x1201399880/aol?redir=http:%2F%2Fad.doubleclick.ne
t%2Fclk%3B215073686%3B37034322%3Bb)

	[[alternative HTML version deleted]]

_______________________________________________
R-sig-Geo mailing list
R-sig-Geo at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

La informaci?n contenida en este e-mail y sus ficheros adjuntos es totalmente confidencial y no deber?a ser usado si no fuera usted alguno de los destinatarios. Si ha recibido este e-mail por error, por favor avise al remitente y b?rrelo de su buz?n o de cualquier otro medio de almacenamiento.   This email is confidential and should not be used by anyone who is not the original intended  recipient. If you have received this e-mail in  error please inform the sender and delete it from  your mailbox or any other storage mechanism.
	[[alternative HTML version deleted]]

Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no
Hi Joe

Like Roger says you should first look at the possible datum problem. The 
example below shows one method to find the closest polygon, and to add that 
to the overlay. Notice however that if you change the projection and/or 
ellipsoid the closest points (end of lines) are different. Closest line in 
this case is calculated by:
abs((x2-x1)*(y1-y0)-(x1-x0)*(y2-y1))/sqrt(((x2-x1)^2+(y2-y1)^2))^2)

where x0, and y0 is the point, while x1, y1, and x2, y2 are the line ends.

To find the two closest points, spDistsN1 is used. 

Try to play with un-commenting polys <- spTransform(polys, CRS("+proj=moll 
+ellps=WGS84"))
and changing ll.logical=FALSE/TRUE (FALSE = Euclidean, TRUE = Great Circle 
distance)

One of the things you will see is that the closest points are different when 
you calculate the distance in km (ll.logical=TRUE) combined with 
proj4string=CRS("+proj=moll +ellps=WGS84")

I hope this gives some insight in the problem. 

The code can also be downloaded from http://open.uib.no/R/mindist.R

I do not guarantee the code is bug free.

You could try to replace polys with your polygon, and ssample with your 
points, and see if it makes sense.

Best wishes
Torleif

#################

library(rgdal)

#Make data
set.seed(7)
grd <- GridTopology(c(1,1), c(1,1), c(2,2))
polys <- as.SpatialPolygons.GridTopology(grd)
proj4string(polys) <- CRS(" +proj=longlat +ellps=WGS84" )

## Try uncommenting this line, and see what are the closest points in the last 
figure.
## In this case the same closest polygons are found, but not the same two 
closest points
#polys <- spTransform(polys, CRS("+proj=moll +ellps=sphere"))

ssample <- spsample(polys,n=10,type="random")

#displace points
slot(ssample, 'coords') <- slot(ssample, 'coords')-mean( 
slot(ssample, 'coords'))*.25

#overlay
overl <- overlay(polys, ssample)

#Show overlay
overl

#Which points are outside?
wh <- which(is.na(overl))

#Make plots
plot(polys, xlim=(c(bbox(polys)[1,])-c(1, -1)), 
      ylim=(c(bbox(polys)[2,])-c(1, -1)))
for (i in 1:4) points(slot(slot((slot(polys, 'polygons'))[[i]], 'Polygons')
[[1]], 'coords')[,1], slot(slot((slot(polys, 'polygons'))[[i]], 'Polygons')
[[1]], 'coords')[,2], col = i)
points(ssample)
for (i in 1:length(wh)) {
points(slot(ssample, 'coords')[wh[i],1], slot(ssample, 'coords')[wh[i],2], 
col=(i+1))
}

# Number of polygons to sapply over (or number of polygons). 
# With more polygons you could also try to only scan the lines around 
# the "4" polygons which center is closest to the outlying point(?)
nit <- dim(coordinates(polys))[1]

#Find the closest polygon, and paste the data in overl
overl.na <- sapply(1:length(wh), function (x) {
  x0 <- slot(ssample, 'coords')[wh[x],1]
  y0 <- slot(ssample, 'coords')[wh[x],2]

mn <- sapply(1:nit, function(y) {
    cc <- slot(slot((slot(polys, 'polygons'))[[y]], 'Polygons')
[[1]], 'coords')
    x1 <- cc[1:(dim(cc)[1]-1),1]
    x2 <- cc[2:dim(cc)[1],1]
    y1 <- cc[1:(dim(cc)[1]-1),2]
    y2 <- cc[2:dim(cc)[1],2]
    #calculate distance
    mn <- 
mean(abs((x2-x1)*(y1-y0)-(x1-x0)*(y2-y1))/sqrt(((x2-x1)^2+(y2-y1)^2)))
    return(mn)
  })

  wm <- which.min(mn)
  return(list(wm, x0, y0))
})

##Can leave out. This section is just for plotting lines
#lines. Shortest distance
lin  <- sapply(2:(length(wh)+1), function (x) {
    p1 <- matrix(unlist(overl.na[-1,(x-1)]),,2)
    pol.pos <- unlist(overl.na[1,])
    pol1 <- slot(slot((slot(polys, 'polygons'))[[pol.pos[x-1]]], 'Polygons')
[[1]], 'coords')[-1,]
    dst <- spDistsN1(pol1, p1, longlat=FALSE)
    wh.dst <- which.min(dst)
    ln <- Line(matrix( c(p1[,1], pol1[wh.dst, 1], p1[,2], pol1[wh.dst, 2]), 
2 ))
    return(ln)
  })

#lines, second shortest distance
lin2  <- sapply(2:(length(wh)+1), function (x) {
    p1 <- matrix(unlist(overl.na[-1,(x-1)]),,2)
    pol.pos <- unlist(overl.na[1,])
    pol1 <- slot(slot((slot(polys, 'polygons'))[[pol.pos[x-1]]], 'Polygons')
[[1]], 'coords')[-1,]
    dst <- spDistsN1(pol1, p1, longlat=FALSE)
    wh.dst <- which.min(dst)
    dst[wh.dst] <- NA
    wh.dst2 <- which.min(dst)
    ln2 <- Line(matrix( c(p1[,1], pol1[wh.dst2, 1], p1[,2], pol1[wh.dst2, 2]), 
2 ))
    return(ln2)
  })

#Convert to spatial lines
sl1 <- Lines(lin)
spl1 <- SpatialLines(list(sl1), proj4string = CRS(proj4string(polys)))
sl2 <- Lines(lin2)
spl2 <- SpatialLines(list(sl2), proj4string = CRS(proj4string(polys)))

##
#Add the new values to the overlay
overl[wh] <- unlist(overl.na[1,])

#Show overlay 
overl

#Add polygon
new.points <- SpatialPointsDataFrame(ssample, data.frame(polyid=overl))

polys <- SpatialPolygonsDataFrame(polys, 
data.frame(ID=c("p1", "p2", "p3", "p4")))

mypalette <- "red"
require(lattice)
trellis.par.set(sp.theme(regions = list(col = mypalette)))
spplot(new.points, 
      sp.layout = list(list("sp.polygons", polys, fill = 
c("grey10", "grey40", "grey60", "grey90")),
			list("sp.lines", spl1),
			list("sp.lines", spl2)), 
      xlim=(c(bbox(polys)[1,])-c(mean( slot(ssample, 'coords'))*.5, -mean( 
slot(ssample, 'coords'))*.5)), 
      ylim=(c(bbox(polys)[2,])-c(mean( slot(ssample, 'coords'))*.5, -mean( 
slot(ssample, 'coords'))*.5)))

##########
On Thu, 21 May 2009, Pilar Tugores Ferra wrote:
Dear Joe,
Maybe you could use a similar process that Marcelino suggested to me
yesterday in order to compute the shortest distance from points to a
polyline. You need to 1)convert your point data to ppp object in
spatstat, 2)coerce your polygon data to a psp object in spatstat.
3)nncross between the two.
You'll get "dist" - the nearest neighbour distance between the two
patterns and "which" the nearest neighbour index of the second pattern.
Maybe it won't be so straightforward to know which ids of the coerced
polyline correspond to each polygon but I am not sure about this point.
Thanks, Pilar!

The only possible weakness in this might be that the input points and
polygon boundaries are in geographical, not projected (planar)
coordinates, so if there are differences in distances on the plane and
great circle distances, they might lead to the "outside" points being
assigned to the wrong county.

Depending on how many there are, you might consider two alternatives, one
visual inspection in Google Earth or similar (export the "outside" points
with IDs as one KML, and the county boundaries you are using with
writeOGR() in rgdal (or use those built into GE)); the other would be to
use the closeness to the label points of the counties of the "outside"
points - coordinates() of the SpatialPolygons* object retrieves these, so
looping over the "outside" points in spDistsN1() in sp with longlat=TRUE
would give the great circle distances.

Of course, the underlying issue is why they are outside - is this a datum
shift problem (counties in NAD27 and points in NAD83/WGS84)? If you fixed
that, they would match better.

Hope this helps,

Roger

Best regards,
Pilar

-----Mensaje original-----
De: r-sig-geo-bounces at stat.math.ethz.ch en nombre de BRWIN338 at aol.com
Enviado el: jue 21/05/2009 21:14
Para: r-sig-geo at stat.math.ethz.ch
Asunto: [R-sig-Geo] Finding the county shapefile polygon closest to
along-lat position

Greetings
I have a large number of long-lat locations dispersed over the  US and
need

to identify which US county that each point is located in  (or nearest
to).

After reading the past posts  and Roger's book, I have been able to use
the overlay function
to  identify the appropriate counties for the set of  locations
with  long-lats lying within or on a polygon boundary.   However, due to
longlat precision errors (I am assuming), some of the  points lie 
outside all of
my shapefile's county polygon  boundaries.

Is there an R function similar to "overlay" that I could use to find 
which

county polygon is closest to each of my longlat points that do  not lie
within the shapefile's polygons?  I have spent quite a  bit of time
searching
and browsing past list discussions and can seem  to find my  answer.

My apologies if I have missed an obvious  answer.

Joe

**************Huge savings on HDTVs from Dell.com!
(http://pr.atwola.com/promoclk/100126575x1221836042x1201399880/aol?redir=
http:%2F%2Fad.doubleclick.ne t%2Fclk%3B215073686%3B37034322%3Bb)

	[[alternative HTML version deleted]]

_______________________________________________
R-sig-Geo mailing list
R-sig-Geo at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

La informaci?n contenida en este e-mail y sus ficheros adjuntos es
totalmente confidencial y no deber?a ser usado si no fuera usted alguno
de los destinatarios. Si ha recibido este e-mail por error, por favor
avise al remitente y b?rrelo de su buz?n o de cualquier otro medio de
almacenamiento.   This email is confidential and should not be used by
anyone who is not the original intended  recipient. If you have received
this e-mail in  error please inform the sender and delete it from  your
mailbox or any other storage mechanism. [[alternative HTML version
deleted]]