Greetings,
I would like to assign spatial points in a large dataset to an area
defined by complex polygon's. These are 'statistical reference' areas
and are helpful in indexing my data. Currently I have 40,000 spatial
observations (lats and longs) and 42 named polygons.
I've found a way to do this with a for loop and using inout from
splancs. This is however quite slow and I am curious if anyone would
know of a way to speed this up.
Thank you very much for your help and ideas!
############
library(splancs)
group<-list(one=data.frame(x=c(-69,-69,-70,-70,-69),
y=c(39.8333,39,39,39.8333,39.8333)),
two=data.frame(x=c(-68.8333,-66.6667,-66.6667,-69,-69,-68.8333),
y=c(39.8333,39.8333,39,39,39.8333,39.8333)),
three
=
data
.frame
(x=c(-65.75,-65.74,-65.6667,-65.6667,66.6667,66.6667,66.6667,-65.75),
y=c(40.5,40.5,40.5,39,39,39.8333,40.5,40.5)))
set.seed(50)
dat<-data.frame(x=runif(100,-70,-65), y=runif(100,39,41), area=NA)
for(i in 1:length(dat$x)){
if(is.na(dat$x[i]) | is.na(dat$y[i])) {dat$area[i]<-NA} else
{if(inout(cbind(dat$x[i],dat$y[i]),group$one)){dat$area[i]<-1}}
if(is.na(dat$x[i]) | is.na(dat$y[i])) {dat$area[i]<-NA} else
{if(inout(cbind(dat$x[i],dat$y[i]),group$two)){dat$area[i]<-2}}
if(is.na(dat$x[i]) | is.na(dat$y[i])) {dat$area[i]<-NA} else
{if(inout(cbind(dat$x[i],dat$y[i]),group$three)){dat$area[i]<-3}}
}
dat$area
# [1] NA NA NA NA 2 NA NA NA NA 1 NA 2 2 1 2 NA NA NA NA NA NA
1 NA NA NA
#[26] NA 2 2 2 2 2 2 NA NA 2 2 NA NA 2 NA NA NA 2 1 1 NA
3 NA NA NA
#[51] 3 NA 2 NA NA NA NA 2 NA 2 2 2 NA NA NA NA NA 1 NA 3 NA
NA 2 1 3
#[76] 1 2 NA NA NA NA 1 NA NA NA NA NA NA NA 2 NA NA NA NA NA 2
1 NA NA NA
######################
Jon Loehrke
Graduate Research Assistant
Department of Fisheries Oceanography
School for Marine Science and Technology
University of Massachusetts
200 Mill Road, Suite 325
Fairhaven, MA 02719
jloehrke at umassd.edu
sessionInfo()
R version 2.8.0 Patched (2008-11-24 r47017)
i386-apple-darwin9.5.0
locale:
en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] splancs_2.01-24 sp_0.9-28
loaded via a namespace (and not attached):
[1] grid_2.8.0 lattice_0.17-17
in or out of polygon efficiency question
2 messages · Jon Loehrke, Roger Bivand
On Fri, 28 Nov 2008, Jon Loehrke wrote:
Greetings, I would like to assign spatial points in a large dataset to an area defined by complex polygon's. These are 'statistical reference' areas and are helpful in indexing my data. Currently I have 40,000 spatial observations (lats and longs) and 42 named polygons. I've found a way to do this with a for loop and using inout from splancs. This is however quite slow and I am curious if anyone would know of a way to speed this up.
Use the overlay method in the sp package - the polygons must be a
SpatialPolygons object, the points a SpatialPoints (or their *DataFrame
subclasses). Both are easy to read in using readOGR in rgdal. Using your
data:
coordinates(dat) <- c("x", "y")
group1 <- lapply(seq(along=group), function(x)
Polygons(list(Polygon(as.matrix(group[[x]]))), ID=x))
group2 <- SpatialPolygons(group1)
o <- overlay(group2, dat)
plot(dat)
plot(group2, add=TRUE, border="green")
points(dat, col=o)
dat$area <- o
summary(dat)
If you want the names in group as the IDs, use lapply(names(group), ...
then:
IDs <- sapply(slot(group2, "polygons"), function(x) slot(x, "ID"))
dat$area <- factor(IDs[o])
summary(dat)
For 40K points in dat, this ran in less than half a second for me, but for
irregular polygons would take somewhat longer, because there is much more
computational geometry to do.
Hope this helps,
Roger
Thank you very much for your help and ideas!
############
library(splancs)
group<-list(one=data.frame(x=c(-69,-69,-70,-70,-69),
y=c(39.8333,39,39,39.8333,39.8333)),
two=data.frame(x=c(-68.8333,-66.6667,-66.6667,-69,-69,-68.8333),
y=c(39.8333,39.8333,39,39,39.8333,39.8333)),
three=data.frame(x=c(-65.75,-65.74,-65.6667,-65.6667,66.6667,66.6667,66.6667,-65.75),
y=c(40.5,40.5,40.5,39,39,39.8333,40.5,40.5)))
set.seed(50)
dat<-data.frame(x=runif(100,-70,-65), y=runif(100,39,41), area=NA)
for(i in 1:length(dat$x)){
if(is.na(dat$x[i]) | is.na(dat$y[i])) {dat$area[i]<-NA} else
{if(inout(cbind(dat$x[i],dat$y[i]),group$one)){dat$area[i]<-1}}
if(is.na(dat$x[i]) | is.na(dat$y[i])) {dat$area[i]<-NA} else
{if(inout(cbind(dat$x[i],dat$y[i]),group$two)){dat$area[i]<-2}}
if(is.na(dat$x[i]) | is.na(dat$y[i])) {dat$area[i]<-NA} else
{if(inout(cbind(dat$x[i],dat$y[i]),group$three)){dat$area[i]<-3}}
}
dat$area
# [1] NA NA NA NA 2 NA NA NA NA 1 NA 2 2 1 2 NA NA NA NA NA NA 1 NA NA
NA
#[26] NA 2 2 2 2 2 2 NA NA 2 2 NA NA 2 NA NA NA 2 1 1 NA 3 NA NA
NA
#[51] 3 NA 2 NA NA NA NA 2 NA 2 2 2 NA NA NA NA NA 1 NA 3 NA NA 2 1
3
#[76] 1 2 NA NA NA NA 1 NA NA NA NA NA NA NA 2 NA NA NA NA NA 2 1 NA NA
NA
######################
Jon Loehrke
Graduate Research Assistant
Department of Fisheries Oceanography
School for Marine Science and Technology
University of Massachusetts
200 Mill Road, Suite 325
Fairhaven, MA 02719
jloehrke at umassd.edu
sessionInfo()
R version 2.8.0 Patched (2008-11-24 r47017)
i386-apple-darwin9.5.0
locale:
en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] splancs_2.01-24 sp_0.9-28
loaded via a namespace (and not attached):
[1] grid_2.8.0 lattice_0.17-17
_______________________________________________ R-sig-Geo mailing list R-sig-Geo at stat.math.ethz.ch https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Roger Bivand Economic Geography Section, Department of Economics, Norwegian School of Economics and Business Administration, Helleveien 30, N-5045 Bergen, Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43 e-mail: Roger.Bivand at nhh.no