Skip to content

in or out of polygon efficiency question

2 messages · Jon Loehrke, Roger Bivand

#
Greetings,

I would like to assign spatial points in a large dataset to an area  
defined by complex polygon's.  These are 'statistical reference' areas  
and are helpful in indexing my data.  Currently I have 40,000 spatial  
observations (lats and longs) and 42 named polygons.

I've found a way to do this with a for loop and using inout from  
splancs.  This is however quite slow and I am curious if anyone would  
know of a way to speed this up.

Thank you very much for your help and ideas!

############
library(splancs)

group<-list(one=data.frame(x=c(-69,-69,-70,-70,-69),  
y=c(39.8333,39,39,39.8333,39.8333)),
			two=data.frame(x=c(-68.8333,-66.6667,-66.6667,-69,-69,-68.8333),  
y=c(39.8333,39.8333,39,39,39.8333,39.8333)),
			 
three 
= 
data 
.frame 
(x=c(-65.75,-65.74,-65.6667,-65.6667,66.6667,66.6667,66.6667,-65.75),  
y=c(40.5,40.5,40.5,39,39,39.8333,40.5,40.5)))

set.seed(50)

dat<-data.frame(x=runif(100,-70,-65), y=runif(100,39,41), area=NA)

for(i in 1:length(dat$x)){
	if(is.na(dat$x[i]) | is.na(dat$y[i])) {dat$area[i]<-NA} else  
{if(inout(cbind(dat$x[i],dat$y[i]),group$one)){dat$area[i]<-1}}
	if(is.na(dat$x[i]) | is.na(dat$y[i])) {dat$area[i]<-NA} else  
{if(inout(cbind(dat$x[i],dat$y[i]),group$two)){dat$area[i]<-2}}
	if(is.na(dat$x[i]) | is.na(dat$y[i])) {dat$area[i]<-NA} else  
{if(inout(cbind(dat$x[i],dat$y[i]),group$three)){dat$area[i]<-3}}
}

dat$area

# [1] NA NA NA NA  2 NA NA NA NA  1 NA  2  2  1  2 NA NA NA NA NA NA   
1 NA NA NA
#[26] NA  2  2  2  2  2  2 NA NA  2  2 NA NA  2 NA NA NA  2  1  1 NA   
3 NA NA NA
#[51]  3 NA  2 NA NA NA NA  2 NA  2  2  2 NA NA NA NA NA  1 NA  3 NA  
NA  2  1  3
#[76]  1  2 NA NA NA NA  1 NA NA NA NA NA NA NA  2 NA NA NA NA NA  2   
1 NA NA NA
######################


Jon Loehrke
Graduate Research Assistant
Department of Fisheries Oceanography
School for Marine Science and Technology
University of Massachusetts
200 Mill Road, Suite 325
Fairhaven, MA 02719
jloehrke at umassd.edu

sessionInfo()
R version 2.8.0 Patched (2008-11-24 r47017)
i386-apple-darwin9.5.0

locale:
en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] splancs_2.01-24 sp_0.9-28

loaded via a namespace (and not attached):
[1] grid_2.8.0      lattice_0.17-17
#
On Fri, 28 Nov 2008, Jon Loehrke wrote:

            
Use the overlay method in the sp package - the polygons must be a 
SpatialPolygons object, the points a SpatialPoints (or their *DataFrame 
subclasses). Both are easy to read in using readOGR in rgdal. Using your 
data:

coordinates(dat) <- c("x", "y")
group1 <- lapply(seq(along=group), function(x)
   Polygons(list(Polygon(as.matrix(group[[x]]))), ID=x))
group2 <- SpatialPolygons(group1)
o <- overlay(group2, dat)
plot(dat)
plot(group2, add=TRUE, border="green")
points(dat, col=o)
dat$area <- o
summary(dat)

If you want the names in group as the IDs, use lapply(names(group), ... 
then:

IDs <- sapply(slot(group2, "polygons"), function(x) slot(x, "ID"))
dat$area <- factor(IDs[o])
summary(dat)

For 40K points in dat, this ran in less than half a second for me, but for 
irregular polygons would take somewhat longer, because there is much more 
computational geometry to do.

Hope this helps,

Roger