speed problem with %over% function

Thanks to Francis and Edzer for their responses.

I have been working on my problem for the last 2 days and managed to get some very interesting information.  I also prepared a reproducible example so people can try it.

I've found 3 functions that do the thing I want:
1) %over%
2) overlay()  (which is deprecated)
3) gIntersection()  (from the rgeos package)
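
For readers who have not used the first option, here is a minimal sketch of a point-in-polygon overlay with over() and its infix form %over% from the sp package. The square polygon, the four points and all object names are made up for illustration; they are not the data used in this post, and only over() is sketched here.

```r
library(sp)

# Toy polygon: a 1 x 1 square, ring given in clockwise order
# (hypothetical data, not the counties shapefile)
sq <- Polygon(cbind(c(0, 0, 1, 1, 0), c(0, 1, 1, 0, 0)), hole = FALSE)
polys <- SpatialPolygons(list(Polygons(list(sq), ID = "square")))

# Four toy points: the first two inside the square, the last two outside
pts <- SpatialPoints(cbind(x = c(0.25, 0.75, 1.5, -0.5),
                           y = c(0.25, 0.75, 0.5, 0.5)))

# For each point, over() gives the index of the polygon containing it,
# or NA when the point falls outside every polygon
idx <- over(pts, polys)

# %over% is the infix form of the same call
idx_infix <- pts %over% polys

# Keep only the points that fall inside a polygon
inside <- pts[!is.na(idx), ]
```

The NA entries in `idx` are what make it usable as a filter: dropping them keeps the grid points that lie inside the territory.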

So I tried them all on my data to compare their speed with each other and with ArcGIS.  The punchline of my analysis is that %over% is by far the slowest of all, ArcGIS included.  overlay() is the fastest.

To get these results, here is a reproducible example:

### first, download the shapes from:
### http://www.nceas.ucsb.edu/files/scicomp/demo/read-write-shapefiles.zip
### These are not my files, I took them from the example at:
### http://www.nceas.ucsb.edu/scicomp/usecases/ReadWriteESRIShapeFiles

###  Use the rgdal library to read the files
OGR data source with driver: ESRI Shapefile
Source: "C:\Documents and Settings\ferba1\Bureau\shape", layer: "nw-counties"
with 208 features and 8 fields
Feature type: wkbPolygon with 2 dimensions

# choose only one territory to make the analysis easier
# this code is used to convert km to decimal degrees. It's not necessary to
# understand it for this problem.  The only important part is that you can
# change "separation" to increase the density of the grid. "separation"
# represents the distance between grid points, 0.5 km in this example.
###
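
The conversion code itself is not reproduced in this post. As a rough sketch of the idea only, assuming a spherical Earth where one degree of latitude spans about 111.32 km and one degree of longitude shrinks with the cosine of the latitude, the grid spacing could be derived as follows. The names `separation`, `nb.degre.lon` and `nb.degre.lat` come from the post; `lat.ref`, `km.per.degre` and their values are assumptions for illustration.

```r
# Grid spacing in kilometres (0.5 km in the example)
separation <- 0.5

# Assumed reference latitude of the study area, in decimal degrees
# (hypothetical value; use the centre of your own territory)
lat.ref <- 45

# Approximate length of one degree of latitude on a spherical Earth (assumption)
km.per.degre <- 111.32

# Grid spacing expressed in decimal degrees
nb.degre.lat <- separation / km.per.degre
nb.degre.lon <- separation / (km.per.degre * cos(lat.ref * pi / 180))
```

Halving `separation` halves both spacings, which roughly quadruples the number of grid points to be tested.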

# make the grid
+                             cellsize=c(nb.degre.lon,nb.degre.lat),
+                             cells.dim=(ceiling((etendu[,2]-etendu[,1])/c(nb.degre.lon,nb.degre.lat))+2))
### The overlay() function
   user  system elapsed
   2.33    0.00    2.34
###  The %over% function
   user  system elapsed
 953.72    2.33  960.80
###  The gIntersection() function
library(rgeos)
   user  system elapsed
  37.83    0.05   37.99
###  end of code

As you can see, %over% is the slowest, taking 961 seconds to run.  overlay() is the fastest at 2.34 seconds, and gIntersection() sits in the middle at 38 seconds.  Finally, the intersection tool of ArcGIS took 20 seconds to do the same thing.  These timings appear to depend on the data: on my own problem, gIntersection() was the slowest at 799 seconds, while %over% did the task in 250 seconds and overlay() in 119.  overlay() was still the fastest function.
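
To put those numbers side by side, the short sketch below computes how many times slower each function was than the fastest one, using only the elapsed times quoted in this post. The variable names are made up for illustration. On the example data %over% comes out about 400 times slower than overlay(), against about 2 times slower on my own data.

```r
# Elapsed times in seconds, copied from the timings reported in this post
example_times <- c(overlay = 2.34, over = 960.80, gIntersection = 37.99)
own_times     <- c(overlay = 119,  over = 250,    gIntersection = 799)

# Slow-down factor of each function relative to the fastest one
ratio_example <- example_times / min(example_times)
ratio_own     <- own_times / min(own_times)

print(round(ratio_example, 1))
print(round(ratio_own, 1))
```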

So there is clearly room for improvement in the %over% function.  My next question is: why is overlay() deprecated?  The help page says: "This function is deprecated, as it has some inconsistences." but I did not find any explanation of what those "inconsistences" are.  Going from overlay() to %over% seems more like a downgrade than an upgrade to me.  I used overlay() in my code instead of %over%, which saves an enormous amount of time.  However, I am a little worried about it, as it may cause problems, or the function may disappear in a future version of the sp package.

So I guess anybody motivated to improve the %over% function could start by understanding how it differs from the overlay() function.  Sadly, I'm no programmer, so I can't help here.

Best regards,
Bastien Ferland-Raymond


Date: Tue, 28 Feb 2012 11:38:04 -0500
From: Francis Markham <francis.markham at anu.edu.au>

I've had examples in the past where using %over% from the sp package takes
all available RAM (7GB) and several hours, while ArcGIS takes about 300MB
and 5 minutes, so I would agree that there is plenty of room for
improvement here. I'll try to give a reproducible example in the coming
weeks when I return from travel.

This is a critical issue for me insofar as spatial joins are a routine
procedure for me, but without reasonable speed for procedures such as
this I cannot perform all my analysis in R, making for a more complex,
error-prone workflow and scuttling the possibility of "reproducible
research."

On a related note, does the 'sp' package have an accessible bug tracker?
I'd like to be able to contribute to improving this very useful package if
at all possible, but I'm not sure where to begin.

Warm regards,

Francis Markham
Research Associate
Fenner School of Environment and Society
Australian National University
On 27 February 2012 19:18, Edzer Pebesma <edzer.pebesma at uni-muenster.de> wrote: