point.in.polygon() on massive datasets

Dear all,
I have a dataset of about 50 million lat/lon coordinates each of which falls into one of 550 polygons.
I need to assign their memberships and have used point.in.polygon() for that purpose.
However, simply looping over the 50 million points takes a very long time; processing just 1 million points took about 3-4 days on a fast Linux server with lots of memory.
Am I overlooking obvious ways of making this massive computation more efficient? Would R-trees help?
Should I try to compile the C code for point.in.polygon() (available from gstat) and run it outside R as a standalone executable?
I am already using apply() to mitigate the inefficiency of the for loop in R.
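To make the R-tree question concrete, here is a minimal sketch (in Python rather than R, and not the gstat code) of the simplest form of that idea: precompute each polygon's bounding box once, and run the expensive ray-casting point-in-polygon test only on the few polygons whose box contains the point. With 550 polygons, most points should be rejected by the cheap box check. All function names here are illustrative, not from any package.

```python
def point_in_polygon(x, y, poly):
    """Ray-casting test; poly is a list of (x, y) vertices."""
    inside = False
    j = len(poly) - 1
    for i in range(len(poly)):
        xi, yi = poly[i]
        xj, yj = poly[j]
        # Toggle on each edge that the horizontal ray from (x, y) crosses.
        if (yi > y) != (yj > y) and x < (xj - xi) * (y - yi) / (yj - yi) + xi:
            inside = not inside
        j = i
    return inside

def bbox(poly):
    xs = [p[0] for p in poly]
    ys = [p[1] for p in poly]
    return min(xs), min(ys), max(xs), max(ys)

def assign(points, polygons):
    """Return, for each point, the index of the containing polygon (-1 if none)."""
    boxes = [bbox(p) for p in polygons]  # computed once, not per point
    out = []
    for x, y in points:
        hit = -1
        for k, (x0, y0, x1, y1) in enumerate(boxes):
            # Cheap bounding-box rejection before the full test.
            if x0 <= x <= x1 and y0 <= y <= y1 and point_in_polygon(x, y, polygons[k]):
                hit = k
                break
        out.append(hit)
    return out
```

A real R-tree would replace the linear scan over `boxes` with a hierarchical index, but even this flat prefilter avoids 550 full polygon tests per point.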

Any help would be greatly appreciated,

Thanks,

Markus