point.in.polygon() on massive datasets
Dear all,

I have a dataset of about 50 million lat/lon coordinates, each of which falls into one of 550 polygons. I need to assign their memberships and have used point.in.polygon() for that purpose. However, the simple way of looping over the 50 million points clearly takes a very long time: 1 million points took about 3-4 days on a fast Linux server with plenty of memory.

Am I overlooking obvious ways of making this massive computation more efficient? Would R-trees help? Should I try to compile the C code for point.in.polygon() (available from gstat) and run it outside R as a standalone executable? I am already using apply() to mitigate the inefficiency of the for loop in R.

Any help would be greatly appreciated.

Thanks,
Markus
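[For context: point.in.polygon() is vectorized over its point arguments, so one can loop over the 550 polygons rather than the 50 million points, i.e. 550 vectorized calls instead of 50 million scalar ones, with a bounding-box pre-filter to skip polygons cheaply. A minimal base-R sketch of that idea follows; the function names pts_in_poly() and assign_polygons() are my own illustrations, not part of sp or gstat, and pts_in_poly() reimplements the same ray-casting test rather than calling the package function.]

```r
# Vectorized ray-casting point-in-polygon test (pure base R sketch).
# px, py: coordinates of many points; vx, vy: polygon vertex coordinates.
# Returns a logical vector, one element per point.
pts_in_poly <- function(px, py, vx, vy) {
  n <- length(vx)
  inside <- logical(length(px))
  j <- n  # previous vertex index; start with the closing edge
  for (i in seq_len(n)) {
    # Does a horizontal ray from each point cross edge (j, i)?
    crosses <- ((vy[i] > py) != (vy[j] > py)) &
      (px < (vx[j] - vx[i]) * (py - vy[i]) / (vy[j] - vy[i]) + vx[i])
    inside <- xor(inside, crosses)  # odd number of crossings => inside
    j <- i
  }
  inside
}

# Assign each point to the first polygon containing it.
# polys: a list of lists, each with numeric vectors $x and $y.
# Only points still unassigned AND inside a polygon's bounding box
# are tested, so most of the 550 polygons are skipped per point.
assign_polygons <- function(px, py, polys) {
  membership <- rep(NA_integer_, length(px))
  for (k in seq_along(polys)) {
    p <- polys[[k]]
    cand <- which(is.na(membership) &
                  px >= min(p$x) & px <= max(p$x) &
                  py >= min(p$y) & py <= max(p$y))
    if (!length(cand)) next
    hit <- pts_in_poly(px[cand], py[cand], p$x, p$y)
    membership[cand[hit]] <- k
  }
  membership
}
```

With sp itself, the same structure applies: call point.in.polygon(px, py, pol.x, pol.y) once per polygon on the whole point vector (or on the bounding-box candidates), instead of once per point.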