Dear list,
I am processing gridded data like DEMs, imported in R from geotiff
files. Of course, such data is regularly gridded, but very often with
NA values. In the framwork of some tests, I have to generate a
neighbourhood of the DEM, so that to extract local extrema of the
layer. For this task, I wrote a function based on ?the knearneig()
function from the spdep package, with k=4 or k=8, to generate and
analyse the neighbourhood of each point.
Unfortunately, I often have to process big datasets (let's say grids
with between ?50,000 and 350,000 points), and that's working but
knearneigh() takes *hours* to process.
Does anybody would have any suggestion to improve the efficiency of this
step ?
the algorithm (knn.c file in the spdep sources) seems to compute all
distances between all grid cells, and for each grid cell find the nearest k
by some kind of insertion sort. That makes it general for all kinds of
spatial data points, including irregularly spread ones, but makes it also
inefficient (because O(n^2) with n the number of grid cells) for large data
sets.
Suggestions to improve this would be to use an index structure (for the
generic case of possibly irregularly spread data) such as the PR-bucket
quadtree used in package gstat (nsearch.c), or specifically for gridded data
to write a special knearneigh() that exploits the grid indexing already
present: you can explicitly address the nearest 4/8/12/... neighbours by
using row/col indexes. In this latter case you'd probably want to rewrite
the test function for the full grid, instead of looping over the cells.