knearneigh() computing efficiency on large datasets

Dear Edzer and Robert,

Thank you for your replies.
Suggestions to improve this would be to use an index structure (for the
generic case of possibly irregularly spread data) such as the PR-bucket
quadtree used in package gstat (nsearch.c).
Yes, I thought about that, as I want to process both regularly and
irregularly gridded data. Unfortunately, my C experience is a bit
short for that. I'll keep looking for something - if I don't find
anything I'd try to code that myself.
You can try one of the 'focal' functions of the raster package on
R-Forge. Not particularly fast either, but probably much faster that
what you have now.
Thanks Robert (and congrats for such a great package BTW), I installed
the last version and will certainly test it, but as I said I would
like to process also ungridded data, so I'm still thinking about a
more general solution.

I was wondering if it could be a solution to use RSAGA or spgrass6 to
do that. Feedback, anyone ?

Cheers,

Pierre

2010/1/29 Edzer Pebesma <edzer.pebesma at uni-muenster.de>:
Pierre Roudier wrote:
Dear list,

I am processing gridded data like DEMs, imported in R from geotiff
files. Of course, such data is regularly gridded, but very often with
NA values. In the framwork of some tests, I have to generate a
neighbourhood of the DEM, so that to extract local extrema of the
layer. For this task, I wrote a function based on ?the knearneig()
function from the spdep package, with k=4 or k=8, to generate and
analyse the neighbourhood of each point.

Unfortunately, I often have to process big datasets (let's say grids
with between ?50,000 and 350,000 points), and that's working but
knearneigh() takes *hours* to process.

Does anybody would have any suggestion to improve the efficiency of this
step ?

the algorithm (knn.c file in the spdep sources) seems to compute all
distances between all grid cells, and for each grid cell find the nearest k
by some kind of insertion sort. That makes it general for all kinds of
spatial data points, including irregularly spread ones, but makes it also
inefficient (because O(n^2) with n the number of grid cells) for large data
sets.

Suggestions to improve this would be to use an index structure (for the
generic case of possibly irregularly spread data) such as the PR-bucket
quadtree used in package gstat (nsearch.c), or specifically for gridded data
to write a special knearneigh() that exploits the grid indexing already
present: you can explicitly address the nearest 4/8/12/... neighbours by
using row/col indexes. In this latter case you'd probably want to rewrite
the test function for the full grid, instead of looping over the cells.
Thanks,

Pierre

_______________________________________________
R-sig-Geo mailing list
R-sig-Geo at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

--
Edzer Pebesma
Institute for Geoinformatics (ifgi), University of M?nster Weseler Stra?e
253, 48151 M?nster, Germany. Phone: +49 251 8333081, Fax: +49 251 8339763
?http://ifgi.uni-muenster.de http://www.52north.org/geostatistics
?e.pebesma at wwu.de

knearneigh() computing efficiency on large datasets

Thread (2 messages)