Hi everyone,

I want to perform regionalisation (i.e. spatially constrained clustering) of a raster grid. The raster rectangle has ~90,000 pixels, but I am only interested in the region within a specific, somewhat convoluted shape (the sea pixels, excluding land), which represents ~35,000 pixels.

I have been able to convert the pixels to polygons and remove a few odd bits in order to compute, with spdep::poly2nb, a neighbourhood list that forms a single, fully connected graph (there is probably a cleverer way for a large regular grid, but this works reasonably quickly). From it, I computed the edge weights (based on the 46 variables measured in each pixel) and the minimum spanning tree (MST). I then fed this MST and the data to spdep::skater and asked it to resolve 7 clusters (a number based on a priori knowledge). It has been running for ~10 hours on 12 cores (Xeon CPUs at 3.40 GHz), each using up to 5 GB of RAM, and I have no idea whether it is anywhere close to finishing. Eventually, I will want to resolve from 5 to 12 clusters and compute some a posteriori metrics to decide on the ideal number of clusters. I can dedicate a few more cores, but that will not be enough to speed it up significantly.

Is there any clever way to speed spdep::skater up, perhaps one that exploits the fact that I am working on a regular grid?

I have thought about computing 100 to 200 clusters with regular k-means (or PAM) and then using those as input polygons to skater, but many clusters end up as pixels scattered all over the place. I have considered adding latitude and longitude to the data fed to k-means at this preliminary step to force spatial contiguity, but that is a bit difficult to justify cleanly in the methods section of a paper, and it does not really work (regions are still scattered locally). Of course, I could reduce the resolution of the original data, but that would be a shame.
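The pipeline described above (grid neighbours, edge costs, minimum spanning tree, then pruning) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not R and not spdep's code: the raster is assumed to be a boolean mask (True = sea) plus one attribute vector per kept pixel, neighbour pairs are built directly from row/column indices instead of going through polygons, the MST is computed with Kruskal's algorithm using the Euclidean distance between attribute vectors as edge cost, and the pruning is a simplified, greedy SKATER-style step that removes, at each iteration, the tree edge whose removal most reduces the total within-cluster sum of squared deviations (SSD). All function names are hypothetical. Each candidate cut is scored from running counts, sums and sums of squares per subtree, so scoring every cut in one pass costs time roughly proportional to the number of pixels times the number of variables; this describes the sketch only, not how spdep::skater works internally. There is no minimum cluster size and no scaling of the variables, both of which a real analysis would likely need.

```python
import math
from itertools import product


def grid_graph(mask, queen=False):
    """Neighbour pairs of kept pixels; mask is a list of rows of booleans (True = sea)."""
    index = {}  # (row, col) -> pixel id, numbered row by row over kept pixels only
    for r, c in product(range(len(mask)), range(len(mask[0]))):
        if mask[r][c]:
            index[(r, c)] = len(index)
    # Look only right/down (plus the two lower diagonals for queen contiguity)
    # so that each undirected edge is listed exactly once.
    offsets = [(0, 1), (1, 0)] + ([(1, 1), (1, -1)] if queen else [])
    edges = [(i, index[(r + dr, c + dc)])
             for (r, c), i in index.items()
             for dr, dc in offsets if (r + dr, c + dc) in index]
    return len(index), edges


def minimum_spanning_tree(n, edges, data):
    """Kruskal's algorithm; edge cost = Euclidean distance between attribute vectors."""
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    tree = []
    for i, j in sorted(edges, key=lambda e: math.dist(data[e[0]], data[e[1]])):
        ri, rj = find(i), find(j)
        if ri != rj:  # joins two separate trees, so no cycle is created
            parent[ri] = rj
            tree.append((i, j))
    return tree


def ssd(n, s, q):
    """Within-group sum of squared deviations from running sums: sum(x^2) - (sum x)^2 / n."""
    return sum(qk - sk * sk / n for sk, qk in zip(s, q))


def components(n, edges):
    """Yield (order, parent) for each tree of a forest; parents precede children in order."""
    adj = {v: [] for v in range(n)}
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    seen = set()
    for root in range(n):
        if root in seen:
            continue
        order, par, stack = [], {root: None}, [root]
        seen.add(root)
        while stack:
            v = stack.pop()
            order.append(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    par[w] = v
                    stack.append(w)
        yield order, par


def prune(n, tree, data, k):
    """Greedy SKATER-style pruning: remove k - 1 tree edges, each time the one whose
    removal most reduces the total within-cluster SSD. Returns one label per pixel."""
    edges = set(tree)
    for _ in range(k - 1):
        best = None
        for order, par in components(n, edges):
            # Subtree count, sums and sums of squares, accumulated children first.
            sub = {v: [1, list(data[v]), [x * x for x in data[v]]] for v in order}
            for v in reversed(order[1:]):
                u = sub[par[v]]
                u[0] += sub[v][0]
                u[1] = [a + b for a, b in zip(u[1], sub[v][1])]
                u[2] = [a + b for a, b in zip(u[2], sub[v][2])]
            tn, ts, tq = sub[order[0]]
            base = ssd(tn, ts, tq)
            for v in order[1:]:  # candidate cut: the edge between v and par[v]
                vn, vs, vq = sub[v]
                rest = ssd(tn - vn, [a - b for a, b in zip(ts, vs)],
                           [a - b for a, b in zip(tq, vq)])
                gain = base - ssd(vn, vs, vq) - rest
                if best is None or gain > best[0]:
                    best = (gain, v, par[v])
        _, v, u = best
        edges -= {(v, u), (u, v)}
    labels = [0] * n
    for c, (order, _) in enumerate(components(n, edges)):
        for v in order:
            labels[v] = c
    return labels
```

For example, on a 2 x 4 all-sea grid whose left half has value 0 and right half value 10, `prune(n, minimum_spanning_tree(n, edges, data), data, 2)` returns `[0, 0, 1, 1, 0, 0, 1, 1]`, i.e. it separates the two halves.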
Finally, I have thought about:
1- running skater on low-resolution data (few, large pixels);
2- grouping the central (large) pixels of each region into a polygon and breaking the pixels on the borders apart into smaller pixels;
3- computing the average characteristics of these new pixels;
4- re-running skater with the large central polygons intact and smaller pixel-polygons on the borders, and repeating this until the borders are well defined.

But this involves quite a bit of coding, and I am not really sure how representative the mean characteristics would be for each large area. Before embarking on this, I wanted to check whether another solution existed.

Does anyone have experience with ClusterPy http://www.rise-group.org/risem/clusterpy/, especially in terms of speed? And which of its algorithms most resembles skater? Or which is the most robust? (I understand the concepts behind skater, but I am not confident with the others.)

Thank you in advance.

Sincerely,

Jean-Olivier Irisson
Université Pierre et Marie Curie
Laboratoire d'Océanographie de Villefranche
2 Quai de la Corderie, 06230 Villefranche-sur-Mer
Tel: +33 04 93 76 38 04
Mob: +33 06 21 05 19 90
http://www.obs-vlfr.fr/~irisson/
Send me large files at: http://www.obs-vlfr.fr/~irisson/upload/
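Steps 1 to 3 of the coarse-to-fine scheme listed above can be sketched in the same hedged way (plain Python; all names are hypothetical). The sketch assumes the coarse clustering has already produced one label per f x f block (for example by running skater on the block means), keeps interior blocks whole, and splits every block that touches a block with a different label (rook contiguity) straight into its native pixels; each unit is a list of (row, col) pixels. A progressive version would split border blocks into half-size blocks instead, and the sketch ignores blocks in which land separates the sea pixels into disconnected patches.

```python
from itertools import product


def blocks(mask, f):
    """Step 1: group kept (sea) pixels into f x f blocks; each unit is a list of (row, col)."""
    units = {}
    for r, c in product(range(len(mask)), range(len(mask[0]))):
        if mask[r][c]:
            units.setdefault((r // f, c // f), []).append((r, c))
    return units


def border_blocks(labels):
    """Blocks touching (rook contiguity) a block that carries a different cluster label."""
    return {b for b, lab in labels.items()
            for nb in ((b[0] + 1, b[1]), (b[0] - 1, b[1]), (b[0], b[1] + 1), (b[0], b[1] - 1))
            if nb in labels and labels[nb] != lab}


def refine(units, border):
    """Step 2: keep interior blocks whole, split border blocks into single pixels."""
    out = []
    for b, pixels in units.items():
        out += [[p] for p in pixels] if b in border else [pixels]
    return out


def mean_vector(pixels, values):
    """Step 3: average attribute vector over the pixels of one unit (values[r][c] is a vector)."""
    return [sum(col) / len(pixels) for col in zip(*(values[r][c] for r, c in pixels))]
```

The refined units, with their mean vectors and a neighbour list between units, would then be fed back to skater for step 4, and the loop repeated on the new border units.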
Regionalisation of a large raster (a way to speed up spdep::skater?)