Natural Breaks - Jenks
On Tue, 18 Apr 2006, Ben Brehmer wrote:
I have implemented Jenks' algorithm (for finding natural breaks) in PHP, thanks to some sample code I found at: http://www.mail-archive.com/r-sig-geo at stat.math.ethz.ch/msg00290.html . This algorithm is also implemented in ArcView to determine natural breaks in legends. Currently I am running the algorithm on a data set of 65,000 elements, which takes over 3 hours (due to a nested for loop). ArcView's implementation, on the other hand, returns within seconds. Would anyone know why ArcView's implementation is so much more efficient?
    library(classInt)
    ?classIntervals
    y <- runif(65000)
    yClass <- classIntervals(y, n=5, style="fisher")

runs on a 1.5GHz machine in 225 seconds. This is using the Fortran code you refer to directly. My guess is that Arc looks at the number of unique values, and, if there are many, uses a heuristic. If it sampled and set the seed the same each time, the result would be the same, and the code runs acceptably fast for say 2000 values. Maybe Arc also precomputes values?

Roger
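The sampling idea guessed at above can be sketched in R as follows. This is only an illustration of the heuristic, not a description of what ArcView actually does: the seed value, the step that stretches the outer breaks to the full data range, and the use of findInterval() to classify all values are assumptions added for the example.

```r
library(classInt)

set.seed(42)                      # arbitrary fixed seed, so the sample is repeatable
y <- runif(65000)                 # synthetic data of the size discussed
ySample <- sample(y, 2000)        # subset size taken from the "say 2000 values" remark

# Fisher-Jenks breaks computed on the subset only
sampleClass <- classIntervals(ySample, n = 5, style = "fisher")
brks <- sampleClass$brks

# Assumption: stretch the outer breaks so the full data range is covered
brks[1] <- min(y)
brks[length(brks)] <- max(y)

# Assign every original value to one of the 5 classes
yGroup <- findInterval(y, brks, all.inside = TRUE)
table(yGroup)
```

Because the seed is fixed, repeated runs draw the same subset and return the same breaks. The breaks are an approximation: they are optimal for the 2000 sampled values, not necessarily for all 65000.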
Any help would be greatly appreciated. Ben Brehmer
_______________________________________________ R-sig-Geo mailing list R-sig-Geo at stat.math.ethz.ch https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Roger Bivand Economic Geography Section, Department of Economics, Norwegian School of Economics and Business Administration, Helleveien 30, N-5045 Bergen, Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43 e-mail: Roger.Bivand at nhh.no