Benchmarking R, why sort() is so slow?
On Fri, 27 Apr 2001, Philippe Grosjean wrote:
I suspect it should be a question of algorithm choice. May be sort() is important enough, including for large datasets, to place some improvement in the "to do" list for a further version...?
I find it hard to believe that doing anything useful with a 1.1 million length vector will not take very much longer than complete sorting, which is a rare problem in statistics (not needed for quantiles, for example). That's the problem with these benchmarks, they don't test realistic problems because many of the packages concerned would need days of coding to run one. FWIW, there are O(n log n) sort algorithms, worst case, that are fairly competitive on average too. But testing random vectors is not representation of real problems, and for many such (and even for rnorm(1e6)) one can do even better by e.g radix sorting. So if you know something about the input you make your package look better. Again, a problem with benchmarking.
Peter Dalgaard
The internal algorithm is a shellsort, which is supposedly of complexity O(n^1.25) and has decent worst-case behaviour. Other algorithms like quicksort have typical performance of O(n log n) but extreme cases of O(n^2). For large vectors the O(n^.25/log(n)) relative complexity is going to make a difference, although the observed differences seem larger.
There are several shellsort variants. I would need to look up the known results for precisely this one.
There are very likely better choices of sort algorithm...
Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272860 (secr) Oxford OX1 3TG, UK Fax: +44 1865 272595 -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._