"TL" == Thomas Lumley <tlumley@u.washington.edu> writes:
TL> On Fri, 27 Apr 2001, Philippe Grosjean wrote:
>> I suspect it should be a question of algorithm choice. May be sort() is
>> important enough, including for large datasets, to place some improvement in
>> the "to do" list for a further version...?
>>
TL> Actually, I'm not convinced on the speed issue. Shellsort works
TL> very well for vectors of lengths that R can reasonably
TL> handle. Using a million record dataset in R you really can't get
TL> much done in the 20seconds that a really good sorting routine might
TL> save you.
TL> Heapsort might be worth trying -- it has good worst-case behaviour
TL> and is about half as fast as the typical case of quicksort. Merge
TL> sort might also be nice because it's stable: it leaves ties in the
TL> order it found them, but it needs extra space. Still, I don't think
TL> it's a high optimisation priority.
I tend to disagree.
For me, R is not just for data analysis.
It's my compute engine for almost everything, nowadays.
I definitely have wanted to add better sort() routines to R,
and I think R's default should be to use some O(n log(n)) versions, at
least for larghish n.
20 years ago (or so), ``the'' good thing
{{for in-memory sorting; note that we might consider a different algorithm
for the case of a really large array where [i] is really disk access}}
was Singleton's Quickersort
| R.C. Singleton 'Revised Quickersort'
| (CACM Algorithm 347, Sept.1968)
(quicker than quicksort), a version of which (that does additionally return
the sorted index) I had re-programmed in Pascal and then C.
I can put it up for ftp if there's interest.
The GSL (GNU scientific library) has sorting routines, too,
all of them heapsort.
From an outdated manual,
http://sources.redhat.com/gsl/ref/gsl-ref_17.html#SEC253
Sorting
This chapter describes functions for sorting data, both directly and indirectly (using an index). All the functions use the heapsort algorithm. Heapsort is an O(N \log N) algorithm which operates in-place. It does not require any additional storage and provides consistent performance. The running time for its worst-case (ordered data) which is not significantly different from the average and best cases. Note that the heapsort algorithm does not preserve the relative ordering of equal elements -- it is an unstable sort. However the resulting order of equal elements will be consistent across different platforms when using these functions.
Sorting objects The following function provides a simple alternative to the standard library function qsort. It is intended for systems lacking qsort, not as a replacement for it. The function qsort should be used whenever possible, as it will be faster and can provide stable ordering of equal elements. Documentation for qsort is available in the GNU C Library Reference Manual.
I.e., they seem to say one should really use qsort() whenever that is available... hmm, could we use configure to find if qsort() is available? Martin -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-devel-request@stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._