sorting without order
On Tue, 23 Nov 2004, Peter Dalgaard wrote:
"Marc Mamin" <M.Mamin at intershop.de> writes:
Hello, In order to increase the performance of a script I'd like to sort very large vectors containing repeated integer values. I'm not interesting in having the values sorted, but only grouped. I also need the equivalent of index.return from the standard "sort" function: f(c(10,1,10,100,1,10)) => grouped: c(10,10,10,1,1,100) ix: c(1,3,6,2,5,4) is there a way to achieve this which would be faster than the standard sort function? Thanks for any hints,
Here's one way:
v <- c(10,1,10,100,1,10)
ix <- do.call("c",split(seq(along=v),v))
grouped <- v[ix]
Not sure about the speed though. Should be O(N) if the number of
groups is small, but the multiplier could be large because of various
formalities (such as adding names to ix).
Radix sorting as implemented in sort.list will be hard to beat: ix <- sort.list(unclass(factor(v)), method="radix") although in your case as.integer() will do.
Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595