Message-ID: <x21xek3i93.fsf@biostat.ku.dk>
Date: 2004-11-23T11:41:12Z
From: Peter Dalgaard
Subject: sorting without order
In-Reply-To: <A03188C6623C0D46A703CB5AA59907F201C11C59@JENMAIL01.ad.intershop.net>

"Marc Mamin" <M.Mamin at intershop.de> writes:

> Hello,
> 
> 
> In order to increase the performance of a script I'd like to sort very large vectors containing repeated integer values. 
> I'm not interested in having the values sorted, but only grouped.
> I also need the equivalent of index.return from the standard "sort" function:
> 
>   f(c(10,1,10,100,1,10))
> 
>   =>
> 
>   grouped: c(10,10,10,1,1,100)
>   ix:      c(1,3,6,2,5,4)
> 
> 
> Is there a way to achieve this which would be faster than the standard sort function?
> 
> Thanks for any hints,

Here's one way:

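v <- c(10,1,10,100,1,10)
# split() groups the positions 1..N by value; c() glues the groups together
ix <- do.call("c",split(seq(along=v),v))
# indexing by ix puts equal values next to each other
grouped <- v[ix]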

I'm not sure about the speed, though. It should be O(N) if the number
of groups is small, but the constant factor could be large because of
various formalities (such as adding names to ix).
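If the names are a concern, something along the following lines might
avoid them. This is only a sketch: group_index is a made-up helper
name, and it assumes an R version that has seq_along() and the
use.names argument of unlist(). Passing levels = unique(v) to factor()
is an extra choice, not part of the snippet above; it keeps the groups
in order of first appearance, as in the example from the question.

```r
v <- c(10, 1, 10, 100, 1, 10)

# Hypothetical helper: returns the positions of v grouped by value,
# with groups ordered by first appearance and no names attached.
group_index <- function(v) {
  f <- factor(v, levels = unique(v))                 # one level per distinct value
  unlist(split(seq_along(v), f), use.names = FALSE)  # concatenate groups, drop names
}

ix <- group_index(v)
grouped <- v[ix]
```

Whether this actually beats sort(v, index.return=TRUE) on very large
vectors is something to check with system.time() on the real data.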


-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907