Message-ID: <Pine.LNX.4.44.0306081830190.7491-100000@tal.stat.umu.se>
Date: 2003-06-08T16:44:52Z
From: Göran Broström
Subject: Ordering long vectors
In-Reply-To: <Pine.A41.4.44.0306080905570.61872-100000@homer03.u.washington.edu>
On Sun, 8 Jun 2003, Thomas Lumley wrote:
> On Sat, 7 Jun 2003, [ISO-8859-1] G?ran Brostr?m wrote:
>
> >
> > I need to order a long vector of integers with rather few unique values.
> > This is very slow:
>
>
> I think the culprit is
>
> src/main/sort.c: orderVector1
>
> /* Shell sort isn't stable, but it proves to be somewhat faster
> to run a final insertion sort to re-order runs of ties when
> comparison is cheap.
> */
>
> This also explains:
>
> > aa<-sample(rep(1:10,50000))
> > system.time( order(aa, 1:length(aa)))
> [1] 3.67 0.01 3.68 0.00 0.00
> > system.time( order(aa))
> ^C
> Timing stopped at: 49.33 0.01 49.34 0 0
>
> which is perhaps the simplest work-around :).
Thanks. This is really surprising: it is *much* faster to break ties by a
second condition than not breaking them. I think it should be mentioned
in the help. And could 'order/sort' be modified to check for 'tieness'?
But I guess the the overhead would be too heavy.
(if (length(unique(x)) < alpha * length(x)) then .... else ....)
G?ran