Skip to content

Ordering long vectors

9 messages · Thomas Lumley, Brian Ripley, Ramzi Feghali +2 more

#
I need to order a long vector of integers with rather few unique values.
This is very slow:
[1] 189.18   0.09 190.48   0.00   0.00

But with no ties
[1] 1.18 0.00 1.18 0.00 0.00

it is very fast!
This gave me the following idea: Since I don't care about keeping the 
order within tied values, why not add some small disturbance to  x,
and indeed,
[1] 1.66 0.00 1.66 0.00 0.00
[1] TRUE

it works! 

Is there an obvious (=better) solution to this problem that I have 
overlooked? In any case, I think that the problem with order and many 
ties is worth mentioning in the help page. 

For the record: R-1.7.0, RH9

G?ran
---
 G?ran Brostr?m                    tel: +46 90 786 5223
 Department of Statistics          fax: +46 90 786 6614
 Ume? University                   http://www.stat.umu.se/egna/gb/
 SE-90187 Ume?, Sweden             e-mail: gb at stat.umu.se
#
On Sat, 7 Jun 2003, G?ran Brostr?m wrote:

            
An even better way is
[1] 1.32 0.05 1.37 0.00 0.00

Faster, but more important; it keeps the original ordering for tied 
values. Thanks to James Holtman.

G?ran
[...]
#
On Sun, 8 Jun 2003, [ISO-8859-1] Göran Broström wrote:

            
Another option:
This turns out to be slightly slower than your method, but doesn't require
that you know what the smallest difference between values is (and works
for characters as well as numbers)

	-thomas
#
On Sat, 7 Jun 2003, [ISO-8859-1] Göran Broström wrote:

            
I think the culprit is

src/main/sort.c: orderVector1

    /* Shell sort isn't stable, but it proves to be somewhat faster
       to run a final insertion sort to re-order runs of ties when
       comparison is cheap.
    */

This also explains:
[1] 3.67 0.01 3.68 0.00 0.00
^C
Timing stopped at: 49.33 0.01 49.34 0 0

which is perhaps the simplest work-around :).


	-thomas
#
On Sun, 8 Jun 2003, Thomas Lumley wrote:

            
Thanks. This is really surprising: it is *much* faster to break ties by a 
second condition than not breaking them. I think it should be mentioned 
in the help. And could 'order/sort' be modified to check for 'tieness'? 
But I guess the the overhead would be too heavy.

(if (length(unique(x)) < alpha * length(x)) then .... else ....)

G?ran
#
On Sun, 8 Jun 2003, G?ran Brostr?m wrote:

            
There is a simple and very much faster solution if you don't care about 
ordering of ties:

system.time(ord4 <- sort(x, method="quick", index.return = TRUE)$ix)

That is in the help, and I am very surprised you failed to find it.
#
Ramzi Feghali wrote:

            
Whom is this mail adressed to? The mailing list R-help or Professor Ripley?
So you are going to optimize the following code?
There seems to be a logical n x m matrix called bin.
And the sollution of your problem should be solved within seconds by

  bin %*% t(bin)

where the upper triangular matrix of the result consists of the results 
for all i and k in your loops.

Uwe Ligges
#
On Mon, 9 Jun 2003, Prof Brian Ripley wrote:

            
Thanks, it is indeed much faster; about 7 times on my example.
Well, I found it, but I was cooled off by the "somewhat faster than 
Shellsort" and "poor performance in the worst case", together with the 
fact that I only needed the order and not the sorted vector. Maybe a more 
positive description is in order :-). Especially on the help page of 'order'.

G.