Message-ID: <67A585E8-37C5-4B58-85AC-D2455EDDADCD@gmail.com>
Date: 2011-08-29T17:56:19Z
From: Peter Dalgaard
Subject: Function rank() for data frames (or multiple vectors)?
In-Reply-To: <4E5B967B.8070702@charite.de>

On Aug 29, 2011, at 15:39 , Sebastian Bauer wrote:

>> 
>> > rr <- data.frame(a = c(1,1,1,1,2), b=c(1,2,2,3,1))
>> 
>> > ave(order(rr$a, rr$b), rr$a, rr$b )
>> [1] 1.0 2.5 2.5 4.0 5.0
> 
> Actually, this may be the solution I was looking for! Note that it assumes rr is already sorted (hence the first argument of ave could simply be 1:nrow(rr)). Also, by using FUN=min or FUN=max I can cover the other cases. Thanks for this!
> 

Yes, order() and rank() are different beasts so you'd need the presort.

You might consider this:

> rr <- data.frame(a = c(1,1,1,2,2), b=c(2,2,1,3,1))
> rr
  a b
1 1 2
2 1 2
3 1 1
4 2 3
5 2 1

> ave(order(rr$a, rr$b), rr$a, rr$b ) # WRONG!
[1] 2 2 2 5 4
> ave(order(order(rr$a, rr$b)), rr$a, rr$b )
[1] 2.5 2.5 1.0 5.0 4.0

Figuring out why order(order(x)) == rank(x) if you ignore ties is "left as an exercise" (i.e., I can't recall the argument just now...). 
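
One way to see it, given as a sketch rather than a formal proof: p <- order(x) is a permutation with x[p] sorted, so p[k] is the index of the k-th smallest element. The rank vector r is the inverse of that permutation, i.e. r[p[k]] == k. Applying order() to a permutation returns its inverse, because p[order(p)] has to come out as 1:n. Hence order(order(x)) is r. The check below uses made-up data; the seed and the values are arbitrary.

```r
set.seed(1)               # arbitrary seed, only for reproducibility
x <- sample(100, 10)      # ten distinct values, so there are no ties

p <- order(x)             # p[k] = index of the k-th smallest element of x
r <- order(p)             # order() of a permutation gives its inverse

stopifnot(identical(p[r], seq_along(x)))   # r undoes p
stopifnot(identical(r[p], seq_along(x)))   # p undoes r
stopifnot(all(r == rank(x)))               # so r is the rank vector
```

With ties the argument breaks down only in that order() has to pick some arrangement of the tied elements, which is what the ave() step then averages out.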


-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
"Døden skal tape!" --- Nordahl Grieg