Hello, I'd like to rank the rows of a data frame, similar to what rank() does for vectors. However, ties should be broken by columns that I specify. If a tie cannot be broken (because the row data is essentially the same), I'd like to have the same flexibility that rank() offers. Is there an elegant solution to this simple problem in R? Basically, what I need is a mixture of order() and rank(). While the former accepts multiple vectors, it doesn't provide the flexibility of rank() to specify what happens if ties cannot be broken. Thanks for your help! Best, Sebastian
Function rank() for data frames (or multiple vectors)?
10 messages · David Winsemius, Sebastian Bauer, Peter Dalgaard
On Aug 24, 2011, at 11:09 AM, Sebastian Bauer wrote:
Hello, I'd like to rank the rows of a data frame, similar to what rank() does for vectors. However, ties should be broken by columns that I specify. If a tie cannot be broken (because the row data is essentially the same), I'd like to have the same flexibility that rank() offers. Is there an elegant solution to this simple problem in R? Basically, what I need is a mixture of order() and rank(). While the former accepts multiple vectors, it doesn't provide the flexibility of rank() to specify what happens if ties cannot be broken.
An example of this "simple problem" would clarify this greatly. I cannot tell what "flexibility" in 'rank' is missing in 'order'.
David Winsemius, MD West Hartford, CT
Hi!
Thanks for your answer. For instance, suppose I have two vectors such as (one row per line, one vector per column)

1 1
1 2
1 2
1 3
2 1

that I want ranked jointly. I'd like to get the output

1 2 2 4 5

or (ties.method=average)

1 2.5 2.5 4 5

Basically, I need a function similar to rank() that accepts more than one vector (as order() does). Best, Sebastian
On Aug 24, 2011, at 1:11 PM, Sebastian Bauer wrote:
Thanks for your answer. For instance, suppose I have two vectors such as (one row per line, one vector per column)

1 1
1 2
1 2
1 3
2 1

that I want ranked jointly. I'd like to get the output

1 2 2 4 5

or (ties.method=average)

1 2.5 2.5 4 5

Basically, I need a function similar to rank() that accepts more than one vector (as order() does).
Can't you just paste the columns and run rank on the results? 'rank' accepts character vectors.
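A minimal sketch of the paste() suggestion, using the five example rows from earlier in the thread. The names a, b and key are chosen here for illustration. It gives the requested output for these single-digit values and is not meant as a general solution:

```r
# Example rows from the thread, one vector per column.
a <- c(1, 1, 1, 1, 2)
b <- c(1, 2, 2, 3, 1)

# One text key per row; identical rows get identical keys.
key <- paste(a, b)

r_min <- rank(key, ties.method = "min")      # 1 2 2 4 5
r_avg <- rank(key, ties.method = "average")  # 1.0 2.5 2.5 4.0 5.0
print(r_min)
print(r_avg)
```

Because the keys are compared as text, this relies on the string order matching the numeric order, which holds for the values above but not in general.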
David Winsemius, MD West Hartford, CT
Hi!
Can't you just paste the columns and run rank on the results? 'rank' accepts character vectors.
I was looking for an elegant solution ;) In the real case I have double values and this would be quite inefficient then. Best, Sebastian
On Aug 24, 2011, at 1:37 PM, Sebastian Bauer wrote:
I was looking for an elegant solution ;) In the real case I have double values and this would be quite inefficient then.
Still no R code. Then what about rank(order(...), further-ties.method-argument)? Perhaps I'm not seeing the problem clearly.
David Winsemius, MD West Hartford, CT
Hi!
On 08/24/2011 07:46 PM, David Winsemius wrote:
Still no r-code: Then what about rank(order(...) , further-ties.method-argument) ?
I think that, as order() always gives a different value for each element, rank(order()) would return the same result as order() alone. Bye, Sebastian
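Sebastian's point can be checked directly. In this sketch (the vectors a and b are made up and deliberately unsorted), order() returns a permutation of the row indices, so rank() has no ties left to resolve and returns that permutation unchanged:

```r
# Made-up, unsorted example vectors; rows 3 and 4 are identical.
a <- c(2, 1, 1, 1, 1)
b <- c(1, 3, 2, 2, 1)

o <- order(a, b)   # permutation of 1..5; the tied rows still get distinct values
r <- rank(o)       # ranking a permutation gives the permutation back

print(o)           # 5 3 4 2 1
print(r)           # 5 3 4 2 1
print(all(r == o)) # TRUE
```

The tie between rows 3 and 4 is therefore already lost by the time rank() sees the data, whatever ties.method is passed.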
On Aug 25, 2011, at 7:56 AM, Sebastian Bauer wrote:
I think that, as order() always gives a different value for each element, rank(order()) would return the same result as order() alone.
Quite right. I didn't test it since there was no example provided. Do you not understand what is meant by a reproducible example? Pretty much every solution I come up with leaves me (re-)asking the question: What's wrong with rank(paste(...))? Here's another possibility:

> rr <- data.frame(a = c(1,1,1,1,2), b=c(1,2,2,3,1))
> ave(order(rr$a, rr$b), rr$a, rr$b )
[1] 1.0 2.5 2.5 4.0 5.0
David Winsemius, MD West Hartford, CT
3 days later
Hi!
On 08/24/2011 07:46 PM, David Winsemius wrote:
Quite right. I didn't test it since there was no example provided. Do you not understand what is meant by a reproducible example.
Sorry, I thought I gave an example in my response to your response. I didn't know that you wanted an R example (which I didn't have at that time).
Pretty much every solution I come up with leaves me (re-) asking the question: What's wrong with rank(paste(...))?
As I said, this is rather inefficient and moreover doesn't work for floats, for which the lexical order of the string representation doesn't match the natural order (e.g., "1e-10" is lexically smaller than "3e-13", while 1e-10 is larger than 3e-13).
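The mismatch between text order and numeric order is easy to reproduce. The sketch below uses made-up values (x and y are not from the thread) and compares rank() on the numbers with rank() on their string form:

```r
# Integers with different digit counts already disagree.
x <- c(9, 10)
rx_num <- rank(x)                 # 1 2
rx_chr <- rank(as.character(x))   # 2 1, because "10" sorts before "9" as text

# Doubles printed in scientific notation disagree as well.
y <- c(1e-10, 3e-13)
ry_num <- rank(y)                 # 2 1
ry_chr <- rank(as.character(y))   # 1 2, because "1e-10" sorts before "3e-13" as text

print(rbind(rx_num, rx_chr))
print(rbind(ry_num, ry_chr))
```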
Here's another possibility:
> rr <- data.frame(a = c(1,1,1,1,2), b=c(1,2,2,3,1))
> ave(order(rr$a, rr$b), rr$a, rr$b )
[1] 1.0 2.5 2.5 4.0 5.0
Actually, this may be the solution I was looking for! Note that it assumes rr to be sorted already (hence the first argument of ave could simply be 1:nrow(rr)). Also, by using FUN=min or FUN=max I can cover the other cases. Thanks for this! Bye, Sebastian
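A short sketch of the variants Sebastian describes, on the already sorted example data. The names pos and g are introduced here for illustration, and building the grouping factor with interaction(..., drop = TRUE) is an addition not taken from the thread; it keeps FUN from being called on (a, b) combinations that do not occur in the data:

```r
# Example data from the thread, already sorted by a and then b.
rr <- data.frame(a = c(1, 1, 1, 1, 2), b = c(1, 2, 2, 3, 1))

pos <- seq_len(nrow(rr))                  # row positions; valid only because rr is sorted
g   <- interaction(rr$a, rr$b, drop = TRUE)  # one group per distinct (a, b) pair

r_avg <- ave(pos, g)                      # 1.0 2.5 2.5 4.0 5.0 (default FUN = mean)
r_min <- ave(pos, g, FUN = min)           # 1 2 2 4 5
r_max <- ave(pos, g, FUN = max)           # 1 3 3 4 5
print(r_avg); print(r_min); print(r_max)
```

These correspond to ties.method = "average", "min" and "max" in rank(), under the stated assumption that the rows are sorted.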
On Aug 29, 2011, at 15:39 , Sebastian Bauer wrote:
rr <- data.frame(a = c(1,1,1,1,2), b=c(1,2,2,3,1))
ave(order(rr$a, rr$b), rr$a, rr$b )
[1] 1.0 2.5 2.5 4.0 5.0
Actually, this may be the solution I was looking for! Note that it assumes rr to be sorted already (hence the first argument of ave could simply be 1:nrow(rr)). Also, by using FUN=min or FUN=max I can cover the other cases. Thanks for this!
Yes, order() and rank() are different beasts so you'd need the presort. You might consider this:
> rr <- data.frame(a = c(1,1,1,2,2), b=c(2,2,1,3,1))
> rr
  a b
1 1 2
2 1 2
3 1 1
4 2 3
5 2 1
> ave(order(rr$a, rr$b), rr$a, rr$b ) #WRONG!
[1] 2 2 2 5 4
> ave(order(order(rr$a, rr$b)), rr$a, rr$b )
[1] 2.5 2.5 1.0 5.0 4.0

Figuring out why order(order(x)) == rank(x) if you ignore ties is "left as an exercise" (i.e., I can't recall the argument just now...).
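One way to see the order(order(x)) identity (this argument is added here and is not from the thread): p <- order(x) is a permutation in which p[k] is the position of the k-th smallest value, so it maps ranks to positions. Applying order() to a permutation yields its inverse, which maps positions back to ranks. A small check with made-up values without ties:

```r
x <- c(30, 10, 50, 20)   # made-up values, no ties

p   <- order(x)          # p[k] = position of the k-th smallest value: 2 4 1 3
inv <- order(p)          # inverse permutation, position -> rank:      3 1 4 2
rk  <- rank(x)           # 3 1 4 2

print(p); print(inv); print(rk)
```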
Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com "Døden skal tape!" --- Nordahl Grieg
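Putting the pieces of the thread together, a rank()-like helper for data frames might look like the sketch below. The function name rank_rows and its arguments are hypothetical and not part of base R. Note one assumption: the grouping goes through interaction(), which converts the columns to factors, so doubles that differ only beyond the printed precision may end up in the same group.

```r
# Hypothetical helper (not from the thread, not in base R):
# rank rows by all columns in order, resolving fully tied rows with FUN.
rank_rows <- function(df, FUN = mean) {
  ord <- do.call(order, unname(as.list(df)))  # sort permutation over all columns
  pos <- order(ord)                           # position of each row in sorted order
  g   <- interaction(df, drop = TRUE)         # one group per distinct row
  ave(pos, g, FUN = FUN)                      # combine positions within each group
}

# Unsorted example data from Peter Dalgaard's message.
rr <- data.frame(a = c(1, 1, 1, 2, 2), b = c(2, 2, 1, 3, 1))

res_avg <- rank_rows(rr)              # 2.5 2.5 1.0 5.0 4.0
res_min <- rank_rows(rr, FUN = min)   # 2 2 1 5 4
res_max <- rank_rows(rr, FUN = max)   # 3 3 1 5 4
print(res_avg); print(res_min); print(res_max)
```

Passing FUN rather than a ties.method string keeps the sketch close to the ave() calls used in the thread; missing values are not handled.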