Hi,
I have a data.frame which is ordered by score, and has a factor column:
Browse[1]> wc[c("report","score")]
        report score
9         ADEA  0.96
8         ADEA  0.90
11 Asylum_FED9  0.86
3         ADEA  0.75
14 Asylum_FED9  0.60
5         ADEA  0.56
13 Asylum_FED9  0.51
16 Asylum_FED9  0.51
2         ADEA  0.42
7         ADEA  0.31
17 Asylum_FED9  0.27
1         ADEA  0.17
4         ADEA  0.17
6         ADEA  0.12
10        ADEA  0.11
12 Asylum_FED9  0.10
15 Asylum_FED9  0.09
18 Asylum_FED9  0.07
Browse[1]>
I need to add a column indicating rank within each factor group, which I
currently accomplish like so:
wc$rank <- 0
for(report in as.character(unique(wc$report))) {
wc[wc$report==report,]$rank <- 1:sum(wc$report==report)
}
I have to wonder whether there's a better way, something that gets rid of
the for() loop using tapply() or by() or similar. But I haven't come up
with anything.
I've tried these:
by(wc, wc$report, FUN=function(pr){pr$rank <- 1:nrow(pr)})
by(wc, wc$report, FUN=function(pr){wc[wc$report %in% pr$report,]$rank <-
1:nrow(pr)})
But in both cases the effect of the assignment is lost; there's no $rank
column generated for wc.
Any suggestions?
-Ken
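The by() attempts fail because FUN works on its own copy of each group: pr$rank <- ... changes only that local copy, and the second attempt creates a new local wc inside the function instead of modifying the global one. A minimal sketch of a loop-free alternative (not taken from the thread) splits the scores by group, numbers each piece with seq_along(), and reassembles them with unsplit(). Like the for() loop, it numbers rows in their current order, so it assumes the data are already sorted by descending score. The data frame below is reconstructed from the listing above, with row names omitted.

```r
# Data reconstructed from the listing in the post (row names omitted;
# they do not affect the ranks)
wc <- data.frame(
  report = factor(c("ADEA", "ADEA", "Asylum_FED9", "ADEA", "Asylum_FED9",
                    "ADEA", "Asylum_FED9", "Asylum_FED9", "ADEA", "ADEA",
                    "Asylum_FED9", "ADEA", "ADEA", "ADEA", "ADEA",
                    "Asylum_FED9", "Asylum_FED9", "Asylum_FED9")),
  score = c(0.96, 0.90, 0.86, 0.75, 0.60, 0.56, 0.51, 0.51, 0.42, 0.31,
            0.27, 0.17, 0.17, 0.12, 0.11, 0.10, 0.09, 0.07)
)

# The first by() attempt: FUN receives a copy of each group, so the
# assignment to pr$rank never reaches wc
invisible(by(wc, wc$report, FUN = function(pr) { pr$rank <- 1:nrow(pr) }))
is.null(wc$rank)   # TRUE: no rank column was added

# Loop-free sketch: split the scores by group, number each piece with
# seq_along(), and let unsplit() put the numbers back in row order.
# Like the for() loop, this relies on rows already being sorted by score.
wc$rank <- unsplit(lapply(split(wc$score, wc$report), seq_along), wc$report)
wc$rank   # 1 2 1 3 2 4 3 4 5 6 5 7 8 9 10 6 7 8
```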
Compute rank within factor groups
4 messages · Ken Williams, Greg Snow, jim holtman +1 more
Look at ?ave and try something like:
wc$rank <- ave( wc$score, wc$report, FUN=rank )
This works even if the data frame is not presorted.

Hope this helps,
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at intermountainmail.org
(801) 408-8111
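One difference from Ken's loop is worth noting: rank() ranks in increasing order and averages ties by default, so ave(wc$score, wc$report, FUN=rank) gives the lowest score in each group rank 1, and the top ADEA score (0.96) comes out as 10 rather than 1. A sketch of a variant that reproduces the loop's numbering (1 = highest score) negates the score and breaks ties by row order with ties.method = "first"; use ties.method = "min" instead if tied scores should share a rank. The data frame is again reconstructed from the original listing.

```r
# Data reconstructed from the listing in the original post
wc <- data.frame(
  report = factor(c("ADEA", "ADEA", "Asylum_FED9", "ADEA", "Asylum_FED9",
                    "ADEA", "Asylum_FED9", "Asylum_FED9", "ADEA", "ADEA",
                    "Asylum_FED9", "ADEA", "ADEA", "ADEA", "ADEA",
                    "Asylum_FED9", "Asylum_FED9", "Asylum_FED9")),
  score = c(0.96, 0.90, 0.86, 0.75, 0.60, 0.56, 0.51, 0.51, 0.42, 0.31,
            0.27, 0.17, 0.17, 0.12, 0.11, 0.10, 0.09, 0.07)
)

# With FUN = rank the lowest score in each group gets rank 1, so the
# top ADEA score (first row) comes out as 10, not 1
ave(wc$score, wc$report, FUN = rank)[1]   # 10

# Negating the score ranks from highest to lowest; ties.method = "first"
# breaks the 0.51/0.51 and 0.17/0.17 ties by row order, as the loop does
wc$rank <- ave(wc$score, wc$report,
               FUN = function(s) rank(-s, ties.method = "first"))
wc$rank   # 1 2 1 3 2 4 3 4 5 6 5 7 8 9 10 6 7 8
```

Apart from how ties are broken, the ranks depend only on the scores, so the rows do not need to be presorted.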
> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Ken Williams
> Sent: Thursday, July 12, 2007 12:09 PM
> To: R-help at stat.math.ethz.ch
> Subject: [R] Compute rank within factor groups
>
> Hi,
>
> I have a data.frame which is ordered by score, and has a
> factor column:
>
> Browse[1]> wc[c("report","score")]
> report score
> 9 ADEA 0.96
> 8 ADEA 0.90
> 11 Asylum_FED9 0.86
> 3 ADEA 0.75
> 14 Asylum_FED9 0.60
> 5 ADEA 0.56
> 13 Asylum_FED9 0.51
> 16 Asylum_FED9 0.51
> 2 ADEA 0.42
> 7 ADEA 0.31
> 17 Asylum_FED9 0.27
> 1 ADEA 0.17
> 4 ADEA 0.17
> 6 ADEA 0.12
> 10 ADEA 0.11
> 12 Asylum_FED9 0.10
> 15 Asylum_FED9 0.09
> 18 Asylum_FED9 0.07
> Browse[1]>
>
> I need to add a column indicating rank within each factor
> group, which I currently accomplish like so:
>
> wc$rank <- 0
> for(report in as.character(unique(wc$report))) {
> wc[wc$report==report,]$rank <- 1:sum(wc$report==report)
> }
>
> I have to wonder whether there's a better way, something that
> gets rid of the for() loop using tapply() or by() or similar.
> But I haven't come up with anything.
>
> I've tried these:
>
> by(wc, wc$report, FUN=function(pr){pr$rank <- 1:nrow(pr)})
>
> by(wc, wc$report, FUN=function(pr){wc[wc$report %in%
> pr$report,]$rank <-
> 1:nrow(pr)})
>
> But in both cases the effect of the assignment is lost,
> there's no $rank column generated for wc.
>
> Any suggestions?
>
> -Ken
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
Is this what you are looking for?
> x
        report score
9         ADEA  0.96
8         ADEA  0.90
11 Asylum_FED9  0.86
3         ADEA  0.75
14 Asylum_FED9  0.60
5         ADEA  0.56
13 Asylum_FED9  0.51
16 Asylum_FED9  0.51
2         ADEA  0.42
7         ADEA  0.31
17 Asylum_FED9  0.27
1         ADEA  0.17
4         ADEA  0.17
6         ADEA  0.12
10        ADEA  0.11
12 Asylum_FED9  0.10
15 Asylum_FED9  0.09
18 Asylum_FED9  0.07
> x$rank <- ave(x$score, x$report, FUN=rank)
> x
        report score rank
9         ADEA  0.96 10.0
8         ADEA  0.90  9.0
11 Asylum_FED9  0.86  8.0
3         ADEA  0.75  8.0
14 Asylum_FED9  0.60  7.0
5         ADEA  0.56  7.0
13 Asylum_FED9  0.51  5.5
16 Asylum_FED9  0.51  5.5
2         ADEA  0.42  6.0
7         ADEA  0.31  5.0
17 Asylum_FED9  0.27  4.0
1         ADEA  0.17  3.5
4         ADEA  0.17  3.5
6         ADEA  0.12  2.0
10        ADEA  0.11  1.0
12 Asylum_FED9  0.10  3.0
15 Asylum_FED9  0.09  2.0
18 Asylum_FED9  0.07  1.0
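The 5.5 and 3.5 entries are ties: the two Asylum_FED9 rows scoring 0.51 (rows 13 and 16) would occupy ranks 5 and 6, and rank()'s default ties.method = "average" gives both 5.5 (likewise 3.5 for the two ADEA rows at 0.17). A short sketch comparing a few of rank()'s other tie rules on that group's scores:

```r
# Asylum_FED9 scores from the post, in the order listed
s <- c(0.86, 0.60, 0.51, 0.51, 0.27, 0.10, 0.09, 0.07)

rank(s)                          # 8.0 7.0 5.5 5.5 4.0 3.0 2.0 1.0 (default "average")
rank(s, ties.method = "min")     # 8 7 5 5 4 3 2 1
rank(s, ties.method = "max")     # 8 7 6 6 4 3 2 1
rank(s, ties.method = "first")   # 8 7 5 6 4 3 2 1 (earlier row gets the lower rank)
```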
On 7/12/07, Ken Williams <ken.williams at thomson.com> wrote:
Hi,
I have a data.frame which is ordered by score, and has a factor column:
Browse[1]> wc[c("report","score")]
report score
9 ADEA 0.96
8 ADEA 0.90
11 Asylum_FED9 0.86
3 ADEA 0.75
14 Asylum_FED9 0.60
5 ADEA 0.56
13 Asylum_FED9 0.51
16 Asylum_FED9 0.51
2 ADEA 0.42
7 ADEA 0.31
17 Asylum_FED9 0.27
1 ADEA 0.17
4 ADEA 0.17
6 ADEA 0.12
10 ADEA 0.11
12 Asylum_FED9 0.10
15 Asylum_FED9 0.09
18 Asylum_FED9 0.07
Browse[1]>
I need to add a column indicating rank within each factor group, which I
currently accomplish like so:
wc$rank <- 0
for(report in as.character(unique(wc$report))) {
wc[wc$report==report,]$rank <- 1:sum(wc$report==report)
}
I have to wonder whether there's a better way, something that gets rid of
the for() loop using tapply() or by() or similar. But I haven't come up
with anything.
I've tried these:
by(wc, wc$report, FUN=function(pr){pr$rank <- 1:nrow(pr)})
by(wc, wc$report, FUN=function(pr){wc[wc$report %in% pr$report,]$rank <-
1:nrow(pr)})
But in both cases the effect of the assignment is lost, there's no $rank
column generated for wc.
Any suggestions?
-Ken
______________________________________________ R-help at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
Ken Williams wrote:
Hi,
I have a data.frame which is ordered by score, and has a factor column:
Browse[1]> wc[c("report","score")]
report score
9 ADEA 0.96
8 ADEA 0.90
11 Asylum_FED9 0.86
3 ADEA 0.75
14 Asylum_FED9 0.60
5 ADEA 0.56
13 Asylum_FED9 0.51
16 Asylum_FED9 0.51
2 ADEA 0.42
7 ADEA 0.31
17 Asylum_FED9 0.27
1 ADEA 0.17
4 ADEA 0.17
6 ADEA 0.12
10 ADEA 0.11
12 Asylum_FED9 0.10
15 Asylum_FED9 0.09
18 Asylum_FED9 0.07
Browse[1]>
I need to add a column indicating rank within each factor group, which I
currently accomplish like so:
wc$rank <- 0
for(report in as.character(unique(wc$report))) {
wc[wc$report==report,]$rank <- 1:sum(wc$report==report)
}
I have to wonder whether there's a better way, something that gets rid of
the for() loop using tapply() or by() or similar. But I haven't come up
with anything.
I've tried these:
by(wc, wc$report, FUN=function(pr){pr$rank <- 1:nrow(pr)})
by(wc, wc$report, FUN=function(pr){wc[wc$report %in% pr$report,]$rank <-
1:nrow(pr)})
But in both cases the effect of the assignment is lost, there's no $rank
column generated for wc.
Any suggestions?
There's a little-known and somewhat unfortunately named function called
ave() which does just that sort of thing.

> ave(wc$score, wc$report, FUN=rank)
 [1] 10.0  9.0  8.0  8.0  7.0  7.0  5.5  5.5  6.0  5.0  4.0  3.5  3.5  2.0  1.0
[16]  3.0  2.0  1.0
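The name comes from ave()'s default FUN = mean, but any function that returns one value per input works. With a single grouping factor it behaves roughly like the hand-written sketch below: split the values by group, apply FUN to each piece, and write the results back into their original positions with the split<- replacement function. my_ave is a hypothetical name used only for illustration; the real ave() also accepts several grouping factors (combined with interaction()) and applies FUN to the whole vector when no grouping is given.

```r
# Hypothetical, simplified stand-in for ave() with a single grouping factor
my_ave <- function(x, g, FUN = mean) {
  # split(x, g) makes one vector per group, lapply() applies FUN to each,
  # and the `split<-` replacement writes each result back to the positions
  # its group came from, so the output lines up with the input rows
  split(x, g) <- lapply(split(x, g), FUN)
  x
}

# Data from the original post
score  <- c(0.96, 0.90, 0.86, 0.75, 0.60, 0.56, 0.51, 0.51, 0.42, 0.31,
            0.27, 0.17, 0.17, 0.12, 0.11, 0.10, 0.09, 0.07)
report <- factor(c("ADEA", "ADEA", "Asylum_FED9", "ADEA", "Asylum_FED9",
                   "ADEA", "Asylum_FED9", "Asylum_FED9", "ADEA", "ADEA",
                   "Asylum_FED9", "ADEA", "ADEA", "ADEA", "ADEA",
                   "Asylum_FED9", "Asylum_FED9", "Asylum_FED9"))

r1 <- my_ave(score, report, FUN = rank)
r2 <- ave(score, report, FUN = rank)
all.equal(r1, r2)   # TRUE
r1   # 10.0 9.0 8.0 8.0 7.0 7.0 5.5 5.5 6.0 5.0 4.0 3.5 3.5 2.0 1.0 3.0 2.0 1.0
```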