Counting occurrences of variables in a dataframe
6 messages · Kai Mx, Tal Galili, David Winsemius +1 more
On Sat, Feb 11, 2012 at 07:17:54PM +0100, Kai Mx wrote:
Hi everybody,
I have a large dataframe similar to this one:
knames <-c('ab', 'aa', 'ac', 'ad', 'ab', 'ac', 'aa', 'ad','ae', 'af')
kdate <- as.Date( c('20111001', '20111102', '20101001', '20100315',
'20101201', '20110105', '20101001', '20110504', '20110603', '20110201'),
format="%Y%m%d")
kdata <- data.frame (knames, kdate)
I would like to add a new variable to the dataframe counting the
occurrences of different values in knames in their order of appearance
(according to the date as indicated in kdate). The solution should be a
variable with the values 2,2,1,1,1,2,1,2,1,1. I could do it with a loop,
but there must be a more elegant way to do this.
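For reference, here is a minimal sketch of the loop the question alludes to, assuming the example data above. The helper name count_in_date_order is hypothetical and does not come from the thread. It walks the rows in increasing date order and keeps a running count per name, which pins down the expected result. Growing a named vector like this is slow on a large data frame, which is why a vectorised approach is preferable.

```r
# Rebuild the example data from the question
knames <- c('ab', 'aa', 'ac', 'ad', 'ab', 'ac', 'aa', 'ad', 'ae', 'af')
kdate <- as.Date(c('20111001', '20111102', '20101001', '20100315',
                   '20101201', '20110105', '20101001', '20110504',
                   '20110603', '20110201'), format = "%Y%m%d")
kdata <- data.frame(knames, kdate, stringsAsFactors = FALSE)

# Hypothetical helper (not from the thread): visit rows in date order and
# record how many times each name has been seen so far.
count_in_date_order <- function(names, dates) {
  names <- as.character(names)
  seen <- integer(0)              # running count per name, indexed by name
  out <- integer(length(names))
  for (i in order(dates)) {
    nm <- names[i]
    seen[nm] <- if (is.na(seen[nm])) 1L else seen[nm] + 1L
    out[i] <- seen[nm]
  }
  out
}

kdata$x <- count_in_date_order(kdata$knames, kdata$kdate)
print(kdata$x)
# [1] 2 2 1 1 1 2 1 2 1 1
```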
Hi.
Is the first 2 in the new variable due to the fact that
the name is "ab" and the "ab" at row 5 has an older date? If so,
then try the following:
ind <- order(kdata$kdate)
f <- function(x) seq.int(along.with=x)
kdata$x <- ave(1:nrow(kdata), kdata$knames[ind], FUN=f)[order(ind)]
knames kdate x
1 ab 2011-10-01 2
2 aa 2011-11-02 2
3 ac 2010-10-01 1
4 ad 2010-03-15 1
5 ab 2010-12-01 1
6 ac 2011-01-05 2
7 aa 2010-10-01 1
8 ad 2011-05-04 2
9 ae 2011-06-03 1
10 af 2011-02-01 1
kdata$knames[ind] orders the names by increasing date.
ave(...)[order(ind)] reorders the output of ave() back to the original
row order, since order(ind) is the inverse of the permutation ind.
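A minimal sketch, assuming the example data above, that checks the inverse-permutation step: gathering with [order(ind)] gives the same vector as scattering the sorted values back with back[ind] <- .... It uses seq_along in place of the function f(), which computes the same thing; the names sorted_counts, via_order and back are illustrative only.

```r
knames <- c('ab', 'aa', 'ac', 'ad', 'ab', 'ac', 'aa', 'ad', 'ae', 'af')
kdate <- as.Date(c('20111001', '20111102', '20101001', '20100315',
                   '20101201', '20110105', '20101001', '20110504',
                   '20110603', '20110201'), format = "%Y%m%d")

ind <- order(kdate)   # row numbers sorted by increasing date

# Running count within each name, computed in date order
sorted_counts <- ave(seq_along(knames), knames[ind], FUN = seq_along)

# Route 1: gather with the inverse permutation, as in the answer
via_order <- sorted_counts[order(ind)]

# Route 2: scatter each sorted value back to its original row
back <- integer(length(ind))
back[ind] <- sorted_counts

stopifnot(identical(via_order, back))
print(via_order)
# [1] 2 2 1 1 1 2 1 2 1 1
```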
Hope this helps.
Petr Savicky.
On Feb 11, 2012, at 1:17 PM, Kai Mx wrote:
Hi everybody,
I have a large dataframe similar to this one:
knames <-c('ab', 'aa', 'ac', 'ad', 'ab', 'ac', 'aa', 'ad','ae', 'af')
kdate <- as.Date( c('20111001', '20111102', '20101001', '20100315',
'20101201', '20110105', '20101001', '20110504', '20110603',
'20110201'),
format="%Y%m%d")
kdata <- data.frame (knames, kdate)
> ave(unclass(kdate), knames, FUN=order )
[1] 2 2 1 1 1 2 1 2 1 1
That was actually not using the dataframe values, but you could also do
this:
> kdata$ord <- with(kdata, ave(unclass(kdate), knames, FUN=order ))
> kdata
knames kdate ord
1 ab 2011-10-01 2
2 aa 2011-11-02 2
3 ac 2010-10-01 1
4 ad 2010-03-15 1
5 ab 2010-12-01 1
6 ac 2011-01-05 2
7 aa 2010-10-01 1
8 ad 2011-05-04 2
9 ae 2011-06-03 1
10 af 2011-02-01 1
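Each name occurs at most twice in this data, so FUN = order and FUN = rank agree here. A minimal sketch (variable names are illustrative) that checks this precondition explicitly before relying on the order() shortcut:

```r
knames <- c('ab', 'aa', 'ac', 'ad', 'ab', 'ac', 'aa', 'ad', 'ae', 'af')
kdate <- as.Date(c('20111001', '20111102', '20101001', '20100315',
                   '20101201', '20110105', '20101001', '20110504',
                   '20110603', '20110201'), format = "%Y%m%d")

# order() is only guaranteed to give within-name positions
# when every name has at most two rows
stopifnot(max(table(knames)) <= 2)

by_order <- ave(unclass(kdate), knames, FUN = order)
by_rank  <- ave(unclass(kdate), knames, FUN = rank)

stopifnot(all(by_order == by_rank))
print(by_rank)
# [1] 2 2 1 1 1 2 1 2 1 1
```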
I would like to add a new variable to the dataframe counting the occurrences of different values in knames in their order of appearance (according to the date as indicated in kdate). The solution should be a variable with the values 2,2,1,1,1,2,1,2,1,1. I could do it with a loop, but there must be a more elegant way to do this.
Thanks!
Best,
Kai
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
David Winsemius, MD West Hartford, CT
On Sat, Feb 11, 2012 at 04:05:25PM -0500, David Winsemius wrote:
ave(unclass(kdate), knames, FUN=order )
[1] 2 2 1 1 1 2 1 2 1 1
That was actually not using the dataframe values but you could also do this:
kdata$ord <- with(kdata, ave(unclass(kdate), knames, FUN=order ))
kdata
knames kdate ord
1 ab 2011-10-01 2
2 aa 2011-11-02 2
3 ac 2010-10-01 1
4 ad 2010-03-15 1
5 ab 2010-12-01 1
6 ac 2011-01-05 2
7 aa 2010-10-01 1
8 ad 2011-05-04 2
9 ae 2011-06-03 1
10 af 2011-02-01 1
Hi.
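This is a good solution if there are at most two occurrences
of each name: with at most two elements, the permutation returned
by order() is its own inverse, so it coincides with the position of
each element in date order. If there are more occurrences, then
function "order" should be replaced by "rank". Replacing name "aa"
at row 2 by "ab", we get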
knames <-c('ab', 'ab', 'ac', 'ad', 'ab', 'ac', 'aa', 'ad','ae', 'af')
kdate <- as.Date( c('20111001', '20111102', '20101001', '20100315',
'20101201', '20110105', '20101001', '20110504', '20110603', '20110201'),
format="%Y%m%d")
kdata <- data.frame (knames, kdate)
kdata$ord <- with(kdata, ave(unclass(kdate), knames, FUN=order))
kdata$rank <- with(kdata, ave(unclass(kdate), knames, FUN=rank))
kdata
knames kdate ord rank
1 ab 2011-10-01 3 2
2 ab 2011-11-02 1 3
3 ac 2010-10-01 1 1
4 ad 2010-03-15 1 1
5 ab 2010-12-01 2 1
6 ac 2011-01-05 2 2
7 aa 2010-10-01 1 1
8 ad 2011-05-04 2 2
9 ae 2011-06-03 1 1
10 af 2011-02-01 1 1
Sorted by date, the "ab" rows are row 5, row 1, row 2, so
row 1 should get index 2 and row 2 index 3, as rank() gives.
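To see why order() fails with three occurrences, here is a short sketch comparing the two functions on the three "ab" dates from this example (rows 1, 2 and 5; x and y are illustrative names). order(x) lists which element comes first, second and third, whereas rank(x) gives each element's position in sorted order. Without ties, rank(x) is the inverse permutation order(order(x)), and for two elements the two coincide.

```r
# The three "ab" dates (rows 1, 2 and 5), as day counts
x <- unclass(as.Date(c('20111001', '20111102', '20101201'), format = "%Y%m%d"))

print(order(x))  # indices of the elements in date order
# [1] 3 1 2
print(rank(x))   # position of each element in date order
# [1] 2 3 1

# Without ties, rank() is the inverse permutation of order()
stopifnot(identical(as.integer(rank(x)), order(order(x))))

# With only two elements the permutation is its own inverse, so they agree
y <- x[c(1, 3)]
stopifnot(identical(as.integer(rank(y)), order(y)))
```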
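If some of the dates repeat within a name, then rank() by default
assigns the tied values their average rank (ties.method = "average").
In this case, the following function f(), which breaks ties in order
of appearance, may be used: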
knames <-c('ab', 'ab', 'ac', 'ad', 'ab', 'ac', 'aa', 'ad','ae', 'af')
kdate <- as.Date( c('20111001', '20111001', '20101001', '20100315',
'20101201', '20110105', '20101001', '20110504', '20110603', '20110201'),
format="%Y%m%d")
kdata <- data.frame (knames, kdate)
kdata$rank <- with(kdata, ave(unclass(kdate), knames, FUN=rank))
f <- function(x) rank(x, ties.method="first")
kdata$f <- with(kdata, ave(unclass(kdate), knames, FUN=f))
kdata
knames kdate rank f
1 ab 2011-10-01 2.5 2
2 ab 2011-10-01 2.5 3
3 ac 2010-10-01 1.0 1
4 ad 2010-03-15 1.0 1
5 ab 2010-12-01 1.0 1
6 ac 2011-01-05 2.0 2
7 aa 2010-10-01 1.0 1
8 ad 2011-05-04 2.0 2
9 ae 2011-06-03 1.0 1
10 af 2011-02-01 1.0 1
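Because order() leaves tied values in their original order, the sort-and-number approach from the first reply should agree with rank(ties.method = "first") on this tied data. A minimal sketch that checks this; f_first, by_rank and by_sort are illustrative names:

```r
knames <- c('ab', 'ab', 'ac', 'ad', 'ab', 'ac', 'aa', 'ad', 'ae', 'af')
kdate <- as.Date(c('20111001', '20111001', '20101001', '20100315',
                   '20101201', '20110105', '20101001', '20110504',
                   '20110603', '20110201'), format = "%Y%m%d")

# rank() with ties broken by position, applied within each name
f_first <- function(x) rank(x, ties.method = "first")
by_rank <- ave(unclass(kdate), knames, FUN = f_first)

# Sort by date, number rows within each name, then restore row order;
# order() keeps tied dates in their original row order
ind <- order(kdate)
by_sort <- ave(seq_along(knames), knames[ind], FUN = seq_along)[order(ind)]

stopifnot(all(by_rank == by_sort))
print(by_sort)
# [1] 2 3 1 1 1 2 1 2 1 1
```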
Hope this helps.
Petr Savicky.