Counting occurrences of variables in a dataframe
On Sat, Feb 11, 2012 at 07:17:54PM +0100, Kai Mx wrote:
Hi everybody,
I have a large dataframe similar to this one:
knames <-c('ab', 'aa', 'ac', 'ad', 'ab', 'ac', 'aa', 'ad','ae', 'af')
kdate <- as.Date( c('20111001', '20111102', '20101001', '20100315',
'20101201', '20110105', '20101001', '20110504', '20110603', '20110201'),
format="%Y%m%d")
kdata <- data.frame (knames, kdate)
I would like to add a new variable to the dataframe counting the
occurrences of different values in knames in their order of appearance
(according to the date as indicated in kdate). The solution should be a
variable with the values 2,2,1,1,1,2,1,2,1,1. I could do it with a loop,
but there must be a more elegant way to do this.
Hi.
Is the first 2 in the new variable due to the fact that
the name is "ab" and the "ab" at row 5 has an older date? If so,
then try the following:
ind <- order(kdata$kdate)
f <- function(x) seq.int(along.with=x)
kdata$x <- ave(1:nrow(kdata), kdata$knames[ind], FUN=f)[order(ind)]
   knames      kdate x
1      ab 2011-10-01 2
2      aa 2011-11-02 2
3      ac 2010-10-01 1
4      ad 2010-03-15 1
5      ab 2010-12-01 1
6      ac 2011-01-05 2
7      aa 2010-10-01 1
8      ad 2011-05-04 2
9      ae 2011-06-03 1
10     af 2011-02-01 1
kdata$knames[ind] orders the names by increasing date.
ave(...)[order(ind)] reorders the output of ave() to the original order.
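The reordering step can be illustrated in isolation. The sketch below uses a hypothetical vector v (not part of the original data) to show that indexing by order(ind) undoes the permutation applied by ind:

```r
# Hypothetical toy vector, used only to illustrate the reordering step.
v <- c(30, 10, 20)

ind <- order(v)                 # indices that sort v: 2 3 1
sorted <- v[ind]                # values in increasing order: 10 20 30

# order(ind) is the inverse permutation of ind, so indexing the
# sorted vector with it restores the original arrangement.
restored <- sorted[order(ind)]
print(restored)                 # 30 10 20
```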
Hope this helps.
Petr Savicky.
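
As a possible alternative to the sort-and-restore approach above, the dates could be ranked directly within each name group. This is only a sketch, not the method given in the reply: the column name x2 is hypothetical, and ties.method = "first" is an assumption about how equal dates within the same name should be numbered (the example data has no such ties).

```r
# Rebuild the example data from the question.
knames <- c('ab', 'aa', 'ac', 'ad', 'ab', 'ac', 'aa', 'ad', 'ae', 'af')
kdate <- as.Date(c('20111001', '20111102', '20101001', '20100315',
                   '20101201', '20110105', '20101001', '20110504',
                   '20110603', '20110201'), format = "%Y%m%d")
kdata <- data.frame(knames, kdate)

# Rank each date among the dates that share the same name.
# as.numeric() converts the dates to day counts so ave() works on plain numbers.
# Assumption: equal dates within one name are numbered in row order.
kdata$x2 <- ave(as.numeric(kdata$kdate), kdata$knames,
                FUN = function(d) rank(d, ties.method = "first"))

print(kdata$x2)   # 2 2 1 1 1 2 1 2 1 1
```

Because the ranking is computed within each group in place, no separate reordering step is needed here.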