Counting occurrences of variables in a dataframe

Hi everybody,
I have a large dataframe similar to this one:
knames <-c('ab', 'aa', 'ac', 'ad', 'ab', 'ac', 'aa', 'ad','ae', 'af')
kdate <- as.Date( c('20111001', '20111102', '20101001', '20100315',
'20101201', '20110105', '20101001', '20110504', '20110603', '20110201'),
format="%Y%m%d")
kdata <- data.frame (knames, kdate)
I would like to add a new variable to the dataframe that counts the
occurrences of each value in knames in their order of appearance
(according to the date given in kdate). The result should be a
variable with the values 2,2,1,1,1,2,1,2,1,1. I could do it with a loop,
but there must be a more elegant way to do this.
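
For reference, the loop mentioned above might look like the sketch below. It is only one possible baseline, not code from the original post; the names occ and seen are made up for illustration.

```r
knames <- c('ab', 'aa', 'ac', 'ad', 'ab', 'ac', 'aa', 'ad', 'ae', 'af')
kdate <- as.Date(c('20111001', '20111102', '20101001', '20100315',
                   '20101201', '20110105', '20101001', '20110504',
                   '20110603', '20110201'), format = "%Y%m%d")
kdata <- data.frame(knames, kdate)

occ <- integer(nrow(kdata))   # result, filled in original row order
seen <- list()                # hypothetical running counter per name

# Visit the rows from oldest to newest date and count each name as it appears.
for (i in order(kdata$kdate)) {
  key <- as.character(kdata$knames[i])
  n <- if (is.null(seen[[key]])) 1L else seen[[key]] + 1L
  seen[[key]] <- n
  occ[i] <- n
}

kdata$occ <- occ
```

The vectorised answers in the rest of the thread produce the same column without the explicit loop.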
Hi.

Is the first 2 in the new variable due to the fact that
the name is "ab" and the "ab" at row 5 has an older date? If so,
then try the following:

  ind <- order(kdata$kdate)
  f <- function(x) seq.int(along.with=x)
  kdata$x <- ave(1:nrow(kdata), kdata$knames[ind], FUN=f)[order(ind)]

     knames      kdate x
  1      ab 2011-10-01 2
  2      aa 2011-11-02 2
  3      ac 2010-10-01 1
  4      ad 2010-03-15 1
  5      ab 2010-12-01 1
  6      ac 2011-01-05 2
  7      aa 2010-10-01 1
  8      ad 2011-05-04 2
  9      ae 2011-06-03 1
  10     af 2011-02-01 1

kdata$knames[ind] orders the names by increasing date.
ave(...)[order(ind)] reorders the output of ave() to the original order.
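
The step that may look surprising is order(ind): applying order() to a permutation returns its inverse. A minimal sketch on a toy vector (the values are chosen only for illustration and are not from the thread):

```r
x <- c(30, 10, 20)

ind <- order(x)       # positions that sort x
sorted <- x[ind]      # x in increasing order
inv <- order(ind)     # inverse permutation of ind

# Indexing the sorted vector by the inverse permutation restores x.
restored <- sorted[inv]
```

Here restored is identical to x, which is why indexing the result of ave() by order(ind) puts the counts back in the original row order.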

Hope this helps.

Petr Savicky.

>  ave(unclass(kdate), knames, FUN=order )
  [1] 2 2 1 1 1 2 1 2 1 1

That call used the standalone vectors rather than the dataframe columns,
but you could also do this:

 > kdata$ord <- with(kdata, ave(unclass(kdate), knames, FUN=order ))
 > kdata
    knames      kdate ord
1      ab 2011-10-01   2
2      aa 2011-11-02   2
3      ac 2010-10-01   1
4      ad 2010-03-15   1
5      ab 2010-12-01   1
6      ac 2011-01-05   2
7      aa 2010-10-01   1
8      ad 2011-05-04   2
9      ae 2011-06-03   1
10     af 2011-02-01   1

Thanks!

Best,

Kai


______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
David Winsemius, MD
West Hartford, CT
On Feb 11, 2012, at 1:17 PM, Kai Mx wrote:

[snip: the quoted question and the ave(..., FUN=order) answer, both shown above]

Hi.

This is a good solution if each name occurs at most twice. If there
are more occurrences, then order() should be replaced by rank(),
because order() returns the positions of the sorted elements, not the
rank of each element. Replacing the name "aa" at row 2 by "ab",
we get

  knames <-c('ab', 'ab', 'ac', 'ad', 'ab', 'ac', 'aa', 'ad','ae', 'af')
  kdate <- as.Date( c('20111001', '20111102', '20101001', '20100315',
  '20101201', '20110105', '20101001', '20110504', '20110603', '20110201'),
  format="%Y%m%d")
  kdata <- data.frame (knames, kdate)

  kdata$ord <- with(kdata, ave(unclass(kdate), knames, FUN=order))
  kdata$rank <- with(kdata, ave(unclass(kdate), knames, FUN=rank))
  kdata

     knames      kdate ord rank
  1      ab 2011-10-01   3    2
  2      ab 2011-11-02   1    3
  3      ac 2010-10-01   1    1
  4      ad 2010-03-15   1    1
  5      ab 2010-12-01   2    1
  6      ac 2011-01-05   2    2
  7      aa 2010-10-01   1    1
  8      ad 2011-05-04   2    2
  9      ae 2011-06-03   1    1
  10     af 2011-02-01   1    1

The rows named "ab" occur in date order as row 5, row 1, row 2, so
row 5 should get index 1, row 1 index 2, and row 2 index 3. This is
what the rank column shows.
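
The difference can be checked on the three "ab" dates alone. This is a small sketch; the variable names d, o and r are only for illustration.

```r
# Dates of rows 1, 2 and 5 (the "ab" rows), as plain day counts.
d <- unclass(as.Date(c("2011-10-01", "2011-11-02", "2010-12-01")))

o <- order(d)   # 3 1 2: which element comes first, second, third
r <- rank(d)    # 2 3 1: the place each element takes after sorting

# Without ties, rank() is the inverse permutation of order().
r2 <- order(o)
```

With only two elements a permutation is always its own inverse, which is why order() and rank() agreed in the original data.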

If some of the dates repeat within a name, then rank() by default
assigns the tied rows their average index. In this case, the following
function f(), which breaks ties by order of appearance, may be used

  knames <-c('ab', 'ab', 'ac', 'ad', 'ab', 'ac', 'aa', 'ad','ae', 'af')
  kdate <- as.Date( c('20111001', '20111001', '20101001', '20100315',
  '20101201', '20110105', '20101001', '20110504', '20110603', '20110201'),
  format="%Y%m%d")
  kdata <- data.frame (knames, kdate)

  kdata$rank <- with(kdata, ave(unclass(kdate), knames, FUN=rank))
  f <- function(x) rank(x, ties.method="first")
  kdata$f <- with(kdata, ave(unclass(kdate), knames, FUN=f))
  kdata

     knames      kdate rank f
  1      ab 2011-10-01  2.5 2
  2      ab 2011-10-01  2.5 3
  3      ac 2010-10-01  1.0 1
  4      ad 2010-03-15  1.0 1
  5      ab 2010-12-01  1.0 1
  6      ac 2011-01-05  2.0 2
  7      aa 2010-10-01  1.0 1
  8      ad 2011-05-04  2.0 2
  9      ae 2011-06-03  1.0 1
  10     af 2011-02-01  1.0 1
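
As a cross-check, the sort-then-count approach from the first reply can be wrapped in a small helper. This is only a sketch: the name occurrence_index is hypothetical, and it relies on order() leaving tied dates in their original row order.

```r
# Hypothetical helper: running count of each name in date order,
# returned in the original row order.
occurrence_index <- function(names, dates) {
  ind <- order(dates)                       # ties keep their original order
  idx <- ave(seq_along(ind), names[ind], FUN = seq_along)
  idx[order(ind)]                           # back to the original row order
}

knames <- c('ab', 'ab', 'ac', 'ad', 'ab', 'ac', 'aa', 'ad', 'ae', 'af')
kdate <- as.Date(c('20111001', '20111001', '20101001', '20100315',
                   '20101201', '20110105', '20101001', '20110504',
                   '20110603', '20110201'), format = "%Y%m%d")

by_sorting <- occurrence_index(knames, kdate)

# The rank-based version with ties.method = "first", as in f() above.
by_rank <- ave(unclass(kdate), knames,
               FUN = function(x) rank(x, ties.method = "first"))
```

On this data both give the values in column f, so either form can be used when dates repeat within a name.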

Hope this helps.

Petr Savicky.