Skip to content

aggregate help

3 messages · Sam Steingold, arun

#
I want to count attributes of IDs:
--8<---------------cut here---------------start------------->8---
z <- data.frame(id=c(10,20,10,30,10,20),
                a1=c("a","b","a","c","b","b"),
                a2=c("x","y","x","z","z","y"),
                stringsAsFactors=FALSE)
id a1 a2
1 10  a  x
2 20  b  y
3 10  a  x
4 30  c  z
5 10  b  z
6 20  b  y
--8<---------------cut here---------------end--------------->8---
I want to get something like
--8<---------------cut here---------------start------------->8---
id a1.tot a1.val1 a1.num1 a1.val2 a1.num2 a2.tot a2.val1 a2.num1 a2.val2 a2.num2
10   3     "a"      2      "b"      1       3      "x"     2       "z"     1
20   2     "b"      2      <NA>     0       2      "y"     2       <NA>    0
30   1     "c"      1      <NA>     0       1      "z"     1       <NA>    0
--8<---------------cut here---------------end--------------->8---
(except that I don't care what appears in the cells marked with <NA>)
I tried this:
--8<---------------cut here---------------start------------->8---
aggregate(z,by=list(id=z$id),function (s) {
  t <- sort(table(s),decreasing=TRUE)
  if (length(t) == 1)
    list(length(s),names(t)[1],t[1],"junk",0)
  else
    list(length(s),names(t)[1],t[1],names(t)[2],t[2])
 })
  id id a1 a2
1 10  3  3  3
2 20  2  2  2
3 30  1  1  1
Warning message:
In format.data.frame(x, digits = digits, na.encode = FALSE) :
  corrupt data frame: columns will be truncated or padded with NAs
--8<---------------cut here---------------end--------------->8---
Thanks!
#
Hi,
Try this:

z1<-aggregate(z,list(id=z$id),FUN=paste,sep=",")
dat1<-data.frame(id=z1[,1],a1.total=unlist(lapply(z1[,3],length)),a1.val1=unique(z$a1),a1.num=unlist(lapply(lapply(z1[,3],table),`[`,1)),a1.val2=unlist(lapply(z1[,3],`[`,3)),a1.num2=unlist(lapply(lapply(z1[,3],table),`[`,2)),a2.total=unlist(lapply(z1[,4],length)),a2.val1=unique(z$a2),a2.num=unlist(lapply(lapply(z1[,4],table),`[`,1)),a2.val2=unlist(lapply(z1[,4],`[`,3)),a2.num2=unlist(lapply(lapply(z1[,4],table),`[`,2)))
dat1

# id a1.total a1.val1 a1.num a1.val2 a1.num2 a2.total a2.val1 a2.num a2.val2
#0 10??????? 3?????? a????? 2?????? b?????? 1??????? 3?????? x????? 2?????? z
#1 20??????? 2?????? b????? 2??? <NA>????? NA??????? 2?????? y????? 2??? <NA>
#2 30??????? 1?????? c????? 1??? <NA>????? NA??????? 1?????? z????? 1??? <NA>
#? a2.num2
#0?????? 1
#1????? NA
#2????? NA
#It is not an elegant way!


A.K.



----- Original Message -----
From: Sam Steingold <sds at gnu.org>
To: r-help at r-project.org
Cc: 
Sent: Thursday, September 20, 2012 2:06 PM
Subject: [R] aggregate help

I want to count attributes of IDs:
--8<---------------cut here---------------start------------->8---
z <- data.frame(id=c(10,20,10,30,10,20),
? ? ? ? ? ? ? ? a1=c("a","b","a","c","b","b"),
? ? ? ? ? ? ? ? a2=c("x","y","x","z","z","y"),
? ? ? ? ? ? ? ? stringsAsFactors=FALSE)
? id a1 a2
1 10? a? x
2 20? b? y
3 10? a? x
4 30? c? z
5 10? b? z
6 20? b? y
--8<---------------cut here---------------end--------------->8---
I want to get something like
--8<---------------cut here---------------start------------->8---
id a1.tot a1.val1 a1.num1 a1.val2 a1.num2 a2.tot a2.val1 a2.num1 a2.val2 a2.num2
10?  3? ?  "a"? ? ? 2? ? ? "b"? ? ? 1? ? ?  3? ? ? "x"? ?  2? ? ?  "z"? ?  1
20?  2? ?  "b"? ? ? 2? ? ? <NA>? ?  0? ? ?  2? ? ? "y"? ?  2? ? ?  <NA>? ? 0
30?  1? ?  "c"? ? ? 1? ? ? <NA>? ?  0? ? ?  1? ? ? "z"? ?  1? ? ?  <NA>? ? 0
--8<---------------cut here---------------end--------------->8---
(except that I don't care what appears in the cells marked with <NA>)
I tried this:
--8<---------------cut here---------------start------------->8---
aggregate(z,by=list(id=z$id),function (s) {
? t <- sort(table(s),decreasing=TRUE)
? if (length(t) == 1)
? ? list(length(s),names(t)[1],t[1],"junk",0)
? else
? ? list(length(s),names(t)[1],t[1],names(t)[2],t[2])
})
? id id a1 a2
1 10? 3? 3? 3
2 20? 2? 2? 2
3 30? 1? 1? 1
Warning message:
In format.data.frame(x, digits = digits, na.encode = FALSE) :
? corrupt data frame: columns will be truncated or padded with NAs
--8<---------------cut here---------------end--------------->8---
Thanks!
2 days later
#
Thanks.
Why does
id         id      a1      a2
1 10 10, 10, 10 a, a, b x, x, z
2 20     20, 20    b, b    y, y
3 30         30       c       z

work, but

aggregate(z, list(id=z$id),FUN=function(l) {
  t <- sort(table(l),decreasing=TRUE)
  list(length(t),t[1],names(t)[1],t[2],names(t)[2])
  })
   id id a1 a2
1 10  1  2  2
2 20  1  1  1
3 30  1  1  1
Warning message:
In format.data.frame(x, digits = digits, na.encode = FALSE) :
  corrupt data frame: columns will be truncated or padded with NAs
  
does not?
(I do not want to put the whole list of all possible values into the
return value of aggregate because I am afraid of running out of ram)