Message-ID: <87ehlsk55v.fsf@gnu.org>
Date: 2012-09-23T17:41:48Z
From: Sam Steingold
Subject: aggregate help
In-Reply-To: <1348176277.50987.YahooMailNeo@web142606.mail.bf1.yahoo.com> (arun's message of "Thu, 20 Sep 2012 14:24:37 -0700 (PDT)")
Thanks.
Why does
> aggregate(z, list(id=z$id),FUN=list)
id id a1 a2
1 10 10, 10, 10 a, a, b x, x, z
2 20 20, 20 b, b y, y
3 30 30 c z
work, but
aggregate(z, list(id=z$id),FUN=function(l) {
t <- sort(table(l),decreasing=TRUE)
list(length(t),t[1],names(t)[1],t[2],names(t)[2])
})
id id a1 a2
1 10 1 2 2
2 20 1 1 1
3 30 1 1 1
Warning message:
In format.data.frame(x, digits = digits, na.encode = FALSE) :
corrupt data frame: columns will be truncated or padded with NAs
does not?
(I do not want to put the whole list of all possible values into the
return value of aggregate because I am afraid of running out of ram)
> * arun <fznegcvax111 at lnubb.pbz> [2012-09-20 14:24:37 -0700]:
>
> Hi,
> Try this:
>
> z1<-aggregate(z,list(id=z$id),FUN=paste,sep=",")
> dat1<-data.frame(id=z1[,1],a1.total=unlist(lapply(z1[,3],length)),a1.val1=unique(z$a1),a1.num=unlist(lapply(lapply(z1[,3],table),`[`,1)),a1.val2=unlist(lapply(z1[,3],`[`,3)),a1.num2=unlist(lapply(lapply(z1[,3],table),`[`,2)),a2.total=unlist(lapply(z1[,4],length)),a2.val1=unique(z$a2),a2.num=unlist(lapply(lapply(z1[,4],table),`[`,1)),a2.val2=unlist(lapply(z1[,4],`[`,3)),a2.num2=unlist(lapply(lapply(z1[,4],table),`[`,2)))
> dat1
>
> # id a1.total a1.val1 a1.num a1.val2 a1.num2 a2.total a2.val1 a2.num a2.val2
> #0 10??????? 3?????? a????? 2?????? b?????? 1??????? 3?????? x????? 2?????? z
> #1 20??????? 2?????? b????? 2??? <NA>????? NA??????? 2?????? y????? 2??? <NA>
> #2 30??????? 1?????? c????? 1??? <NA>????? NA??????? 1?????? z????? 1??? <NA>
> #? a2.num2
> #0?????? 1
> #1????? NA
> #2????? NA
> #It is not an elegant way!
>
>
> A.K.
>
>
>
> ----- Original Message -----
> From: Sam Steingold <sds at gnu.org>
> To: r-help at r-project.org
> Cc:
> Sent: Thursday, September 20, 2012 2:06 PM
> Subject: [R] aggregate help
>
> I want to count attributes of IDs:
> z <- data.frame(id=c(10,20,10,30,10,20),
> ? ? ? ? ? ? ? ? a1=c("a","b","a","c","b","b"),
> ? ? ? ? ? ? ? ? a2=c("x","y","x","z","z","y"),
> ? ? ? ? ? ? ? ? stringsAsFactors=FALSE)
>> z
> ? id a1 a2
> 1 10? a? x
> 2 20? b? y
> 3 10? a? x
> 4 30? c? z
> 5 10? b? z
> 6 20? b? y
> I want to get something like
> id a1.tot a1.val1 a1.num1 a1.val2 a1.num2 a2.tot a2.val1 a2.num1 a2.val2 a2.num2
> 10? 3? ? "a"? ? ? 2? ? ? "b"? ? ? 1? ? ? 3? ? ? "x"? ? 2? ? ? "z"? ? 1
> 20? 2? ? "b"? ? ? 2? ? ? <NA>? ? 0? ? ? 2? ? ? "y"? ? 2? ? ? <NA>? ? 0
> 30? 1? ? "c"? ? ? 1? ? ? <NA>? ? 0? ? ? 1? ? ? "z"? ? 1? ? ? <NA>? ? 0
> (except that I don't care what appears in the cells marked with <NA>)
> I tried this:
> aggregate(z,by=list(id=z$id),function (s) {
> ? t <- sort(table(s),decreasing=TRUE)
> ? if (length(t) == 1)
> ? ? list(length(s),names(t)[1],t[1],"junk",0)
> ? else
> ? ? list(length(s),names(t)[1],t[1],names(t)[2],t[2])
> })
> ? id id a1 a2
> 1 10? 3? 3? 3
> 2 20? 2? 2? 2
> 3 30? 1? 1? 1
> Warning message:
> In format.data.frame(x, digits = digits, na.encode = FALSE) :
> ? corrupt data frame: columns will be truncated or padded with NAs
> Thanks!
--
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://mideasttruth.com http://think-israel.org
http://jihadwatch.org http://palestinefacts.org http://iris.org.il
Bill Gates is great, as long as `bill' is a verb.