
how to make aggregation in R ?

5 messages · Ferry, Gabor Grothendieck, jim holtman +1 more

#
Here are two solutions:
   v1 v2 n1 n2
1   a a1  6 66
2   a a2  4 24
3   a a3  5 25
4   b b1 13 53
5   b b2 27 87
6   c c1 11 31
7   c c2 39 99
8   c c3 15 35
9   d d1 16 36
10  d d2 17 37
11  d d3 18 38
12  d d4 39 79

   v1 v2 sum(n1) sum(n2)
1   a a1       6      66
2   a a2       4      24
3   a a3       5      25
4   b b1      13      53
5   b b2      27      87
6   c c1      11      31
7   c c2      39      99
8   c c3      15      35
9   d d1      16      36
10  d d2      17      37
11  d d3      18      38
12  d d4      39      79
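
The code behind the two tables above is not shown in this copy of the thread. As a rough sketch only, and not necessarily what was originally posted, base R's aggregate() gives a result shaped like the first table. The data frame x below is rebuilt from the example data used later in the thread; the name x.agg is just illustrative.

```r
# Rebuild the 20-row example data (same values as the read.table example in this thread)
x <- data.frame(
  v1 = rep(c("a", "b", "c", "d"), each = 5),
  v2 = c("a1", "a1", "a1", "a2", "a3",
         "b1", "b1", "b2", "b2", "b2",
         "c1", "c2", "c2", "c2", "c3",
         "d1", "d2", "d3", "d4", "d4"),
  n1 = 1:20,
  n2 = 21:40
)

# Sum n1 and n2 within each (v1, v2) combination
x.agg <- aggregate(cbind(n1, n2) ~ v1 + v2, data = x, FUN = sum)
print(x.agg)
```

The formula interface keeps the grouping columns under their own names and returns one row per (v1, v2) combination that actually occurs in the data.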
On Thu, Mar 19, 2009 at 9:18 PM, Ferry <fmi.mlist at gmail.com> wrote:
#
Try this technique.  I use it with large data objects since splitting
row indices, rather than the data itself, is sometimes faster and uses less memory:

x <- read.table(textConnection("  v1 v2 n1 n2
1   a a1  1 21
2   a a1  2 22
3   a a1  3 23
4   a a2  4 24
5   a a3  5 25
6   b b1  6 26
7   b b1  7 27
8   b b2  8 28
9   b b2  9 29
10  b b2 10 30
11  c c1 11 31
12  c c2 12 32
13  c c2 13 33
14  c c2 14 34
15  c c3 15 35
16  d d1 16 36
17  d d2 17 37
18  d d3 18 38
19  d d4 19 39
20  d d4 20 40"), header=TRUE)
closeAllConnections()
# use indices to reduce memory
x.ind <- split(seq_len(nrow(x)), list(x$v1, x$v2), drop=TRUE)
# now aggregate using the indices
x.agg <- do.call(rbind, lapply(x.ind, function(.seg){
    data.frame(v1=x$v1[.seg[1]], v2=x$v2[.seg[1]],
        n1=sum(x$n1[.seg]), n2=sum(x$n2[.seg]))
}))
On Thu, Mar 19, 2009 at 9:09 PM, Ferry <fmi.mlist at gmail.com> wrote:

  
    
#
On Thu, Mar 19, 2009 at 8:40 PM, jim holtman <jholtman at gmail.com> wrote:
This is basically the approach that the plyr package,
http://had.co.nz/plyr, uses behind a user-friendly interface.

Hadley
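
For comparison, here is a minimal sketch of what the plyr version might look like. It assumes the plyr package is installed and reuses the column names from the example data above; the name x.plyr is just illustrative and this code was not posted in the thread.

```r
library(plyr)  # assumption: plyr is installed

# Rebuild the 20-row example data (same values as the read.table example in this thread)
x <- data.frame(
  v1 = rep(c("a", "b", "c", "d"), each = 5),
  v2 = c("a1", "a1", "a1", "a2", "a3",
         "b1", "b1", "b2", "b2", "b2",
         "c1", "c2", "c2", "c2", "c3",
         "d1", "d2", "d3", "d4", "d4"),
  n1 = 1:20,
  n2 = 21:40
)

# Split by (v1, v2), sum n1 and n2 within each piece, and recombine into one data frame
x.plyr <- ddply(x, .(v1, v2), summarise, n1 = sum(n1), n2 = sum(n2))
print(x.plyr)
```

ddply() does the split, apply and rbind steps in a single call, which is the same pattern as the split()/lapply()/do.call(rbind, ...) code earlier in the thread.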