
summarizing a data frame i.e. count -> group by

And the plyr version of this would be (using DF as the data frame name)

## transform method, mapping length(runtime) to all observations
## similar to David's results:
library('plyr')
ddply(DF, .(time, partitioning_mode), transform, n = length(runtime))
# or equivalently, the newer and somewhat faster
ddply(DF, .(time, partitioning_mode), mutate, n = length(runtime))

# If you just want the counts, then use

ddply(DF, .(time, partitioning_mode), summarise, n = length(runtime))

##---------
# Just for fun, here's the equivalent SQL call using sqldf():

library('sqldf')
sqldf('select time, partitioning_mode, count(*) from DF group by time, partitioning_mode')

# which you can distribute over multiple lines for readability, e.g.

sqldf('select time, partitioning_mode, count(*) as n
      from DF
      group by time, partitioning_mode')

# Result:
  time partitioning_mode  n
1    1       replication  4
2    1          sharding 11

##---------
# To do the same type of summary in data.table (to follow up on Jim
# Holtman's post), here's one way:

library(data.table)
dt <- data.table(DF, key = c('time', 'partitioning_mode'))
dt[, list(n = length(runtime)), by = key(dt)]
     time partitioning_mode  n
[1,]    1       replication  4
[2,]    1          sharding 11
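
If you want to try the calls above without the original data, here is a minimal sketch. The toy DF is a hypothetical stand-in (15 rows at time 1, with made-up runtime values) built only to reproduce the group sizes shown in the results. The base R aggregate() call is an extra alternative that needs no add-on package; it is not one of the approaches posted earlier in the thread.

```r
# Hypothetical stand-in for DF: 4 'replication' and 11 'sharding' rows at time 1.
# The runtime values are invented purely for illustration.
DF <- data.frame(
  time = rep(1, 15),
  partitioning_mode = rep(c("replication", "sharding"), times = c(4, 11)),
  runtime = seq_len(15)
)

# Base R count per group: aggregate() applies length() to runtime within
# each (time, partitioning_mode) combination.
counts <- aggregate(runtime ~ time + partitioning_mode, data = DF, FUN = length)

# aggregate() keeps the name 'runtime' for the result column; rename it to n
# so it lines up with the plyr, sqldf and data.table output.
names(counts)[names(counts) == "runtime"] <- "n"
print(counts)
```

The same toy DF can be passed unchanged to the ddply(), sqldf() and data.table calls above.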


###------
HTH,
Dennis
On Sun, Oct 23, 2011 at 10:29 AM, Giovanni Azua <bravegag at gmail.com> wrote: