summarizing a data frame i.e. count -> group by

Sun, Oct 23, 2011 10:29 AM

Hello,

This is one problem at a time :)

I have a data frame df that looks like this:

  time partitioning_mode workload runtime
1     1          sharding    query     607
2     1          sharding    query      85
3     1          sharding    query      52
4     1          sharding    query      79
5     1          sharding    query      77
6     1          sharding    query      67
7     1          sharding    query      98
8     1          sharding  refresh    2932
9     1          sharding  refresh    2870
10    1          sharding  refresh    2877
11    1          sharding  refresh    2868
12    1       replication    query    2891
13    1       replication    query    2907
14    1       replication    query    2922
15    1       replication    query    2937
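
For anyone who wants to reproduce this, here is a sketch that rebuilds the printed data frame in R. The values are copied from the printout; the column types (integer `time`, character `partitioning_mode` and `workload`, numeric `runtime`) are my assumption, since the printout does not show them.

```r
# Rebuild the example data frame from the printout above.
# Column types are assumed; only the printed values come from the post.
df <- data.frame(
  time              = rep(1L, 15),
  partitioning_mode = rep(c("sharding", "replication"), c(11, 4)),
  workload          = rep(c("query", "refresh", "query"), c(7, 4, 4)),
  runtime           = c(607, 85, 52, 79, 77, 67, 98,
                        2932, 2870, 2877, 2868,
                        2891, 2907, 2922, 2937),
  stringsAsFactors  = FALSE
)

# Inspect column types and the first rows
str(df)
head(df)
```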

And if I could use SQL... omg! I really wish I could! I would do exactly this:

insert into throughput
  select time, partitioning_mode, count(*)
  from df
  group by time, partitioning_mode
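
A minimal base-R sketch of what that SQL query does, assuming `df` has the columns shown above. It uses `aggregate()` with `FUN = length` to count rows per (time, partitioning_mode) group; the result is stored in a data frame named `throughput` to mirror the SQL, and the `count` column name is my own choice, not something R produces by default.

```r
# Rebuild the example data (values from the printout above)
df <- data.frame(
  time              = rep(1L, 15),
  partitioning_mode = rep(c("sharding", "replication"), c(11, 4)),
  workload          = rep(c("query", "refresh", "query"), c(7, 4, 4)),
  runtime           = c(607, 85, 52, 79, 77, 67, 98,
                        2932, 2870, 2877, 2868,
                        2891, 2907, 2922, 2937),
  stringsAsFactors  = FALSE
)

# Rough equivalent of:
#   select time, partitioning_mode, count(*) from df
#   group by time, partitioning_mode
# length() on runtime counts the rows falling into each group.
throughput <- aggregate(runtime ~ time + partitioning_mode,
                        data = df, FUN = length)

# Rename the counted column so it reads like count(*)
names(throughput)[names(throughput) == "runtime"] <- "count"

print(throughput)
```

On this data it should give one row per partitioning mode: 4 rows counted for replication and 11 for sharding. `table(df$time, df$partitioning_mode)` gives the same counts as a cross-tabulation if a data frame is not needed.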

My attempted R versions are wrong and produce very cryptic error messages:

Error in `[.default`(df2, u_id, , drop = FALSE) : 
  incorrect number of dimensions

Error in `[.default`(df2, u_id, , drop = FALSE) : 
  incorrect number of dimensions

I can't make sense of what comes out of this one... :(

And I thought C++ template errors were the most cryptic ;P

Many many thanks in advance,
Best regards,
Giovanni
