summarizing a data frame i.e. count -> group by
Hello, This is one problem at the time :) I have a data frame df that looks like this: time partitioning_mode workload runtime 1 1 sharding query 607 2 1 sharding query 85 3 1 sharding query 52 4 1 sharding query 79 5 1 sharding query 77 6 1 sharding query 67 7 1 sharding query 98 8 1 sharding refresh 2932 9 1 sharding refresh 2870 10 1 sharding refresh 2877 11 1 sharding refresh 2868 12 1 replication query 2891 13 1 replication query 2907 14 1 replication query 2922 15 1 replication query 2937 and if I could use SQL ... omg! I really wish I could! I would do exactly this: insert into throughput select time, partitioning_mode, count(*) from data.frame group by time, partitioning_mode My attempted R versions are wrong and produce very cryptic error message:
throughput <- aggregate(x=df[,c("time", "partitioning_mode")], by=list(df$time,df$partitioning_mode), count)
Error in `[.default`(df2, u_id, , drop = FALSE) : incorrect number of dimensions
throughput <- aggregate(x=df, by=list(df$time,df$partitioning_mode), count)
Error in `[.default`(df2, u_id, , drop = FALSE) : incorrect number of dimensions
throughput <- tapply(X=df$time, INDEX=list(df$time,df$partitioning), FUN=count)
I cant comprehend what comes out from this one ... :( and I thought C++ template errors were the most cryptic ;P Many many thanks in advance, Best regards, Giovanni