unfold list (variable number of columns) into a data frame
Hi:
Here's one approach:
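# Function to process a list component into a data frame.
# The first three entries are the grouping values; everything after them
# is a runtime. time is converted to integer and the text columns are kept
# as character, to match the structure asked for below.
ff <- function(x) {
  data.frame(time = as.integer(x[1]), partitioning_mode = x[2], workload = x[3],
             runtime = as.numeric(x[-(1:3)]), stringsAsFactors = FALSE)
}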
# Apply it to each element of the list:
do.call(rbind, lapply(data, ff))
Or, equivalently, using the plyr package:
library('plyr')
ldply(data, ff)
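If the list is long, building one small data frame per element and rbind-ing them all can get slow. A possible alternative, sketched here under the assumption that every element has its three grouping values first and at least one runtime after them, is to repeat the grouping values with rep() and flatten the runtimes with unlist(). The small list L and the names n and long are made up for this illustration:

```r
# Small hypothetical input in the same shape as the poster's list
L <- list(c("1", "sharding", "query", "607", "85"),
          c("1", "sharding", "refresh", "2932", "2870", "2877"))

# Number of runtime values in each element (everything after the first three)
n <- sapply(L, length) - 3

# Repeat each grouping value once per runtime, then flatten the runtimes
long <- data.frame(
  time              = rep(as.integer(sapply(L, `[`, 1)), times = n),
  partitioning_mode = rep(sapply(L, `[`, 2), times = n),
  workload          = rep(sapply(L, `[`, 3), times = n),
  runtime           = as.numeric(unlist(lapply(L, function(x) x[-(1:3)]))),
  stringsAsFactors  = FALSE
)
```

This should give the same long layout as the rbind approach, but only one data frame is ever constructed.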
# Example:
L <- list(c("1", "sharding", "query", "607", "85", "52", "79", "77",
            "67", "98"),
          c("1", "sharding", "refresh", "2932", "2870", "2877", "2868"),
          c("1", "replication", "query", "2891", "2907", "2922", "2937"))
do.call(rbind, lapply(L, ff))
   time partitioning_mode workload runtime
1     1          sharding    query     607
2     1          sharding    query      85
3     1          sharding    query      52
4     1          sharding    query      79
5     1          sharding    query      77
6     1          sharding    query      67
7     1          sharding    query      98
8     1          sharding  refresh    2932
9     1          sharding  refresh    2870
10    1          sharding  refresh    2877
11    1          sharding  refresh    2868
12    1       replication    query    2891
13    1       replication    query    2907
14    1       replication    query    2922
15    1       replication    query    2937
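Once the data are in this long layout, per-group summaries are straightforward. As a rough sketch, base R's aggregate() is used here as a stand-in for the summary function mentioned in the question; the data frame df and the name means are made up for illustration and only mimic the shape of the output above:

```r
# Hypothetical long-format data in the same shape as the output above
df <- data.frame(
  time              = 1L,
  partitioning_mode = c("sharding", "sharding", "sharding",
                        "replication", "replication"),
  workload          = c("query", "query", "refresh", "query", "query"),
  runtime           = c(607, 85, 2932, 2891, 2907),
  stringsAsFactors  = FALSE
)

# Mean runtime for each time / partitioning_mode / workload combination
means <- aggregate(runtime ~ time + partitioning_mode + workload,
                   data = df, FUN = mean)
```

The same data frame can be handed to ggplot2 directly, since each row carries its own grouping values.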
HTH,
Dennis
On Sun, Oct 23, 2011 at 8:38 AM, Giovanni Azua <bravegag at gmail.com> wrote:
Hello, I used R a lot one year ago and now I am a bit rusty :) I have my raw data, which correspond to the list of runtimes per minute (minutes "1", "2", "3", in two database modes "sharding" and "replication" and two workload types "query" and "refresh"), as a list of char arrays that looks like this:
str(data)
List of 122
 $ : chr [1:163] "1" "sharding" "query" "607" "85" "52" "79" "77" "67" "98" ...
 $ : chr [1:313] "1" "sharding" "refresh" "2932" "2870" "2877" "2868" ...
 $ : chr [1:57] "1" "replication" "query" "2891" "2907" "2922" "2937" ...
 $ : chr [1:278] "1" "replication" "refresh" "79" "79" "89" "79" "89" "79" "79" "79" ...
 $ : chr [1:163] "2" "sharding" "query" "607" "85" "52" "79" "77" "67" "98" ...
 $ : chr [1:313] "2" "sharding" "refresh" "2932" "2870" "2877" "2868" ...
 $ : chr [1:57] "2" "replication" "query" "2891" "2907" "2922" "2937" ...
 $ : chr [1:278] "2" "replication" "refresh" "79" "79" "89" "79" "89" "79" "79" "79" ...
 $ : chr [1:163] "3" "sharding" "query" "607" "85" "52" "79" "77" "67" "98" ...
 $ : chr [1:313] "3" "sharding" "refresh" "2932" "2870" "2877" "2868" ...
 $ : chr [1:57] "3" "replication" "query" "2891" "2907" "2922" "2937" ...
 $ : chr [1:278] "3" "replication" "refresh" "79" "79" "89" "79" "89" "79" "79" "79" ...

I would like to transform the one above into a data frame where this structure is unfolded in the following way:

'data.frame': N obs. of 4 variables:
 $ time : int 1 1 1 1 1 1 1 1 1 1 1 ...
 $ partitioning_mode : chr "sharding" "sharding" "sharding" "sharding" "sharding" "sharding" "sharding" "sharding" "sharding" "sharding" ...
 $ workload : chr "query" "query" "query" "query" "query" "query" "query" "refresh" "refresh" "refresh" "refresh" ...
 $ runtime : num 607 85 52 79 77 67 98 2932 2870 2877 2868 ...

So instead of having an associative array (variable number of columns), it should become a simple list where the groups or factors are repeated for every occurrence of the specific runtime. Basically, my ultimate goal is to get a data frame structure that is "summarizeBy"-friendly and "ggplot2"-friendly, i.e. using this data frame format.

Help greatly appreciated!
TIA,
Best regards,
Giovanni