fast subsetting of lists in lists
Hello Alex,

Assuming it was just an inadequate example (since a data.frame would suffice in that case), did you know that a data.frame's columns do not have to be vectors but can be lists? I don't know if that helps.
> DF = data.frame(a=1:3)
> DF$b = list(pi, 2:3, letters[1:5])
> DF
  a             b
1 1      3.141593
2 2          2, 3
3 3 a, b, c, d, e
> DF$b
[[1]]
[1] 3.141593

[[2]]
[1] 2 3

[[3]]
[1] "a" "b" "c" "d" "e"

> sapply(DF,class)
        a         b
"integer"    "list"
That is still regular, though, in the sense that each row has a value for every column, even if that value is NA, or NULL in a list column. If your data is not regular, then one option is to flatten it into (row, column, value) tuples, similar to how sparse matrices are stored. Your value column may be a list rather than a vector. Then (and yes, you guessed this was coming) ... you can use data.table to query the flat structure quickly by setting a key on the first two columns, or maybe just on the 2nd column when you need to pick out the values for one 'column' quickly for all 'rows'.

There was a thread about using list() columns in data.table here: http://r.789695.n4.nabble.com/Suggest-a-cool-feature-Use-data-table-like-a-sorted-indexed-data-list-tp2544213p2544213.html
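The (row, column, value) layout can be sketched in base R before any keyed lookup comes into play. This is only a minimal illustration: the `irregular` list and the names `flat`, `b_rows` and `b_values` are made up for the example, and it has not been timed on real data.

```r
# Hypothetical irregular data: the records do not all share the same fields.
irregular <- list(
  list(a = 1, b = 2:3),
  list(b = letters[1:5]),
  list(a = 4, c = pi)
)

# One (row, column, value) tuple per stored field.
n_fields <- lengths(irregular)
flat <- data.frame(
  row    = rep(seq_along(irregular), times = n_fields),
  column = unlist(lapply(irregular, names)),
  stringsAsFactors = FALSE
)

# The value column is a list, so each entry may have any type or length.
# recursive = FALSE removes only the outer level of nesting.
flat$value <- unlist(irregular, recursive = FALSE, use.names = FALSE)

# All values stored under "b", together with the rows they came from.
b_rows   <- flat$row[flat$column == "b"]
b_values <- flat$value[flat$column == "b"]
```

Here the selection is an ordinary logical subset, which scans the whole `column` vector each time.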
Does someone know a trick to do the same as above with the faster built-in subsetting? Something like: test[<somesubsettingmagic>]
So in data.table if you wanted all the 'b' values, you might do something
like this :
setkey(DT,column)
DT[J("b"), value]
which should return the list() quickly from the irregular data.
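For a self-contained version of that query, here is a small sketch. It assumes the data.table package is installed; the table contents are invented for illustration, and only the column names `row`, `column` and `value` follow the description in this message.

```r
library(data.table)

# Hypothetical flat table of (row, column, value) tuples.
# 'value' is a list column, so entries may differ in type and length.
DT <- data.table(
  row    = c(1L, 1L, 2L, 3L, 3L),
  column = c("a", "b", "b", "a", "c"),
  value  = list(1, 2:3, letters[1:5], 4, pi)
)

# Sort by 'column' and mark it as the key, so lookups use binary search.
setkey(DT, column)

# All values stored under "b", returned as a list.
b_values <- DT[J("b"), value]

# Keying on both columns allows picking out a single cell.
setkey(DT, row, column)
cell <- DT[J(3L, "c"), value][[1]]
```

Re-keying sorts the table again, so on large data it is probably better to choose one key that matches the most frequent query.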
Matthew
"Alexander Senger" <senger at physik.hu-berlin.de> wrote in message
news:4CFE6AEE.6030204 at physik.hu-berlin.de...
Hello Gerrit, Gabor,

thank you for your suggestion. Unfortunately, unlist seems to be rather expensive. A short test with one of my datasets gives 0.01 s for an extraction based on my approach and 5.6 s for unlist alone. The reason seems to be that unlist relies on lapply internally and does so recursively?

Maybe there is still another way to go?

Alex

On 07.12.2010 15:59, Gerrit Eichner wrote:
Hello, Alexander,

does

utest <- unlist(test)
utest[names(utest) == "a"]

come close to what you need?

Hth,
Gerrit

On Tue, 7 Dec 2010, Alexander Senger wrote:
Hello,
my data is contained in nested lists (which is not necessarily the
best approach). What I need is a fast way to get subsets from the
data.
An example:
test <- list(list(a = 1, b = 2, c = 3), list(a = 4, b = 5, c = 6),
list(a = 7, b = 8, c = 9))
Now I would like to have all values in the named variables "a", that is
the vector c(1, 4, 7). The best I could come up with is:
val <- sapply(1:3, function (i) {test[[i]]$a})
which is unfortunately not very fast. According to The R Inferno, this is
because apply and its derivatives do their looping in R rather than
relying on C subroutines, as the common [ operator does.
Does someone know a trick to do the same as above with the faster
built-in subsetting? Something like:
test[<somesubsettingmagic>]
Thank you for your advice
Alex
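One commonly used idiom for this kind of extraction, offered here only as a sketch, is to pass the `[[` operator itself to sapply or vapply instead of an anonymous function. It still visits each element in turn, so whether it is fast enough on real data would need to be measured; the names `val` and `val2` are just for the example.

```r
test <- list(list(a = 1, b = 2, c = 3),
             list(a = 4, b = 5, c = 6),
             list(a = 7, b = 8, c = 9))

# Pass the extraction operator as the function: no anonymous closure
# and no indexing of 'test' by position.
val <- sapply(test, `[[`, "a")

# vapply additionally declares the expected type and length of each
# result, and fails loudly if an element does not match.
val2 <- vapply(test, `[[`, numeric(1), "a")
```

Both calls give the numeric vector c(1, 4, 7) for the example list above.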
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.