fast subsetting of lists in lists

Hello Alex,

Assuming it was just an inadequate example (since a data.frame would suffice 
in that case), did you know that a data.frame's columns do not have to be 
vectors but can be lists?  I don't know if that helps.
  a             b
1 1      3.141593
2 2          2, 3
3 3 a, b, c, d, e
[[1]]
[1] 3.141593

[[2]]
[1] 2 3

[[3]]
[1] "a" "b" "c" "d" "e"
        a         b
"integer"    "list"
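For reference, output like the above can be produced along these lines. This 
is only a sketch: the name DF and the exact construction are assumptions, not 
necessarily the original commands.

```r
# Assumed name DF; column b is a list with one element per row
DF <- data.frame(a = 1:3)
DF$b <- list(pi, 2:3, letters[1:5])

DF                 # prints each list element as a compact summary
DF$b               # the underlying list, one element per row
sapply(DF, class)  # class of each column
```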
That is still regular though in the sense that each row has a value for all 
the columns, even if that value is NA, or NULL in lists.

If your data is not regular then one option is to flatten it into 
(row,column,value) tuples, similar to how sparse matrices are stored.  Your 
value column may be a list rather than a vector.
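
A rough sketch of that flattening in base R, assuming the irregular data is a 
list of named lists. The names rows and flat and the sample values are made 
up for illustration.

```r
# Hypothetical irregular data: each "row" has a different set of fields
rows <- list(
  list(a = 1, b = pi),
  list(b = 2:3, c = "x"),
  list(a = 3)
)

# One (row, column, value) tuple per field that is actually present
n <- sapply(rows, length)
flat <- data.frame(
  row    = rep(seq_along(rows), n),
  column = unlist(lapply(rows, names)),
  stringsAsFactors = FALSE
)

# Strip one level of nesting so value is a list with one element per tuple
flat$value <- unlist(rows, recursive = FALSE, use.names = FALSE)
```

Fields that are absent from a row take up no space, which is the point of 
the sparse layout.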

Then (and yes you guessed this was coming) ... you can use data.table to 
query the flat structure quickly by setting a key on the first two columns, 
or maybe just the 2nd column when you need to pick out the values for one 
'column' quickly for all 'rows'.

There was a thread about using list() columns in data.table here :

http://r.789695.n4.nabble.com/Suggest-a-cool-feature-Use-data-table-like-a-sorted-indexed-data-list-tp2544213p2544213.html
So in data.table if you wanted all the 'b' values,  you might do something 
like this :

setkey(DT,column)
DT[J("b"), value]

which should return the list() quickly from the irregular data.
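
A self-contained version of the same idea, as a sketch only. The table 
contents are invented, and the column names row, column and value are 
assumed to match the flat layout described above.

```r
library(data.table)

# Hypothetical flat (row, column, value) table; value is a list column
DT <- data.table(
  row    = c(1L, 1L, 2L, 2L, 3L),
  column = c("a", "b", "b", "c", "a"),
  value  = list(1, pi, 2:3, "x", 3)
)

# Key on column only: pick out one 'column' across all 'rows'
setkey(DT, column)
b_values <- DT[J("b"), value]

# Key on both: pick out a single (row, column) cell
setkey(DT, row, column)
cell <- DT[J(2L, "c"), value]
```

The J() lookup goes through the key, so it uses binary search rather than 
scanning the whole table.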

Matthew


"Alexander Senger" <senger at physik.hu-berlin.de> wrote in message 
news:4CFE6AEE.6030204 at physik.hu-berlin.de...