
fast subsetting of lists in lists

Hello,

Matthew's hint is interesting:

On 07.12.2010 19:16, Matthew Dowle wrote:
My data is mostly regular; that is, every sublist contains a data.frame,
which accounts for most of the overall size. I use lists mainly because
I also need some bits of information about the environment. I thought
about putting these into additional columns of the data.frame, one
column per variable (adding redundancy and maybe 30% overhead this
way). But as memory usage is already close to my machine's limit, this
might break things (the situation is a bit tricky, isn't it?).
I didn't know that a column of a data.frame could be a list. So if I
need only, let's say, 10 entries in that list but my data.frame has
several hundred rows, would the "empty" parts of the "column-list" be
filled with recycled values, or would they really be empty and thus use
no additional memory?
Secondly, as I mentioned in another email on this topic, a whole day of
data contains about 100 chunks, that is, 100 of the sublists described
above. I could put them all into one large data.frame, but then I would
have to extract the "environmental data" from the long list, which would
now contain repeated occurrences of variables with the same name. I
guess subsetting could become tricky here (depending on both name and
position, I assume), but I'm eager to learn an easy way of doing so.
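To make the layout above concrete, here is a minimal R sketch with made-up values. The element names `data` and `env`, the environment variables `temperature` and `station`, and the three-chunk size are hypothetical stand-ins, not the real data. It shows pulling one environment variable out of every chunk by name, and stacking the per-chunk data.frames into one with a `chunk` index column, so each row can still be matched to its chunk's environment data without copying that data into every row.

```r
# Hypothetical layout: one day = a list of chunks; each chunk holds a
# data.frame ("data") plus a few scalar environment variables ("env").
set.seed(1)
make_chunk <- function(i) {
  list(
    data = data.frame(t = 1:5, value = rnorm(5)),
    env  = list(temperature = 20 + i, station = paste0("S", i))
  )
}
day <- lapply(1:3, make_chunk)   # about 100 chunks in the real data

# Pull one environment variable out of every chunk, selected by name.
temps <- vapply(day, function(ch) ch$env$temperature, numeric(1))

# Stack all data.frames into one, tagging each row with its chunk index
# so the environment data can be looked up per chunk later.
all_data <- do.call(rbind, Map(function(ch, i) cbind(chunk = i, ch$data),
                               day, seq_along(day)))
```

Keeping the environment variables in the chunk list and looking them up via `chunk` only when needed is one way to avoid the redundancy of one column per variable; whether it suits the real data is untested.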

Sorry for not submitting an illustrative example, but I'm afraid it
would be quite lengthy and no longer very illustrative.

The data.table package mentioned below seems to be an interesting
alternative; I'll definitely look into it. But it would also mean quite
a bit of homework, as far as I can see...

Thanks

Alex