Dear Folks--

Is there a data frame analog to sparse matrices? I am working with a panel data set that has a large number of variables that are redefined repeatedly or exist for only a few years (out of 48). In my current structure, where variables are columns and rows are years, more than 90 percent of the cells and more than 3/4 of the total size of my file are NAs.

I am wondering if there is an alternate file specification currently available that still allows numeric, character and factor data to be stored. Besides just using a database.

A pointer in the right direction (or a solid "no" if that is the truth) would be greatly appreciated.

Sincerely, andrewH

View this message in context: http://r.789695.n4.nabble.com/Sparse-dataframes-tp4655614.html
Sparse dataframes?
3 messages · Andrew Hoerner, Karl Ove Hufthammer
andrewH wrote:
Is there a data frame analog to sparse matrices? I am working with a panel data set that has a large number of variables that are redefined repeatedly or exist for only a few years (out of 48). In my current structure, where variables are columns and rows are years, more than 90 percent of the cells and more than 3/4 of the total size of my file are NAs. I am wondering if there is an alternate file specification currently available that still allows numeric, character and factor data to be stored. Besides just using a database.
How about storing the data in a "long" format, like you get when you apply melt() (with na.rm=TRUE) from the "reshape2" package to your data frame? Parts of the data frame (the ID part) will be repeated on each row, which may make the data take up more space, but no rows are stored for NA cells, so for somewhat sparse data it will be a win. It also makes it very easy to reshape and analyse the data.

Here's an introduction (to the older "reshape" package, but "reshape2" is very similar): http://www.jstatsoft.org/v21/i12

You might also be interested in this paper on "tidy" data: http://vita.had.co.nz/papers/tidy-data.pdf
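As a rough illustration of the long-format suggestion, here is a minimal sketch. It assumes the "reshape2" package is installed, and the data frame and its column names (year, gdp_a, gdp_b, rate_x) are made up for the example; they are not from the original poster's data. It uses only numeric variables, because melting columns of mixed types into one value column coerces them to a common type, so numeric, character and factor variables would probably need to be melted separately.

```r
library(reshape2)  # assumed to be installed

# Hypothetical wide panel: one row per year, one column per variable.
# Each variable exists for only some of the years, so most cells are NA.
wide <- data.frame(
  year   = 2001:2004,
  gdp_a  = c(1.2, 1.4, NA, NA),
  gdp_b  = c(NA, NA, 2.1, 2.3),
  rate_x = c(NA, 0.5, NA, NA)
)

# Long format: one row per (year, variable) pair with an observed value.
# na.rm = TRUE drops the rows that would hold NA.
long <- melt(wide, id.vars = "year", na.rm = TRUE)

# Number of rows stored equals the number of non-NA cells in the wide table.
n_long <- nrow(long)
n_cells <- sum(!is.na(wide[, -1]))

# Reshape back to wide when an analysis needs it; missing
# year/variable combinations are filled with NA again.
wide_again <- dcast(long, year ~ variable, value.var = "value")
```

Whether this saves space depends on how sparse the data are, since the year and the variable name are repeated on every row; comparing object.size(wide) with object.size(long) on the real data would settle it.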
Karl Ove Hufthammer E-mail: karl at huftis.org Jabber: huftis at jabber.no