
Sparse dataframes?

3 messages · Andrew Hoerner, Karl Ove Hufthammer

#
Dear Folks--
Is there a data frame analog to sparse matrices? I am working with a panel
data set that has a large number of variables that are redefined repeatedly
or exist for only a few years (out of 48).  In my current structure, where
variables are columns and rows are years, more than 90 percent of the cells
and more than 3/4 of the total size of my file are NAs.  

I am wondering if there is an alternate file specification currently
available that still allows numeric, character and factor data to be stored,
other than just using a database.

A pointer in the right direction (or a solid "no" if that is the truth)
would be greatly appreciated.

Sincerely, andrewH



--
View this message in context: http://r.789695.n4.nabble.com/Sparse-dataframes-tp4655614.html
Sent from the R help mailing list archive at Nabble.com.
#
andrewH wrote:
How about storing the data in a "long" format, like you get when you
apply melt() (with na.rm=TRUE) from the "reshape2" package to your data
frame? Parts of the data frame (the ID part) will be repeated on each row,
which may make the data take up more space, but no rows are stored for NA
cells, so for somewhat sparse data it will be a win. It also makes it very
easy to reshape and analyse the data.
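
A minimal sketch of the idea, assuming a made-up panel (the data frame
"wide", the column "year" and the variables v1..v20 are invented for
illustration, not taken from your data):

```r
library(reshape2)

# Hypothetical toy panel: 48 years, 20 variables,
# each variable observed in only 4 of the years.
set.seed(1)
years <- 1961:2008
wide <- data.frame(year = years)
for (j in 1:20) {
  x <- rep(NA_real_, length(years))
  observed <- sample(length(years), 4)
  x[observed] <- rnorm(4)
  wide[[paste0("v", j)]] <- x
}

# Share of NA cells among the measured columns of the wide layout
mean(is.na(wide[-1]))

# Long format: one row per non-missing (year, variable) cell
long <- melt(wide, id.vars = "year", na.rm = TRUE)
nrow(long)  # 80 rows: 20 variables x 4 observed years each

# Compare memory footprints of the two layouts
object.size(wide)
object.size(long)

# Cast back to wide when an analysis needs it; years in which
# nothing was observed do not reappear as rows
back <- dcast(long, year ~ variable, value.var = "value")
```

One caveat: melt() stacks all measured columns into a single "value"
column, so numeric, character and factor variables melted together will be
coerced to a common type. Keeping one long table per type is one possible
way around that.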

Here's an introduction (to the older "reshape" package, but "reshape2" is
very similar): http://www.jstatsoft.org/v21/i12

You might also be interested in this paper on "tidy" data:
http://vita.had.co.nz/papers/tidy-data.pdf
12 days later