
R and Data Storage

3 messages · Rick Bilonick, Tobias Verbeke, Frank E Harrell Jr

#
Where I work a lot of people end up using Excel spreadsheets for storing
data. This has limitations and maybe some less than obvious problems. I'd
like to recommend a uniform way for storing and archiving data collected
in the department. Most of the data could be stored in simple csv type
files but it would be nice to have something that stores more information
about the variables and units. netCDF seems like overkill (and not easy
for casual users); the same goes for PostgreSQL and MySQL databases. Could
someone recommend a system for storing relatively small data sets (50-100
variables, <1000 records) that is reliable and safe, lets people easily
view and edit their data, works nicely with R, and is open source? Am I
asking for the moon?

Rick  B.
#
rab45+ at pitt.edu wrote:

Would the StatDataML format meet your needs?
It is open, XML-based, stores variable
types and works nicely with R (as R wizards designed
StatDataML and the corresponding R package).

See
http://cran.r-project.org/src/contrib/Descriptions/StatDataML.html
or
http://www.omegahat.org/StatDataML/

HTH,
Tobias
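
A minimal sketch of the round trip Tobias describes, assuming the StatDataML package is installed; the functions writeSDML() and readSDML() are the package's documented interface, though the exact argument names here are from memory and may differ by version:

```r
library(StatDataML)

## A small data frame standing in for departmental data
dat <- data.frame(id     = 1:3,
                  weight = c(71.2, 68.0, 80.5),
                  group  = factor(c("a", "b", "a")))

## Write an XML file that records variable types along with the values
writeSDML(dat, file = "dat.xml")

## Read it back; types (numeric, factor) are restored from the XML
dat2 <- readSDML("dat.xml")
str(dat2)
```

Because the file is plain XML, it stays human-readable and editable, which fits the "easy for casual users" requirement better than a binary format.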
#
rab45+ at pitt.edu wrote:
What I use is the facilities in the Hmisc package, which handles 
variable labels and units of measurement and has functions for importing 
data (saving labels in the appropriate place) and making use of the 
attributes (e.g., combining labels and units with a smaller font for the 
units portion in an axis label).  When such an annotated data frame is 
saved using save(..., compress=TRUE), load()'ing it back will provide 
an annotated data frame, quickly.  The contents() function can show the 
attributes, and we use html(contents()) to put up a web page with 
hyperlinks for value labels (factor variable levels attribute).
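
A minimal sketch of the Hmisc workflow described above, assuming Hmisc is installed; upData(), contents(), and html() are Hmisc functions, though exact arguments may vary by version, and the variable names here are made up for illustration:

```r
library(Hmisc)

d <- data.frame(age = c(34, 57, 41),
                sbp = c(120, 135, 128))

## Attach variable labels and units as attributes on the data frame
d <- upData(d,
            labels = c(age = "Age", sbp = "Systolic Blood Pressure"),
            units  = c(age = "years", sbp = "mmHg"))

## Save the annotated data frame; load() restores it, attributes intact
save(d, file = "d.rda", compress = TRUE)

## Summarize the attributes; html(contents(d)) would write a linked
## web page instead of printing to the console
contents(d)
```

The labels and units travel with the data frame as attributes, so downstream plotting and reporting functions in Hmisc can pick them up without any extra bookkeeping.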