Variable labels (was Re: [R] Reading SAS version 8 data into
On 24-Aug-2001 Prof Brian D Ripley wrote:
On Fri, 24 Aug 2001 pauljohn at ukans.edu wrote:
I will try this method to export a SAS file, but reading it made me wonder about "variable labels" and "value labels" in R. In SAS and SPSS, the labels are a huge chunk of code and people want to hang onto them. In case you have not used data from the University of Michigan's ICPSR, you might not have seen how elaborate this can get. Here's a link to a SAS program that reads in an ASCII dataset. It has thousands of labels: http://lark.cc.ukans.edu/~pauljohn/sa2684.gz (This is a famous one, the American National Election Study.) Netscape unzips this and shows it as text on the screen. A program like SAS or SPSS will use these labels to beautify frequencies and such. I've not heard much in the R group about it, and I just wondered if you ever talk about it.
Because it's no big deal. Those are factor levels. R has factors. Whether they get exported from SAS and converted by read.xpt I can't say.
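As a minimal sketch of the "those are factor levels" point: the variable name, codes, and labels below are made up for illustration and do not come from the ANES file. It only shows how value labels can be attached to coded data with factor(), not what read.xport itself produces.

```r
# Hypothetical survey item stored as numeric codes 1/2/3.
vote <- c(1, 2, 2, 3, 1)

# Attach value labels by declaring the codes as factor levels.
vote_f <- factor(vote,
                 levels = c(1, 2, 3),
                 labels = c("Democrat", "Republican", "Other"))

# Frequencies are now reported by label rather than by code.
print(table(vote_f))

# The integer codes behind the labels are still available
# (here they coincide with the original 1/2/3 coding).
print(as.integer(vote_f))
```

Note that this covers value labels only; a factor has no built-in slot for a descriptive variable label.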
Preserving value labels from SAS datasets is not as easy as it should be. SAS value labels are not part of the dataset, but are kept in a separate file called a format catalogue. The XPORT engine does not work with SAS catalogues, so you need to convert the format catalogue to a SAS dataset. You can do this with the cntlout option in PROC FORMAT. [Conversely, the cntlin option creates a format catalogue from a dataset.]

We use a program called Stat/Transfer to convert between different file formats. Recent versions of Stat/Transfer will preserve SAS value labels if you supply a format dataset. [It doesn't support R, but you can get from SAS to R via Stata.] I suppose that you could get read.xport to work the same way ...

SAS value labels are not quite the same as S factor labels, since the mapping from values to labels may be many-to-one. For example, you can categorize a continuous variable by supplying ranges of values to be given the same label. The variable is then treated like a categorical variable in tabulations, etc., but the underlying values are preserved in the dataset and may be recovered by changing the format.

Martyn
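To illustrate the many-to-one point in R terms, here is a rough sketch using cut(). The variable, break points, and labels are invented for the example, and this is only an analogy to a range-based SAS format, not a translation of one. The idea is that the numeric vector is kept as the data, and the grouping is a separate, replaceable view of it.

```r
# Hypothetical continuous variable (ages in whole years).
age <- c(23, 37, 45, 61, 18, 70)

# Many-to-one mapping: every value in a range gets the same label.
# Intervals are right-closed by default: (0,29], (29,59], (59,Inf].
age_group <- cut(age,
                 breaks = c(0, 29, 59, Inf),
                 labels = c("under 30", "30-59", "60 and over"))

# Tabulations treat the grouped version as categorical.
print(table(age_group))

# "Changing the format" amounts to calling cut() again with different
# breaks; the underlying values in age are untouched.
age_decade <- cut(age, breaks = seq(10, 80, by = 10))
print(table(age_decade))
```

Unlike a SAS format, the labels here live in a second object rather than on the original variable, so it is up to the user to keep the numeric vector around.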