Dear all, is there an R equivalent to SAS's PROC FORMAT? Best regards, J. Lamack
R and SAS proc format
10 messages · lamack lamack, Ulrike Grömping, John Kane +4 more
lamack lamack wrote:
Dear all, Is there an R equivalent to SAS's proc format? Best regards J. Lamack
Fortunately not. SAS is one of the few large systems that does not implicitly support value labels and that separates label information from the database [I can't count the number of times someone has sent me a SAS dataset and forgotten to send the PROC FORMAT value labels]. See the factor function for information about how R does this. Frank Harrell
Frank E Harrell Jr Professor and Chair School of Medicine
Department of Biostatistics Vanderbilt University
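Here is a minimal sketch of the factor() approach Frank points to. The codes and labels are made up for illustration. The value labels are stored inside the object, so they travel with the data into printing and tabulation.

```r
# Hypothetical codes, mirroring a SAS value format 1='A' 2='B' 3='C'
codes <- c(2, 1, 3, 3, 1)

# 'levels' lists the possible codes, 'labels' the text shown for each;
# both end up stored inside the factor object itself
f <- factor(codes, levels = c(1, 2, 3), labels = c("A", "B", "C"))

print(f)      # prints the labels, not the codes
table(f)      # tabulation uses the labels directly
levels(f)     # "A" "B" "C"
```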
The downside to R's factor solution: the underlying numeric codes of a factor are always 1 to the number of levels. Thus it can be tough, and requires great care, to work with studies whose data have both numeric codes that differ from these and value labels. This situation is currently not well supported by R. Regards, Ulrike P.S.: I fully agree with Frank regarding the annoyance one sometimes encounters with formats in SAS!
______________________________________________ R-help at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
View this message in context: http://www.nabble.com/R-and-SAS-proc-format-tf3357624.html#a9340323 Sent from the R help mailing list archive at Nabble.com.
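The pitfall Ulrike describes can be seen with invented study codes 10, 20 and 99. The integer codes inside a factor are always 1 to the number of levels. Once labels replace the levels, the usual idiom for recovering the original codes no longer works. This is a sketch for illustration only.

```r
# Invented study codes that are not 1..k
x <- c(10, 99, 20, 10)

# Without labels, the levels are the codes themselves (stored as strings)
f_codes <- factor(x)
as.integer(f_codes)                   # 1 3 2 1: internal codes, not 10/20/99
as.numeric(as.character(f_codes))     # 10 99 20 10: the usual idiom to recover them

# With value labels attached, the levels are the labels, so that idiom fails
f_lab <- factor(x, levels = c(10, 20, 99), labels = c("low", "medium", "missing"))
as.integer(f_lab)                     # still 1 3 2 1
suppressWarnings(as.numeric(as.character(f_lab)))   # all NA: labels are not numbers
```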
Ulrike Grömping wrote:
The down side to R's factor solution: The numerical values of factors are always 1 to number of levels. Thus, it can be tough and requires great care to work with studies that have both numerical values different from this and value labels. This situation is currently not well-supported by R.
You can add an attribute to a variable. For example, when the sas.get function in the Hmisc package imports SAS variables that have PROC FORMAT value labels, an attribute 'sas.codes' keeps the original codes; these can be retrieved using sas.codes(variable name). This could also be done outside the SAS import context. Frank
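Frank's suggestion can be sketched in base R without Hmisc. The attribute name "codes" is hypothetical; Hmisc's sas.get() uses its own 'sas.codes' attribute. The attribute holds one original code per level, in level order, and the integer codes of the factor index into it.

```r
f <- factor(c(10, 99, 20), levels = c(10, 20, 99),
            labels = c("low", "medium", "missing"))

# Hypothetical attribute: one original code per level, in the same order as levels(f)
attr(f, "codes") <- c(10, 20, 99)

# Map each observation's internal code (1..k) back to its original study code
orig <- attr(f, "codes")[as.integer(f)]
orig    # 10 99 20
```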
----- Original Message ----- From: "John Kane" <jrkrideau at yahoo.ca> To: "lamack lamack" <lamac_k at hotmail.com>; <R-help at stat.math.ethz.ch> Sent: Tuesday, March 06, 2007 2:13 PM Subject: Re: [R] R and SAS proc format
--- lamack lamack <lamac_k at hotmail.com> wrote:
Dear all, Is there an R equivalent to SAS's proc format?
What does the SAS PROC FORMAT do?
It formats or reformats data in the SAS system.
It looks like this:
proc format;
  value kanefmt 1='A' 2='B' 3='C' 4='X' 5='Throw me out';
run;

data temp;
  do i=1 to 10;
    kanevar=put(i,kanefmt.);
    output;
  end;
run;

proc print; run;
And produces this:
Obs i kanevar
1 1 A
2 2 B
3 3 C
4 4 X
5 5 Throw me out
6 6 6
7 7 7
8 8 8
9 9 9
10 10 10
But it is more versatile than what is shown here (for example, it can also assign labels to ranges of values).
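A rough R counterpart of the PROC FORMAT example above uses a named character vector as the "format". The names kanefmt and put_format are illustrative only and are not an existing R API. As in the SAS output, values without a label are shown as themselves.

```r
# Illustrative "format": names are the codes, values are the labels
kanefmt <- c("1" = "A", "2" = "B", "3" = "C", "4" = "X", "5" = "Throw me out")

# Hypothetical helper, loosely analogous to SAS put(value, format.)
put_format <- function(x, fmt) {
  out <- unname(fmt[as.character(x)])  # look up each value by its code
  miss <- is.na(out)                   # values with no label in the format...
  out[miss] <- as.character(x[miss])   # ...are shown as the value itself
  out
}

temp <- data.frame(i = 1:10)
temp$kanevar <- put_format(temp$i, kanefmt)
print(temp)
```

Keeping the format as a separate object lets it be applied to any number of variables. The price is that the labels do not travel with the data, which is the trade-off discussed in this thread.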
Ulrike Grömping wrote:
Frank, are these attributes preserved when merging or subsetting a data frame? Are they used in R packages other than Hmisc and Design (e.g., in a simple table request)?
No; one would need to add functions like those used by the Hmisc label and impute functions. And they are not used outside Hmisc/Design. In fact I have little need for them, as I always find the final labels to be the key to analysis.
If this is the case, my wishlist items 8658 and 8659 (http://bugs.r-project.org/cgi-bin/R/wishlist?id=8658;user=guest, http://bugs.r-project.org/cgi-bin/R/wishlist?id=8659;user=guest) can be closed. Otherwise, I maintain the opinion that there are workarounds but that R is not satisfactorily able to handle this type of data.
R provides the framework for doing this elegantly, but the user bears the overhead of implementing new methods for such attributes. Cheers, Frank
Regards, Ulrike
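The sketch below shows the kind of method Frank means. The class name "coded", the constructor coded() and its "codes" attribute are all hypothetical. A plain factor loses an extra attribute on subsetting, because `[.factor` does not copy it. A small `[` method for the new class carries the attribute over. Merging would need further work and is not covered here.

```r
# A plain factor with an extra attribute loses it on subsetting
plain <- factor(c(10, 99, 20), levels = c(10, 20, 99))
attr(plain, "codes") <- c(10, 20, 99)
is.null(attr(plain[1:2], "codes"))     # TRUE: the attribute is dropped

# Hypothetical constructor for a factor that remembers its original codes
coded <- function(x, codes, labels) {
  f <- factor(x, levels = codes, labels = labels)
  attr(f, "codes") <- codes
  class(f) <- c("coded", "factor")
  f
}

# Subsetting method for the hypothetical class: let `[.factor` do the work,
# then restore the extra attribute and the class
`[.coded` <- function(x, ...) {
  y <- NextMethod()
  attr(y, "codes") <- attr(x, "codes")
  class(y) <- oldClass(x)
  y
}

f <- coded(c(10, 99, 20, 10), codes = c(10, 20, 99),
           labels = c("low", "medium", "missing"))
g <- f[2:3]
attr(g, "codes")      # 10 20 99: kept by `[.coded`
as.character(g)       # "missing" "medium"
```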
Jason Barnhart wrote:
What does the SAS PROC FORMAT do?
It formats or reformats data in the SAS system.
Slightly more precisely: it creates user-defined formats, which are subsequently associated with variables and used for reading, printing, tabulating, and analyzing data. It is akin to R's factor() constructions, but not quite. For one thing, SAS's formats are separate entities (the same format can be used for many variables), whereas R's factors have the formatting coded as part of the object. For related reasons, a variable in SAS can have more distinct values than there are value labels for, etc.
  O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
 c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
(*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907
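Peter's two points can be illustrated with invented data. One code/label table can be defined once and reused for several variables. However, factor() keeps only the declared levels, so a value with no label becomes NA. A SAS format would instead print such a value as itself. The variable names below are illustrative only.

```r
# The "format" lives apart from the data: one code/label table defined once
# ('yesno_codes' and 'yesno_labels' are illustrative names)
yesno_codes  <- c(0, 1)
yesno_labels <- c("No", "Yes")

# Two hypothetical variables sharing that format; q2 also has the unlabelled value 9
d <- data.frame(q1 = c(0, 1, 1), q2 = c(1, 0, 9))

# Reuse the same table for every column, much as one SAS format can serve many variables
d_lab <- as.data.frame(lapply(d, factor, levels = yesno_codes, labels = yesno_labels))

# A factor keeps only declared levels: the 9 in q2 silently becomes NA
d_lab$q2    # values: Yes, No, NA
```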
On 3/7/07, Peter Dalgaard <P.Dalgaard at biostat.ku.dk> wrote:
Slightly more precisely: it creates user-defined formats, which are subsequently associated with variables and used for reading, printing, tabulating, and analyzing data. It is akin to R's factor() constructions, but not quite. For one thing, SAS's formats are separate entities (the same format can be used for many variables), whereas R's factors have the formatting coded as part of the object. For related reasons, a variable in SAS can have more distinct values than there are value labels for, etc.
Also, SAS formats are used as a (somewhat cumbersome) replacement for "dictionary" data structures. Starting with SAS 9.1 (I believe), "hash tables" can be used within data steps for the same purpose (albeit still cumbersome). In this regard, not only formats but also lists could serve as a replacement for them: they can be used to build key-value mappings. These key-value mappings (I mean, these kinds of data structures) are very handy tools. I have used both factors and lists as a kind of "ad hoc" replacement for these data structures. Hasn't anybody considered the possibility of implementing these data structures in R with a more Python-like or Java-like look and feel? Regards, Carlos J. Gil Bellosta http://www.datanalytics.com
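For the key-value mappings Carlos asks about, base R already offers two ad hoc options, sketched here with invented keys. Named lists are copied on modification. Environments created with new.env(hash = TRUE) are hashed and have reference semantics, which is closer in feel to a Python dict or a Java HashMap. Neither is a dedicated dictionary type.

```r
# 1. Named list: simple key-value lookup (keys here are invented)
ages <- list(alice = 31, bob = 27)
ages[["carol"]] <- 45                         # add a key
ages[["bob"]]                                 # look up a key: 27
"dave" %in% names(ages)                       # membership test: FALSE

# 2. Environment: hashed lookup with reference semantics
h <- new.env(hash = TRUE)
assign("alice", 31, envir = h)
h[["bob"]] <- 27                              # [[<- also works on environments
exists("bob", envir = h, inherits = FALSE)    # TRUE
get("alice", envir = h)                       # 31
ls(h)                                         # keys: "alice" "bob"
rm("alice", envir = h)                        # delete a key
```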