Inefficiency of SAS Programming

6 messages · Terry Therneau, John Sorkin, Frank E Harrell Jr

#
Three comments

 I actually think you can write worse code in R than in SAS: more tools = more 
scope for innovatively bad ideas.  The ability to write bad code should not damn 
a language.  
 
  I found almost all of the "improvements" to the multi-line SAS recode to be 
regressions, both the SAS and the S suggestions. 
    a. Everyone, even those of you with no SAS background whatsoever, immediately 
understood the code.  Most of the replacements are obscure.  Compilers are very 
good these days and computers are fast; fewer typed characters != better.
    b. If I were writing the S code for such an application, it would look much 
the same.  I worked as a programmer in medical research for several years, and 
one of the things that moved me on to graduate studies in statistics was the 
realization that doing my best work meant being as UN-clever as possible in my 
code.  
    
  Frank's comments imply that he was reading SAS macro code at the moment of 
peak frustration.  And if you want to criticise SAS code, this is the place to 
look.  SAS macro started out as some simple expansions, then got added on to, 
then added on again, and again, and ....  with no overall blueprint.  It is much 
like the farmhouse of some neighbors of mine growing up: 4 different expansions 
in 4 eras, and no overall guiding plan.  The interior layout was "interesting" 
to say the least. I was once a bona fide SAS 'wizard' (and Frank was much better 
than me), and I can't read the stuff without grinding my teeth.
  S was once headed down the same road. One of the best things ever with the 
language was documented in the blue book "The New S Language", where Becker et 
al had the wisdom to scrap the macro processor.  
 
  	Terry Therneau
#
Terry Therneau wrote:
If I were writing S code for this it would be dramatically different.  I 
would try to be efficient and elegant but would need to remember to be a 
teacher at the same time.  For example this kind of recode is super 
efficient and quick to program but would need good comments or a 
handbook to all of my code:  c(cat=1, dog=2, giraffe=3)[animal]
But I think the code is quite intuitive once you have used that 
construct once.
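
For readers who have not met the construct, a minimal sketch of the named-vector recode follows, next to the same recode written out one assignment at a time. The data values and the variable names other than animal are made up for illustration, and the sketch assumes animal is a character vector.

```r
# Hypothetical data: `animal` is a character vector (not a factor).
animal <- c("dog", "cat", "giraffe", "cat", "emu")

# Named-vector lookup: the names act as keys, the values as codes.
codes <- c(cat = 1, dog = 2, giraffe = 3)
code_lookup <- unname(codes[animal])   # unmatched keys ("emu") give NA

# The same recode written out one assignment at a time.
code_plain <- rep(NA_real_, length(animal))
code_plain[animal == "cat"]     <- 1
code_plain[animal == "dog"]     <- 2
code_plain[animal == "giraffe"] <- 3

# Caution: a factor indexes by its integer level codes, not its labels,
# so convert with as.character() before using it as a lookup key.
animal_f    <- factor(animal)
code_factor <- unname(codes[as.character(animal_f)])
```

Both forms give the same result here; the lookup is shorter, while the written-out version is the one a reader with no S background can follow at a glance.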

There is also a lot of factoring of code that could be done as others have 
pointed out.
Well put.  I am amazed there hasn't been a revolt among SAS users 
decades ago.  The S approach is also easier to debug one line at a time.

Cheers,
Frank

  
    
#
Terry's remarks are well received; however, I take issue with one part of his comments. As a long-time programmer (in both "statistical" programming languages and "traditional" programming languages), I miss the ability to write native-languages in R. While macros can make for difficult-to-read code, when used properly they can also make for flexible code that, if properly written (including good documentation, which should be a part of any code), can be easy to read.

Finally, everyone must remember that SAS code can be difficult to understand or "inefficient" just as R code can be difficult to understand or "inefficient". In the end, both programming systems have their advantages and disadvantages. No programming language is perfect. It is neither fair nor correct to damn one or the other. Accept the fact that some things are more easily and more clearly done in one language, and other things are more clearly and more easily done in another.  Let's move on to more important issues, viz. improving R so it is as good as it possibly can be.
John  

  

John David Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)
#
John Sorkin wrote:
Nice points John.  My only response is that I learned SAS in 1969 and 
used it intensively until 1991.  I wrote some of the first 
user-contributed SAS procedures (PROCs PCTL, GRAPH, DATACHK, LOGIST, 
PHGLM) and wrote extensively in the macro language.  After using S-Plus 
for only one month my productivity was far ahead of my productivity 
using SAS.

Frank
#
Frank,
A programming language's efficiency is a function of several items, including what you are trying to program. Without using SAS proc IML, I have found that it is more efficient to code algorithms (e.g. a least squares linear regression) using R than SAS; we all know that matrix notation leads to more compact syntax than can be had when using non-matrix notation, and R implements matrix notation. On the other hand, searching, sub-setting, merging etc. can at times be coded more efficiently, more easily, and in a more easily understood fashion in SAS. I am sure you know people who use SAS to set up their datasets and then use R when they are developing an algorithm. 
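
As an illustration of the matrix-notation point, here is a minimal sketch of a least squares fit in R via the normal equations. The simulated data, variable names, and coefficients are assumptions made up for the example, not anything from the thread.

```r
# Simulated data; names and coefficients are illustrative only.
set.seed(1)
n  <- 50
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n, sd = 0.1)

# Design matrix with an intercept column.
X <- cbind(Intercept = 1, x1 = x1, x2 = x2)

# Normal equations: solve (X'X) b = X'y for b.
beta_hat <- solve(crossprod(X), crossprod(X, y))

# Cross-check against the built-in fit. lm() uses a QR decomposition,
# which is numerically preferable for ill-conditioned problems.
fit <- lm(y ~ x1 + x2)
```

The one-line solve() call mirrors the textbook formula directly; for real work lm() is the safer choice.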

Just as French may be a better language to express love, Italian a better language in which to write opera, and English the most efficient language for communication (at least for the last 50 years), so too do both R and SAS have a place in the larger world.
John     

#
John Sorkin wrote:
John, I'll have to strongly disagree with most of your statement about 
data manipulation.  R is far more powerful, easier to debug dynamically, 
and more concise for merging, reshaping, recoding, etc.  But I agree on the 
"easily understood" portion of your statement.

Cheers
Frank
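
For readers weighing the two positions on data manipulation, a minimal base-R sketch of merging, subsetting, and summarizing follows. The two tables and their column names are hypothetical, and the sketch takes no side on which language reads more easily.

```r
# Two small hypothetical tables sharing an `id` key.
patients <- data.frame(id  = c(1, 2, 3, 4),
                       sex = c("F", "M", "F", "M"))
visits   <- data.frame(id  = c(1, 1, 2, 4, 5),
                       sbp = c(120, 130, 145, 118, 160))

# Inner join on id: keeps only ids present in both tables.
both <- merge(patients, visits, by = "id")

# Left join: keep every patient, with NA where no visit exists.
left <- merge(patients, visits, by = "id", all.x = TRUE)

# Subset rows and columns in one step.
high_sbp <- subset(both, sbp > 125, select = c(id, sbp))

# Summarize: mean sbp per patient.
mean_sbp <- aggregate(sbp ~ id, data = both, FUN = mean)
```

Each statement can be run and inspected on its own, which is the line-at-a-time debugging style mentioned earlier in the thread.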