
read.xport and lookup.xport in foreign (PR#2385)

2 messages · Frank E Harrell Jr, Peter Dalgaard

Under
            
platform i686-pc-linux-gnu
arch     i686             
os       linux-gnu        
system   i686, linux-gnu  
status                    
major    1                
minor    6.1              
year     2002             
month    11               
day      01               
language R                

and using foreign 0.5-8 I am encountering errors when using read.xport.  Here's code for producing SAS transport files for testing:

libname x SASV5XPT "test.xpt";
libname y SASV5XPT "test2.xpt";
PROC FORMAT; VALUE race 1=green 2=blue 3=purple; RUN;
PROC FORMAT CNTLOUT=format;RUN;
data test;
LENGTH race 3 age 4;
age=30; label age="Age at Beginning of Study";
race=2;
d1='3mar2002'd ;
dt1='3mar2002 9:31:02'dt;
t1='11:13:45't;
output;

age=31;
race=4;
d1='3jun2002'd ;
dt1='3jun2002 9:42:07'dt;
t1='11:14:13't;
output;
format d1 mmddyy10. dt1 datetime. t1 time. race race.;
run;
PROC COPY IN=work OUT=x;SELECT test;RUN;
PROC COPY IN=work OUT=y;SELECT test format;RUN;

SAS output:

NOTE: Copying WORK.TEST to X.TEST (memtype=DATA).
NOTE: There were 2 observations read from the data set WORK.TEST.
NOTE: The data set X.TEST has 2 observations and 5 variables.
NOTE: PROCEDURE COPY used:
      real time           1.52 seconds
      cpu time            0.04 seconds
      
NOTE: Copying WORK.TEST to Y.TEST (memtype=DATA).
NOTE: There were 2 observations read from the data set WORK.TEST.
NOTE: The data set Y.TEST has 2 observations and 5 variables.
NOTE: Copying WORK.FORMAT to Y.FORMAT (memtype=DATA).
NOTE: There were 3 observations read from the data set WORK.FORMAT.
NOTE: The data set Y.FORMAT has 3 observations and 21 variables.
NOTE: PROCEDURE COPY used:

R results:
      RACE      AGE    D1        DT1    T1
1 2.000063 30.00000 15402 1330767062 40425
2 4.000063 31.00000 15494 1338716527 40453

Note the corruption of RACE (a variable having a SAS length of 3 bytes).
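
For background: the transport format stores numeric values as IBM hexadecimal floating point, and a variable with a LENGTH below 8 keeps only the leading bytes. The Python sketch below (the name `ibm_to_float` is hypothetical and not part of foreign) decodes such a field by zero-padding it to 8 bytes. It also shows that using the bytes of the following AGE field in place of the padding happens to reproduce the 2.000063 above. That is consistent with the output but is only one possible reading, not a confirmed diagnosis of read.xport.

```python
from typing import Optional

# First bytes that mark SAS missing values (., ._, .A to .Z) when the rest is zero.
MISSING_TAGS = b"._ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def ibm_to_float(field: bytes) -> Optional[float]:
    """Decode an IBM hex float that may have been truncated to 2..8 bytes."""
    if not 2 <= len(field) <= 8:
        raise ValueError("XPORT numerics are 2 to 8 bytes long")
    # A truncated field is restored by appending zero bytes, not neighbouring data.
    padded = field + b"\x00" * (8 - len(field))
    if padded[0] in MISSING_TAGS and not any(padded[1:]):
        return None  # SAS missing value
    sign = -1.0 if padded[0] & 0x80 else 1.0
    exponent = (padded[0] & 0x7F) - 64          # excess-64, base 16
    fraction = int.from_bytes(padded[1:], "big") / 2.0 ** 56
    return sign * fraction * 16.0 ** exponent


race = bytes([0x41, 0x20, 0x00])                # 2, stored with LENGTH 3
age = bytes([0x42, 0x1E, 0x00, 0x00])           # 30, stored with LENGTH 4
d1_first = bytes([0x44])                        # assumed first byte of D1 (15402 = 0x3C2A)

print(ibm_to_float(race))                       # zero-padded decode
print(ibm_to_float(race + age + d1_first))      # neighbouring bytes used as padding
```

The second call treats the first 8 bytes of the record as a single double, which is the kind of mistake that would give a small positive error in every short numeric variable.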
            RACE           AGE            D1           DT1            T1
1   2.000063e+00  3.000000e+01  1.540200e+04  1.330767e+09  4.042500e+04
2   4.000063e+00  3.100000e+01  1.549400e+04  1.338717e+09  4.045300e+04
3   3.687825e-40  3.687825e-40  3.687825e-40  3.687896e-40  5.962240e+20
...
124 3.835229e-93  6.434447e-86            NA  3.687825e-40  3.687825e-40

Note corrupted data when trying to read a SAS transport file containing more than one SAS dataset.  According to the documentation, read.xport is supposed to work in this case and is supposed to return a list of data frames.
[1] "TEST"

Note the inclusion of only one of the 2 datasets.
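
To check independently how many datasets a transport file holds, one can count its member header records. The sketch below assumes, following the published V5 transport layout as I understand it, that the file is a sequence of 80-byte records and that each dataset begins with a record starting with the marker string shown. `count_members` is a hypothetical helper, not a foreign function.

```python
# Marker that starts the member header record of each dataset (assumed layout).
MEMBER_HEADER = b"HEADER RECORD*******MEMBER  HEADER RECORD!!!!!!!"
RECORD_LEN = 80  # transport files are organised in 80-byte records


def count_members(raw: bytes) -> int:
    """Count 80-byte records that begin with the member header marker."""
    return sum(
        raw[offset:offset + RECORD_LEN].startswith(MEMBER_HEADER)
        for offset in range(0, len(raw), RECORD_LEN)
    )
```

If both TEST and FORMAT were written to test2.xpt, `count_members(open("test2.xpt", "rb").read())` would be expected to return 2 under these assumptions.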


Also I would greatly benefit from having lookup.xport return all of the SAS variable attributes, especially variable label and format name.  I could then write a little function for the community that makes read.xport as comprehensive as read.spss in terms of creating factor variables and variable labels, if the user exports the PROC FORMAT CNTLOUT= dataset.
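
A rough idea of what such a function would do, sketched in Python with plain dictionaries standing in for data frames. It assumes the exported format dataset carries the FMTNAME, START, END and LABEL variables, and that a variable's format name is already known (which is exactly what lookup.xport does not yet report). The helper names and the sample rows are illustrative only.

```python
def build_label_map(format_rows, fmtname):
    """Map numeric codes to labels for one format; range entries are skipped."""
    labels = {}
    for row in format_rows:
        if row["FMTNAME"].strip().upper() != fmtname.upper():
            continue
        start, end = float(row["START"]), float(row["END"])
        if start == end:                 # single-value entry such as 1=green
            labels[start] = row["LABEL"].strip()
    return labels


def apply_labels(values, labels):
    """Replace codes by labels; codes without a label become None."""
    return [labels.get(value) for value in values]


# Illustrative rows shaped like the RACE format in the test program above.
rows = [
    {"FMTNAME": "RACE", "START": "           1", "END": "           1", "LABEL": "green"},
    {"FMTNAME": "RACE", "START": "           2", "END": "           2", "LABEL": "blue"},
    {"FMTNAME": "RACE", "START": "           3", "END": "           3", "LABEL": "purple"},
]
print(apply_labels([2.0, 4.0], build_label_map(rows, "race")))
```

The value 4 has no entry in the format, so it is left unlabeled rather than guessed.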

Thanks.
1 day later
fharrell@virginia.edu writes:

[How to create an xpt file that R cannot read...]
If anyone is interested in trying to figure this stuff out, it would
be most welcome (information on the file format can be obtained via
the link http://www.wotsit.org/download.asp?f=sas). To save you the
trouble, here's the inverse of Frank's code, i.e., how to read the
stuff back into SAS:

libname x SASV5XPT "test.xpt";
libname y SASV5XPT "test2.xpt";
proc format cntlin=y.format;
proc contents data=x.test;
proc contents data=y.test;
proc contents data=y.format;
proc print data=x.test;
proc print data=y.test;
proc print data=y.format;

Notice in particular that nothing works without the proc format line:
SAS can't read test.xpt without somehow being told what the RACE
format is. One possibly relevant oddity is that SAS seems to claim
that RACE has length 4, not 3, in the contents listing.

(Of course, in case you haven't already realized it: once we know how
to extract SAS format names and interpret user-supplied formats, people
are going to want us to be able to interpret standard formats like
DATETIME. as well...)