Problem with read.xport() from foreign package (PR#7389) - R-devel

Wed, Nov 24, 2004 8:40 AM

The relevant part of the code is

 		    if(strlen(tmpchar) == 1 && IS_SASNA_CHAR(tmpchar[0]))

#define IS_SASNA_CHAR(c) ((c) == 0x5f || (c) == 0x2e || \
                           (0x41 <= (c) && (c) <= 0x5a))

which says that single-character fields containing ., _, A-Z are to be 
taken as missing values.  That is true of all single-character fields in 
this file.
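
To make the effect concrete, here is a minimal sketch of that test in
isolation.  The macro is copied from above; the wrapper function and its
name are hypothetical and are not part of SASxport.c.

```c
#include <string.h>

/* Macro as quoted from src/SASxport.c: matches '_', '.', and 'A'..'Z'. */
#define IS_SASNA_CHAR(c) ((c) == 0x5f || (c) == 0x2e || \
                          (0x41 <= (c) && (c) <= 0x5a))

/*
 * Hypothetical wrapper around the condition quoted above.
 * Returns 1 if a character field would be treated as a missing value,
 * i.e. it is exactly one character long and that character is one of
 * the SAS missing-value codes.
 */
int single_char_field_flagged_as_na(const char *tmpchar)
{
    return strlen(tmpchar) == 1 && IS_SASNA_CHAR(tmpchar[0]);
}
```

Under this check a field such as "M" or "F" is flagged, whereas "MM" is
not, since the length test fails for anything longer than one character.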

Looking at the reference, it says

   Missing values are written out with the first byte (the exponent)
   indicating the proper missing values. All subsequent bytes are 0x00. The
   first byte is:

      type      byte
       ._       0x5f
       .        0x2e
       .A       0x41
       .B       0x42
          ....
       .Z       0x5a

which suggests this is intended to apply only to numeric records 
('exponent'), whereas R applies it only to character records.  Elsewhere I 
found

   SAS stores 'missing' data depending on the variables data type:
   character or numeric. A 'missing' character value is represented as a
   blank (' ') or null (''). A 'missing' numeric value is represented as a
   single dot (.). SAS allows you to differentiate between values that are
   missing for different reasons. For example, in survey research a
   question may not be answered because the respondent refuses to answer or
   because they don't know the answer. SAS has a range of missing values to
   cover this case. These special missing values are for numerics only and
   are the letters of the alphabet preceded by the single dot .A thru .Z

which seems to confirm this.
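
If that reading is right, the test would look roughly like the sketch
below.  It rests only on the two passages quoted above: a numeric field is
missing when its first byte is one of the listed codes and all later bytes
are 0x00, and a character field is missing when it is blank or empty.  The
function names and the (field, width) interface are my own assumptions, not
anything in SASxport.c.

```c
#include <stddef.h>

/* First-byte codes from the quoted reference: ._ (0x5f), . (0x2e), .A-.Z. */
static int is_missing_code(unsigned char b)
{
    return b == 0x5f || b == 0x2e || (0x41 <= b && b <= 0x5a);
}

/*
 * Hypothetical helper: a numeric field is taken as missing when its first
 * byte is a missing-value code and every subsequent byte is 0x00.
 */
int numeric_field_is_missing(const unsigned char *field, size_t width)
{
    size_t i;

    if (width == 0 || !is_missing_code(field[0]))
        return 0;
    for (i = 1; i < width; i++)
        if (field[i] != 0x00)
            return 0;
    return 1;
}

/*
 * Hypothetical helper: a character field is taken as missing only when it
 * is empty or consists entirely of blanks; letters are ordinary data.
 */
int character_field_is_missing(const char *field, size_t width)
{
    size_t i;

    for (i = 0; i < width; i++)
        if (field[i] != ' ')
            return 0;
    return 1;
}
```

With this split, a one-character field holding "M" is ordinary data, and
the letter codes are only consulted for numeric fields.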

Does anyone know for sure?

On Wed, 24 Nov 2004, Ruskin Chow wrote:

Dear Prof. Ripley,

Thanks for your prompt reply. In fact when I converted one-char data fields
in SAS to two-char fields before generating the SAS transport format file,
the problem (that all single char fields become <NA>) disappears. The
problem is readily reproducible and I attach two sample data files in SAS
transport format, one with one-char data fields ("char1.xport") and the
other data file ("char2.xport") with the same data fields except that
one-char fields are converted to two-char fields (for example, see the
fields named "SEX", "GB" etc.). The R output text file ("lastsave.txt") is
also attached so that you can verify it easily at your end.

Even with the above, there is no conclusive evidence whether this problem is
actually (a) with SAS in generating the transport file, or (b) with R in the
read.xport() function. However, with the indication in the following
discussion thread:

http://maths.newcastle.edu.au/~rking/R/help/02a/3258.html

I tend to believe that R may have the answer.

Hope this helps and thanks again for your quick response.

Best regards,

Ruskin

-----Original Message-----
From: Prof Brian Ripley [mailto:ripley@stats.ox.ac.uk]
Sent: Tuesday, November 23, 2004 7:02 PM
To: ruskin@iba.com.hk
Cc: R-bugs@biostat.ku.dk
Subject: Re: [Rd] Problem with read.xport() from foreigh package (PR#7389)

On Tue, 23 Nov 2004 ruskin@iba.com.hk wrote:

Full_Name: Ruskin Chow
Version: R 2.0.1
OS: Windows 2000
Submission from: (NULL) (203.169.154.66)


Data imported from SAS using read.xport() in package foreign are converted to
<NA> when the SAS data field consists of character strings that are only one
character long.

This is apparently a previously reported bug, perhaps fixed on some platform
other than Windows (rw2001). Some discussion of the bug can be found at the
following website:

https://stat.ethz.ch/pipermail/r-help/2002-April/019349.html

There is no Windows-specific version of read.xport(), so that change
happened everywhere: but that message does not say it was fixed nor which
change might have fixed it.  I am guessing the change is this one

2002-03-27  Douglas Bates  <bates@stat.wisc.edu>

        * src/SASxport.c (IS_SASNA_CHAR): Silly typo (0x4l, not 0x41)
        caught by Peter.

and that does not sound like the same thing.

I've downloaded the latest foreign package (foreign_0.8-1.zip) from CRAN but it
doesn't seem to work.
^^^^^^^^^^^^^^^^^^^^

What does that mean?  It `works' on its test suite: none of the changes
since 0.8 are related to this question.  I don't believe there is an
example in the test suite with one-char strings, and we need an example to
reproduce what you are seeing.

So, please read the posting guide and provide us with a reproducible
example.

--
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
