Problem with read.xport() from foreign package (PR#7389)

1 message · Brian Ripley

The relevant part of the code is

 		    if(strlen(tmpchar) == 1 && IS_SASNA_CHAR(tmpchar[0]))

#define IS_SASNA_CHAR(c) ((c) == 0x5f || (c) == 0x2e || \
                           (0x41 <= (c) && (c) <= 0x5a))

which says that a single-character field containing '.', '_' or A-Z is to
be taken as a missing value.  Every single-character field in this file
matches that test, so all of them are read as missing.
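
To make the effect concrete, here is a minimal sketch of that test applied
to a few character fields.  The macro is the one quoted above;
char_field_is_na() is a hypothetical wrapper written for illustration only
and is not a function in the foreign sources.

```c
#include <assert.h>
#include <string.h>

/* Macro as quoted above: '_' (0x5f), '.' (0x2e) or 'A'..'Z' (0x41..0x5a). */
#define IS_SASNA_CHAR(c) ((c) == 0x5f || (c) == 0x2e || \
                          (0x41 <= (c) && (c) <= 0x5a))

/* Hypothetical helper (assumption, not in the foreign package):
   returns 1 if the quoted condition would treat this character
   field as missing, 0 otherwise. */
static int char_field_is_na(const char *tmpchar)
{
    return strlen(tmpchar) == 1 && IS_SASNA_CHAR(tmpchar[0]);
}
```

Under this test a one-letter code such as "M" or "F" is flagged as missing,
while "m", "MF" and "1" are not.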

Looking at the reference, it says

   Missing values are written out with the first byte (the exponent)
   indicating the proper missing values. All subsequent bytes are 0x00. The
   first byte is:

      type      byte
       ._       0x5f
       .        0x2e
       .A       0x41
       .B       0x42
          ....
       .Z       0x5a

which suggests this is intended to apply only to numeric records 
('exponent'), whereas R applies it only to character records.  Elsewhere I 
found

   SAS stores 'missing' data depending on the variables data type:
   character or numeric. A 'missing' character value is represented as a
   blank (' ') or null (''). A 'missing' numeric value is represented as a
   single dot (.). SAS allows you to differentiate between values that are
   missing for different reasons. For example, in survey research a
   question may not be answered because the respondent refuses to answer or
   because they don't know the answer. SAS has a range of missing values to
   cover this case. These special missing values are for numerics only and
   are the letters of the alphabet preceded by the single dot .A thru .Z

which seems to confirm this.
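
If that reading is right, the test belongs on the numeric fields instead.
The following is only a sketch of what the reference describes (first byte
is one of the codes, all subsequent bytes are 0x00), not the code in
foreign; numeric_field_is_na() and its arguments are names I have made up
for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Macro as quoted earlier in this message. */
#define IS_SASNA_CHAR(c) ((c) == 0x5f || (c) == 0x2e || \
                          (0x41 <= (c) && (c) <= 0x5a))

/* Hypothetical helper (assumption, not in the foreign package):
   'field' points at the raw bytes of one numeric field and 'width'
   is its length in bytes.  Returns 1 if the first byte is a
   missing-value code and every later byte is 0x00, else 0. */
static int numeric_field_is_na(const unsigned char *field, size_t width)
{
    size_t i;

    if (width == 0 || !IS_SASNA_CHAR(field[0]))
        return 0;
    for (i = 1; i < width; i++)
        if (field[i] != 0x00)
            return 0;
    return 1;
}
```

The check on the trailing bytes matters: in IBM floating point the bytes
0x41 0x10 0x00 ... encode 1.0, so looking at the first byte alone would
confuse ordinary numbers with .A.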

Does anyone know for sure?
On Wed, 24 Nov 2004, Ruskin Chow wrote: