Multibyte strings

4 messages · Dennis Fisher, David Winsemius, Peter Dalgaard

#
R 3.2.0
OS X

Colleagues,

Earlier today, I initiated a series of emails regarding SASxport (which was removed from CRAN).  David Winsemius proposed downloading the source code and installing with the following command:
	install.packages('~/Downloads/SASxport_1.5.0.tar.gz', repos = NULL, type = "source")

That works and I am grateful to David for his recommendation.  However, the package fails on some of the many objects that I attempted to write with:
	write.xport

The error message was:
	Error in nchar(var) : invalid multibyte string 3157

One work-around would be to edit out multibyte strings.  Is there a simple way to find and replace them?  Or is there some other clever approach that bypasses the problem?

Dennis

Dennis Fisher MD
P < (The "P Less Than" Company)
Phone: 1-866-PLessThan (1-866-753-7784)
Fax: 1-866-PLessThan (1-866-753-7784)
www.PLessThan.com
#
On Sep 25, 2015, at 2:23 PM, Dennis Fisher wrote:

            
Consider using traceback() to see which section of code is actually reporting the error.

The error reported in your earlier message indicated a problem with a particular word starting with DIARRH and ending in ?????A. When I try to drop that unquoted into an R console line I get:
Error: unexpected input in "DIARRH?"

My word processor tells me that little comma-like glyph is a cedilla.

However, I'm not sure this is a reproducible problem, since I am unable to produce a similar error with the toy file that is built with the write.xport help page code:
   x             y
1  1             a
2  2 DIARRH??????A
3 NA          <NA>
4 NA             *
'data.frame':	4 obs. of  2 variables:
 $ x: atomic  1 2 NA NA
  ..- attr(*, "SASformat")= chr "date7."
 $ y: Factor w/ 3 levels "*","a","DIARRH??????A": 2 3 NA 1
  ..- attr(*, "label")= chr "character variable"
 - attr(*, "label")= chr "Simple example"
 - attr(*, "SAStype")= chr "MYTYPE"
   x             y
1  1             a
2  2 DIARRH??????A
3 NA          <NA>
4 NA             *
'data.frame':	4 obs. of  2 variables:
 $ x: atomic  1 2 NA NA
  ..- attr(*, "SASformat")= chr "date7."
 $ y: Factor w/ 3 levels "*","a","DIARRH??????A": 2 3 NA 1
  ..- attr(*, "label")= chr "\"DIARRH??????A\""
 - attr(*, "label")= chr "Simple example"
 - attr(*, "SAStype")= chr "MYTYPE"
On a Mac I have used the Zap Gremlins option in TextWrangler.app. It would change the spelling of words that were originally constructed using ligature characters.


Best of luck;
David.
David Winsemius
Alameda, CA, USA
#
Dennis,

The invalid multibyte issue is almost certainly a symptom of being in a UTF-8 locale and trying to handle strings that aren't in UTF-8. (UTF-8 uses particular 8-bit patterns to say that the following k bytes contain a Unicode value outside ASCII; other "8-bit ASCII" encodings, like Latin-1, just use the extra 128 character codes for special characters. Treating the latter as the former causes errors; the other way around just looks weird.)
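
To make the byte-level point concrete, here is a minimal sketch for locating the offending strings and re-encoding them. It assumes the stray bytes are Latin-1, which nothing in this thread confirms, and the toy vector x is made up for illustration. It counts bytes rather than characters, because counting characters is the step that needs the string to decode cleanly.

```r
# Assumption: the stray bytes are Latin-1. 0xE9 is "e acute" in Latin-1 but,
# standing alone, it is not a valid UTF-8 sequence.
x <- c("plain", rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xE9))))

# Counting bytes never has to decode the string, so it is safe on any input.
n_bytes <- nchar(x, type = "bytes")

# Flag the elements that contain any byte outside 7-bit ASCII.
has_non_ascii <- vapply(
  x,
  function(s) any(as.integer(charToRaw(s)) > 127L),
  logical(1),
  USE.NAMES = FALSE
)

# Re-encode only the flagged elements, declaring what they really are.
fixed <- x
fixed[has_non_ascii] <- iconv(x[has_non_ascii], from = "latin1", to = "UTF-8")

# The result is marked as UTF-8, so character counts work again.
n_chars <- nchar(fixed, type = "chars")
```

Running which(has_non_ascii) on a real column would give the positions worth inspecting by hand. Note that charToRaw() sees a missing value as the two letters "NA", so missing values are simply reported as clean here.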

So perhaps you should try diddling your locale settings and/or look for encoding arguments for the functions that you use. Then again, the XPT format may not be happy with non-ASCII characters, whatever the encoding, in which case you may need to massage the input data sets and change variable names and factor labels (iconv() should be your friend).
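
If the XPT writer does turn out to want plain ASCII, the massaging could look roughly like the sketch below. The helper names to_ascii and clean_for_xport are hypothetical (they are not part of SASxport), the Latin-1 source encoding is an assumption, and each non-ASCII byte is simply replaced by the sub string rather than transliterated.

```r
# Hypothetical helpers (not part of SASxport): force text to plain ASCII.
# Assumption: the input bytes are Latin-1; change `from` if they are not.
to_ascii <- function(x, from = "latin1", sub = "?") {
  if (is.factor(x)) {
    # Convert the level labels only; the integer codes stay as they are.
    levels(x) <- iconv(levels(x), from = from, to = "ASCII", sub = sub)
    x
  } else if (is.character(x)) {
    iconv(x, from = from, to = "ASCII", sub = sub)
  } else {
    x  # numbers, dates, etc. pass through untouched
  }
}

clean_for_xport <- function(df, from = "latin1", sub = "?") {
  # Clean every column, then the variable names themselves.
  df[] <- lapply(df, to_ascii, from = from, sub = sub)
  names(df) <- iconv(names(df), from = from, to = "ASCII", sub = sub)
  df
}

# Toy data: the second value of y ends in the Latin-1 byte 0xE9.
bad <- rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xE9)))
d <- data.frame(x = 1:2, y = c("a", bad), stringsAsFactors = FALSE)
d_clean <- clean_for_xport(d)
```

This sketch does not touch attributes such as "label", which would need the same treatment, and whether write.xport then accepts the result has not been tested here.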

By the way, I don't think the FDA "requests" XPT files. As far as I recall, they say somewhere that they _accept_ them (possibly defending themselves against the platform-specific SAS files that once abounded), but I think even Excel goes for submissions - the important thing is that they can get at the actual data reasonably easily. I can see the attraction of taking the well-trodden path, though.

-pd

  
    
#
Peter
Thanks for the explanation. One further comment. You wrote:
In fact, they do make such a request.  Here is the actual language received this week (and repeatedly in the past):
Dennis
