Message-ID: <x2hcxrrpry.fsf@turmalin.kubism.ku.dk>
Date: 2006-10-26T16:43:45Z
From: Peter Dalgaard
Subject: Error: invalid multibyte string
In-Reply-To: <Pine.LNX.4.64.0610260824520.9367@homer23.u.washington.edu>

Thomas Lumley <tlumley at u.washington.edu> writes:

> On Thu, 26 Oct 2006, Henrik Bengtsson wrote:
> 
> > I'm observing the following on different platforms:
> >
> >> parse(text='"\\x7F"')
> > expression("\177")
> >> parse(text='"\\x80"')
> > Error: invalid multibyte string
> 
> Yes. It's an invalid multibyte string. In UTF-8 a single byte is a valid 
> character string only if it is below x80, so x7F is fine but x80 is not. 
> In fact x80 is not the leading byte of any valid UTF-8 character: bytes 
> x80 to xBF are continuation bytes, which may only follow a leading byte.
> 
> You have to work out what the Unicode code point is for whatever character 
> you were expecting to be x80 and convert that to UTF-8.
> 
> I'm surprised that one of your UTF-8 machines worked -- I don't think it 
> should.
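For readers who want to check the byte rule above and the conversion Thomas suggests, here is a minimal R sketch. It assumes a modern R (validUTF8() only appeared in R 3.3.0, long after this thread). The helper utf8_byte_role() is hypothetical, and the conversion step assumes the intended character was the Windows-1252 euro sign; the real source encoding may well be different.

```r
## Hypothetical helper: classify one byte by the role it can play in UTF-8
## (byte ranges as in RFC 3629).
utf8_byte_role <- function(b) {
  b <- as.integer(b)
  if (b < 0x80) {
    "single-byte character"   # 0x00-0x7F: ASCII, valid on its own
  } else if (b < 0xC0) {
    "continuation byte"       # 0x80-0xBF: may only follow a leading byte
  } else if (b < 0xC2 || b > 0xF4) {
    "never valid"             # 0xC0, 0xC1 (overlong) and 0xF5-0xFF
  } else {
    "leading byte"            # 0xC2-0xF4: starts a 2- to 4-byte sequence
  }
}

utf8_byte_role(0x7F)  # "single-byte character"
utf8_byte_role(0x80)  # "continuation byte"

## ASSUMPTION: the byte 0x80 was meant in Windows-1252 (CP1252),
## where it encodes the euro sign, U+20AC.
x <- rawToChar(as.raw(0x80))                  # the lone byte, as in "\x80"
validUTF8(x)                                  # FALSE: not valid UTF-8
y <- iconv(x, from = "CP1252", to = "UTF-8")  # re-encode the intended character
utf8ToInt(y)                                  # 8364, i.e. U+20AC
charToRaw(y)                                  # e2 82 ac
```

The byte classifier is only a teaching aid; for whole strings, validUTF8() and iconv() do the real work.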

Interestingly, we can parse, but not print or deparse:

> x<-parse(text='"\\x80"')
> x
Error: invalid multibyte string
> z <- deparse(x)
Error in deparse(x) : invalid multibyte string
> cat(x[[1]])
?>

(the last line has a funny little cedilla-like symbol in position 1: cat() 
writes the raw byte and no newline, so the prompt follows it)

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907