
charToRaw("Œ") is not 8C in R console

On 13 Dec 2013, at 08:03 , ???? <1248283536 at qq.com> wrote:

(Looks like Brian got his version mangled in transmission.)

Anything above 7F is not ASCII.

Various "8-bit extensions" put various non-ASCII characters at various places in the range 80-FF. Your reference shows the Latin-1 encoding, which covers the Western European languages. That was useful for a while [*], until the West and the East began talking to each other and found that the other party's documents used a different encoding, one that put different characters at the same code positions.

UTF-8 uses multibyte sequences like c5 92 to represent the extra characters, which allows you to have many more than the 128 that fit in the range 80-FF.
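To make this concrete, here is a small sketch in R. It assumes the character in question is U+0152, and it assumes that the 8C in the original reference comes from the Windows-1252 table (strict ISO 8859-1 has a control character at 8C); the encoding name "CP1252" is also an assumption about what the platform's iconv accepts. The variable names are made up for illustration.

```r
# U+0152 (LATIN CAPITAL LIGATURE OE), written as an escape so the script
# itself is plain ASCII and does not depend on the file's encoding.
oe <- "\u0152"

# The Unicode code point, as an integer (0x152 == 338).
code_point <- utf8ToInt(oe)

# The bytes R stores for the string when it is encoded as UTF-8.
utf8_bytes <- charToRaw(enc2utf8(oe))
print(utf8_bytes)                      # c5 92

# Decode the two-byte sequence by hand: a lead byte 110xxxxx carries
# 5 payload bits, a continuation byte 10xxxxxx carries 6.
lead <- bitwAnd(as.integer(utf8_bytes[1]), 0x1FL)
cont <- bitwAnd(as.integer(utf8_bytes[2]), 0x3FL)
decoded <- bitwOr(bitwShiftL(lead, 6L), cont)
print(decoded == code_point)           # TRUE

# Re-encode into a single-byte code page (assumption: the local iconv
# knows the name "CP1252"; if it does not, iconv() returns NA).
cp1252 <- iconv(oe, from = "UTF-8", to = "CP1252")
if (!is.na(cp1252)) print(charToRaw(cp1252))   # 8c where supported
```

So whether you see 8c or c5 92 depends on which encoding the string is stored in, not on the character itself.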

http://www.utf8-chartable.de/unicode-utf8-table.pl?start=256
http://www.joelonsoftware.com/articles/Unicode.html

-pd

[*] A short while, actually, because it was preceded by another encoding mess known as IBM Code Pages. Famously, in this country, IBM computers (and many third-party printers!) shipped with a code page missing the Danish O-slash character (Ø), which got printed as "cent"/"Yen"!