charToRaw("Œ") is not 8C in R console
4 messages · 水静流深, Brian Ripley, Peter Dalgaard
On 13/12/2013 07:03, 水静流深 wrote:
in http://www.ascii-code.com/, you can see the hex value of Å’ is 8C,
I don't see that: that is two characters and they are C5 and 92 in that table. 8C is an AE ligature, there. And what the 'hex value' is depends on the locale: see the preamble of that table (which seems to assume everyone uses CP1252): you have not stated yours.
why in my R console ?
charToRaw("Å’")
[1] c5 92
is not 8C ?
Because R is better at looking up hex values than you are.
I get
> charToRaw("Å’")
[1] c3 85 e2 80 99
in UTF-8 (as will almost everyone not using Windows).
Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
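The five bytes in the reply above can be reproduced from the poster's two. This is only a minimal sketch in R: it assumes that somewhere on the mail path the UTF-8 bytes c5 92 were displayed as CP1252 text (consistent with the outputs shown, but not stated in the thread) and that the local iconv accepts the encoding name "CP1252". The variable names are illustrative.

```r
# The two bytes the original poster's session reported.
sent <- as.raw(c(0xc5, 0x92))

# Assumption: a mail client read those bytes as CP1252, one byte per
# character (C5 is A-ring, 92 is a right single quotation mark).
misread <- iconv(rawToChar(sent), from = "CP1252", to = "UTF-8")

# Re-encoded as UTF-8, the two misread characters take five bytes.
result <- charToRaw(misread)
print(result)
# [1] c3 85 e2 80 99
```

If that assumption holds, both sessions were reporting UTF-8 bytes; they were simply given different strings.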
On 13/12/2013 07:59, Prof Brian Ripley wrote:
On 13/12/2013 07:03, 水静流深 wrote:
in http://www.ascii-code.com/, you can see the hex value of Å’ is 8C,
I don't see that: that is two characters and they are C5 and 92 in that table. 8C is an AE ligature, there.
Typo: OE as in your subject line.
And what the 'hex value' is depends on the locale: see the preamble of that table (which seems to assume everyone uses CP1252): you have not stated yours.
why in my R console ?
charToRaw("Å’")
[1] c5 92
is not 8C ?
Because R is better at looking up hex values than you are. I get
> charToRaw("Å’")
[1] c3 85 e2 80 99
in UTF-8 (as will almost everyone not using Windows).
On 13 Dec 2013, at 08:03, 水静流深 <1248283536 at qq.com> wrote:
in http://www.ascii-code.com/, you can see the hex value of Œ is 8C,
(Looks like Brian got his version mangled in transmission.)
Anything above 7F is not ASCII. Various "8-bit extensions" put various non-ASCII characters at various places in the range 80-FF. Your reference shows CP1252, the Windows superset of Latin-1, which covers the Western European languages. That was useful for a while [*], until the West and the East began talking to each other and found that the other party's documents were putting different characters in the same places of different encodings. UTF-8 uses multibyte sequences like c5 92 to represent extra characters, which allows you to have more than 128 of them.
http://www.utf8-chartable.de/unicode-utf8-table.pl?start=256
http://www.joelonsoftware.com/articles/Unicode.html
-pd
[*] A short while, actually, because it was preceded by another encoding mess known as IBM Code Pages. Famously, in this country, IBM computers (and many 3rd party printers!) shipped with a code page missing the O-slash Danish character, which got printed as "cent"/"Yen"!
why in my R console ?
charToRaw("Œ")
[1] c5 92
is not 8C ?
Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
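To see the two answers from this thread side by side, here is a minimal sketch in R. It assumes the local iconv accepts the encoding name "CP1252" (names vary by platform; iconvlist() shows what is available), and the variable names are only illustrative. The character is written as a \u escape so the result does not depend on how the console or mail client encodes it.

```r
# What the current session's character encoding is (output varies by system).
print(Sys.getlocale("LC_CTYPE"))

# U+0152, LATIN CAPITAL LIGATURE OE, forced to UTF-8.
oe <- enc2utf8("\u0152")

# In UTF-8 the character is a two-byte sequence.
utf8_bytes <- charToRaw(oe)
print(utf8_bytes)
# [1] c5 92

# Converted to CP1252 it is the single byte listed in the 8-bit table.
cp1252_bytes <- iconv(oe, from = "UTF-8", to = "CP1252", toRaw = TRUE)[[1]]
print(cp1252_bytes)
# [1] 8c
```

So 8C and c5 92 are both valid answers for the same character; which one charToRaw() reports depends on the encoding in which the string is stored.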