Using unicode from C interface of R - R-devel

Tue, Jan 21, 2014 2:41 PM

Duncan Murdoch

Tue, Jan 21, 2014 4:08 PM

On 14-01-21 5:41 PM, Sandip Nandi wrote:

There are a number of encodings for Unicode.  Most Unix systems use 
UTF-8, Windows uses UTF-16 for some things, etc.

If your string is known to be in UTF-8 that's easiest:  just use 
mkCharCE (with encoding CE_UTF8) instead of mkChar, as described in 
Writing R Extensions.  If it is in UTF-16 you might have more trouble 
because of possible embedded 0 bytes.  Translate to UTF-8 first using 
C facilities like WideCharToMultiByte.
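
To make the "embedded 0 bytes" point concrete: in UTF-16 the letter 'A' is the code unit 0x0041, so one of its two bytes is 0 and any char*-based function stops there. Below is a minimal, platform-independent sketch of the UTF-16 to UTF-8 step in plain C. The helper name utf16_to_utf8 and its signature are hypothetical (they are not part of R's API or of Windows), and the input is assumed to be an array of native-endian 16-bit code units with an explicit length.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of R's API: convert n UTF-16 code units
   in `in` to a NUL-terminated UTF-8 string in `out` (capacity `cap` bytes).
   Returns the number of bytes written, excluding the terminator, or
   (size_t)-1 on an unpaired surrogate or if `out` is too small. */
size_t utf16_to_utf8(const uint16_t *in, size_t n, char *out, size_t cap)
{
    size_t o = 0;
    if (cap == 0) return (size_t)-1;
    for (size_t i = 0; i < n; i++) {
        uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {             /* high surrogate */
            if (i + 1 >= n || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                return (size_t)-1;                      /* no low surrogate */
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[++i] - 0xDC00);
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return (size_t)-1;                          /* stray low surrogate */
        }
        /* Number of UTF-8 bytes needed for this code point. */
        size_t len = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (o + len + 1 > cap) return (size_t)-1;       /* +1 for the '\0' */
        switch (len) {
        case 1:
            out[o++] = (char)cp;
            break;
        case 2:
            out[o++] = (char)(0xC0 | (cp >> 6));
            out[o++] = (char)(0x80 | (cp & 0x3F));
            break;
        case 3:
            out[o++] = (char)(0xE0 | (cp >> 12));
            out[o++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (char)(0x80 | (cp & 0x3F));
            break;
        default:
            out[o++] = (char)(0xF0 | (cp >> 18));
            out[o++] = (char)(0x80 | ((cp >> 12) & 0x3F));
            out[o++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (char)(0x80 | (cp & 0x3F));
            break;
        }
    }
    out[o] = '\0';
    return o;
}
```

Inside a package, the resulting buffer could then be handed to mkCharCE(buf, CE_UTF8).  A U+0000 code unit in the input would still end the C string early, so this sketch only suits text without embedded NULs.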

Duncan Murdoch

Brian Ripley

Tue, Jan 21, 2014 9:14 PM

On 22/01/2014 00:08, Duncan Murdoch wrote:

WideCharToMultiByte is Windows-only (and what a 'wide char' holds 
differs by platform, including whether it is known to be any Unicode 
encoding at all).  All platforms have Riconv: see 'Writing R 
Extensions'.  C11 has other ways to do this, but they are not widely 
implemented.
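
As a rough illustration of the Riconv route, here is a sketch of a .Call-style helper that converts UTF-16LE bytes to UTF-8 and wraps the result with mkCharCE.  It only builds against R's headers (for example inside a package's src directory).  The function name utf16le_to_rstring, its arguments, and the choice of "UTF-16LE" as the source encoding are assumptions made for the example; the encoding names actually accepted depend on the iconv implementation R was built with.

```c
#include <R.h>
#include <Rinternals.h>
#include <R_ext/Riconv.h>

/* Hypothetical helper: convert `nbytes` bytes of UTF-16LE text into an
   R character vector of length one, marked as UTF-8. */
SEXP utf16le_to_rstring(const char *in, size_t nbytes)
{
    /* One 2-byte UTF-16 unit needs at most 3 UTF-8 bytes, so 2 * nbytes
       is a safe upper bound.  R_alloc memory is reclaimed by R after the
       .Call returns. */
    size_t outleft = 2 * nbytes;
    char *buf = R_alloc(outleft + 1, sizeof(char));

    void *cd = Riconv_open("UTF-8", "UTF-16LE");   /* (tocode, fromcode) */
    if (cd == (void *)(-1))
        error("conversion from UTF-16LE to UTF-8 is not supported");

    const char *inp = in;
    char *outp = buf;
    size_t inleft = nbytes;
    size_t res = Riconv(cd, &inp, &inleft, &outp, &outleft);
    Riconv_close(cd);               /* close before any error() call */
    if (res == (size_t)(-1))
        error("invalid or incomplete UTF-16LE input");
    *outp = '\0';

    SEXP ans = PROTECT(allocVector(STRSXP, 1));
    SET_STRING_ELT(ans, 0, mkCharCE(buf, CE_UTF8));
    UNPROTECT(1);
    return ans;
}
```

The input is passed as bytes plus a length rather than as a C string, since UTF-16 data routinely contains 0 bytes.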

Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Sandip Nandi

Tue, Jan 21, 2014 9:48 PM
