Using unicode from C interface of R

4 messages · Duncan Murdoch, Brian Ripley, Sandip Nandi

#
On 14-01-21 5:41 PM, Sandip Nandi wrote:
There are a number of encodings for Unicode.  Most Unix systems use 
UTF-8, Windows uses UTF-16 for some things, etc.

If your string is known to be in UTF-8 that's easiest:  just use 
mkCharCE instead of mkChar, as described in Writing R Extensions.  If it 
is in UTF-16 you might have more trouble because of possible embedded 0 
bytes, which C string functions treat as terminators.  Translate to 
UTF-8 first using C facilities like WideCharToMultiByte.

Duncan Murdoch
#
On 22/01/2014 00:08, Duncan Murdoch wrote:
Which is Windows-only (and 'wide char' differs by platform, including 
whether it is known to be any Unicode encoding).  All platforms have 
Riconv: see 'Writing R Extensions'. C11 has other ways to do this, but 
they are not widely implemented.
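
To make the conversion step discussed above concrete, here is a minimal sketch in plain C of turning UTF-16 code units into a NUL-terminated UTF-8 buffer. It is a hand-rolled stand-in for illustration only, not what either poster wrote: the function name utf16_to_utf8 and its signature are hypothetical, and it assumes the input is already in native byte order with no byte-order mark. In a package, Riconv would normally do this work.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of R's API): convert n UTF-16 code units
 * in native byte order to a NUL-terminated UTF-8 string in out, which has
 * room for cap bytes. Returns the number of bytes written, excluding the
 * terminator, or (size_t)-1 on malformed input or insufficient space. */
size_t utf16_to_utf8(const uint16_t *in, size_t n, char *out, size_t cap)
{
    size_t o = 0;

    if (cap == 0)
        return (size_t)-1;

    for (size_t i = 0; i < n; i++) {
        uint32_t cp = in[i];

        if (cp >= 0xD800 && cp <= 0xDBFF) {
            /* High surrogate: must be followed by a low surrogate. */
            if (i + 1 >= n)
                return (size_t)-1;
            uint32_t lo = in[i + 1];
            if (lo < 0xDC00 || lo > 0xDFFF)
                return (size_t)-1;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
            i++;
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return (size_t)-1;              /* stray low surrogate */
        }

        /* Number of UTF-8 bytes this code point needs. */
        size_t need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (o + need + 1 > cap)             /* keep room for the NUL */
            return (size_t)-1;

        if (need == 1) {
            out[o++] = (char)cp;
        } else if (need == 2) {
            out[o++] = (char)(0xC0 | (cp >> 6));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        } else if (need == 3) {
            out[o++] = (char)(0xE0 | (cp >> 12));
            out[o++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        } else {
            out[o++] = (char)(0xF0 | (cp >> 18));
            out[o++] = (char)(0x80 | ((cp >> 12) & 0x3F));
            out[o++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        }
    }

    out[o] = '\0';
    return o;
}
```

The input is passed with an explicit length rather than as a C string because ASCII characters in UTF-16 carry a zero byte, so strlen-style scanning would stop early. Inside R code the resulting buffer could then be handed to mkCharCE(buf, CE_UTF8), as suggested earlier in the thread.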