Message-ID: <CABFfbXuU-h9CdQ5pBTq6LBexyShydVhNOjtEQhy+Y43h5+mrVg@mail.gmail.com>
Date: 2014-12-11T20:24:07Z
From: Jeroen Ooms
Subject: [Rcpp-devel] String encoding (UTF-8 conversion)

I'm interfacing with a C++ library that assumes strings are UTF-8.
However, strings from R can have various encodings. It's not clear to
me how I need to account for that in Rcpp. For example:

// [[Rcpp::export]]
std::string echo(std::string src){
  return src;
}

This program does not work on Windows for non-ASCII strings:

> test = "??"
> echo(test)
[1] "? ????

In C programs I always call translateCharUTF8 on all input to make
sure it is UTF-8 before I start working with it:

  translateCharUTF8(STRING_ELT(x, i));
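
To illustrate the kind of re-encoding that call performs, here is a
minimal sketch in plain C++, assuming the input happens to be Latin-1
(ISO-8859-1). The helper name latin1_to_utf8 is hypothetical and is not
part of R or Rcpp; R itself handles many more encodings (a Windows
native code page such as CP1252, for instance, is not exactly Latin-1).

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, for illustration only (not an R or Rcpp API).
// Assumes every input byte is one ISO-8859-1 character, so each byte
// value equals the Unicode code point it represents (U+0000..U+00FF).
std::string latin1_to_utf8(const std::string& in) {
    std::string out;
    out.reserve(in.size() * 2);  // worst case: every byte becomes two
    for (unsigned char c : in) {
        if (c < 0x80) {
            // ASCII is identical in Latin-1 and UTF-8.
            out.push_back(static_cast<char>(c));
        } else {
            // Code points U+0080..U+00FF take two bytes in UTF-8:
            // 110xxxxx 10xxxxxx
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return out;
}
```

For example, the single Latin-1 byte 0xE9 (e with acute accent) becomes
the two UTF-8 bytes 0xC3 0xA9, while pure ASCII input is returned
unchanged. A library that expects UTF-8 and is handed the raw 0xE9 byte
sees an invalid sequence, which is consistent with the garbled output
shown above.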

Similarly, on the output I explicitly set the encoding to let R know
that this is UTF-8:

  SET_STRING_ELT(out, 0, mkCharCE(olds, CE_UTF8));

This ensures that code works across platforms and locales. How do we
go about this in Rcpp?