
[Bug report] Chinese characters are not handled correctly in Rterm for Windows

Thanks for the update. I believe I've fixed part of the problem you
reported: the crash when entering Chinese characters at the console
(e.g. via Pinyin, with the error message about an invalid multibyte
character in mbcs_get_next). The fix is in R-devel 74693. The cause is
that the Windows function ReadConsoleInputA no longer works with
multibyte characters. This is not documented and is probably a Windows
bug; according to reports online the problem has existed since Windows 8,
but I only reproduced and tested it on Windows 10. Could you please
verify that the crash no longer happens on your system?

Regarding the other problem, Chinese characters not being displayed: I
found this is caused by R calling setlocale(LC_CTYPE, *). Setting the
locale to "Chinese" and its variants (code page 936) causes the problem,
whereas running in the default "C" locale works fine. This is easily
reproduced with the external program below: when setlocale() is called,
the Chinese character disappears from the output. A workaround is to run
R with the environment variable LC_CTYPE=C. Could you please verify that
the printed characters are OK with this setting? Would you have an
explanation for this behavior? It seems a bit odd: why would the CRT
remove characters that are valid in the console code page when both the
console code page and the "setlocale" code page are 936?

Thanks
Tomas

    #include <stdio.h>
    #include <locale.h>
    int main(int argc, char **argv) {
        //if (!setlocale(LC_CTYPE, "Chinese")) fprintf(stderr, "setlocale failed\n");
        int chars[] = { 67, 196, 227, 68 };
        for(int i = 0; i < 4; i++) fputc(chars[i], stdout);
        fprintf(stdout, "\n");
        return 0;
    }
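
A side note on the four bytes written by the program above: in code page 936 they should decode to three characters, 'C', one double-byte Chinese character (196, 227) and 'D'. The sketch below is only an illustration and is not code from R or from the CRT. It assumes the usual CP936 (GBK) layout, with lead bytes in 0x81..0xFE and trail bytes in 0x40..0xFE except 0x7F; the function names are made up for this example.

```c
#include <stddef.h>

/* Assumption: in CP936 (GBK) a lead byte lies in 0x81..0xFE. */
static int cp936_is_lead(unsigned char b)
{
    return b >= 0x81 && b <= 0xFE;
}

/* Assumption: a trail byte lies in 0x40..0xFE, excluding 0x7F. */
static int cp936_is_trail(unsigned char b)
{
    return b >= 0x40 && b <= 0xFE && b != 0x7F;
}

/* Hypothetical helper: count the characters in a CP936 byte sequence.
   Returns -1 if a lead byte is not followed by a valid trail byte. */
static long cp936_count_chars(const unsigned char *s, size_t n)
{
    long count = 0;
    size_t i = 0;

    while (i < n) {
        if (cp936_is_lead(s[i])) {
            /* A double-byte character needs a valid second byte. */
            if (i + 1 >= n || !cp936_is_trail(s[i + 1]))
                return -1;
            i += 2;
        } else {
            /* Single-byte character (ASCII and the like). */
            i += 1;
        }
        count++;
    }
    return count;
}
```

Calling cp936_count_chars() on the bytes { 67, 196, 227, 68 } with length 4 returns 3 under these assumptions, so the sequence is well formed for code page 936 and there is no obvious reason for the middle character to be dropped.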
On 04/28/2018 04:53 PM, Azure wrote: