Bug in 2.4.0 Windows menu setup (PR#9277)

8 messages · Ei-ji Nakama, Duncan Murdoch, Brian Ripley +1 more

#
I've tracked down where this is occurring, but I don't know how to fix 
it.  Here's a summary:

If the language in Windows is set to simplified Chinese (i.e. Chinese 
(PRC)) and message translations are installed, then on startup Rgui 
crashes when it tries to install the console popup menu.  The crash 
comes when it gets an error trying to do a conversion using mbrtowc, and 
tries to report it using error(); but the R symbol table is needed for 
that, and it hasn't been set up yet.

I'm not sure which menu entry causes the crash, but I think it's not the 
first, so conceivably this is caused by an error in one of the 
translation files.  Indeed, setting the language to Chinese (Taiwan) works.

I don't know how to debug the translation files, so I'm going to have to 
leave this one for now.  I think Brian Ripley is the only one who 
understands all the details of what goes on in the translations, and 
he's away until Oct 9.

Duncan Murdoch
On 10/3/2006 9:00 PM, ronggui wrote:
#
I do not read Chinese, but I can recognize kanji.
RGui-zh_CN.po is written in UTF-8, but its header declares charset=CP936.

  perl -p -i -e 's#charset=CP936#charset=utf-8#' RGui-zh_CN.po
  msgfmt -o RGui.mo RGui-zh_CN.po
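
A mismatch of this kind can also be detected mechanically. The following is only a minimal sketch in Python, not part of the R build: the function names are hypothetical, and it relies on the heuristic that non-ASCII text which decodes as strict UTF-8 is very unlikely to be a legacy double-byte encoding.

```python
import re
from typing import Optional


def declared_charset(po_bytes: bytes) -> Optional[str]:
    """Return the charset named in the PO header's Content-Type line, if any."""
    match = re.search(rb"charset=([A-Za-z0-9_.:-]+)", po_bytes)
    return match.group(1).decode("ascii") if match else None


def is_valid_utf8(data: bytes) -> bool:
    """True if the bytes decode as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def looks_mislabelled(po_bytes: bytes) -> bool:
    """True if the content is non-ASCII, valid UTF-8, yet labelled otherwise."""
    charset = declared_charset(po_bytes)
    if charset is None or charset.lower() in ("utf-8", "utf8"):
        return False
    has_non_ascii = any(byte >= 0x80 for byte in po_bytes)
    return has_non_ascii and is_valid_utf8(po_bytes)
```

Calling looks_mislabelled() on the raw bytes of a .po file flags the situation described above; it says nothing about whether the text would also fit the declared code page.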

2006/10/5, murdoch at stats.uwo.ca <murdoch at stats.uwo.ca>:
#
On 2006-10-5 8:06, Ei-ji Nakama wrote:
Thanks!!  That does fix the error, at least on my system.  I'll commit 
the change to R-devel and R-patched.

Duncan Murdoch
1 day later
#
Duncan Murdoch wrote:
Hmm, I do understand Chinese, and I can confirm that the content
of RGui-zh_CN.po in R 2.4 is in utf-8 rather than CP936.

I can also confirm that CP950 (Big5) for RGui-zh_TW.po is correct, and
CP932 (Shift-JIS) for RGui-ja.po is also correct. (So you'll need to
find a Korean speaker to verify CP949 for RGui-ko.po.)

However, the fix is slightly "asymmetric". Of ru, zh_CN, zh_TW, ja
and ko, only ru in R-2.4.0/po/*.po is in a localised encoding (the
other four are in UTF-8), whereas the RGui-*.po files, after the fix,
are all in localised encodings except RGui-zh_CN.po.

I would propose correcting the encoding of the *content*, rather
than the charset tag, so that the RGui-*.po files all use localised
encodings (CP932, CP936, CP949, CP950). That should be better for older
versions of Windows...
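
As a rough illustration of what "fits the localised encoding" means, here is a small Python sketch. The language-to-code-page table is taken from the code pages named in this message; the dictionary and function names are hypothetical, and Python's codecs are only an approximation of what Windows itself accepts.

```python
# Code pages as named in this thread for each RGui-*.po file.
NATIVE_CODEPAGE = {
    "ja": "cp932",
    "zh_CN": "cp936",
    "ko": "cp949",
    "zh_TW": "cp950",
}


def fits_native_codepage(text: str, lang: str) -> bool:
    """True if every character of text can be encoded in lang's code page."""
    try:
        text.encode(NATIVE_CODEPAGE[lang])
        return True
    except UnicodeEncodeError:
        return False
```

A translation that returns False here cannot be stored in the localised encoding without losing or replacing characters.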

Just my two cents/pennies...

Hin-Tak Leung
#
On 10/6/2006 1:35 PM, Hin-Tak Leung wrote:
I did try that, but iconv didn't want to convert the file from UTF-8 to 
CP936.  I've no idea why not.

In any case, those files only need to be readable by the translation 
teams, not by end-users, so I don't think the asymmetry matters:  if a 
translator finds it easy to work in UTF-8 that's fine for R, as long as 
it is correctly recorded.

Duncan Murdoch
#
 iconv -f utf-8 -t cp936 RGui-zh_CN.po > RGui-zh_CN.po.cp936
 iconv: illegal input sequence at position 19303

 iconv -c -f utf-8 -t cp936 RGui-zh_CN.po > RGui-zh_CN.po.cp936
       ^^
 iconv -f cp936 -t utf-8 RGui-zh_CN.po.cp936 > RGui-zh_CN.po.cp936utf8
 diff -uN RGui-zh_CN.po RGui-zh_CN.po.cp936utf8
@@ -852,7 +852,7 @@

 #: rui.c:1283 rui.c:1404
 msgid "menu + item is limited to 1000 bytes"
-msgstr "xxx"
+msgstr "xxx"

 grep -C1 "menu + item is limited to 1000 bytes" RGui-zh_CN.po

A translator should be asked about the text where the two versions differ.
BTW, the file converts to GB18030 without a problem.
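
The same search can be done without iconv. The sketch below is Python with a hypothetical function name; it reports the line and column of every character that the target encoding cannot represent. The offending character itself is not shown in this thread, so any sample input is only a stand-in.

```python
from typing import List, Tuple


def find_unencodable(text: str, encoding: str) -> List[Tuple[int, int, str]]:
    """Return (line, column, character) for each character not in encoding.

    Line and column numbers are 1-based and count characters, not bytes.
    """
    problems = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            try:
                ch.encode(encoding)
            except UnicodeEncodeError:
                problems.append((line_no, col, ch))
    return problems
```

Running it on the decoded file with "cp936" and again with "gb18030" shows whether a character is missing only from the older code page.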

2006/10/7, Duncan Murdoch <murdoch at stats.uwo.ca>:
5 days later
#
On Fri, 6 Oct 2006, Duncan Murdoch wrote:

That was the intention, but the translator marked it incorrectly, and it 
appeared to be valid CP936, so I did not pick it up.

Not really: the file does need to be convertible to the target encoding, 
and on Windows that is e.g. CP936.  That is why we have the RGui files in 
the native encoding; for all the other files we prefer UTF-8 or Latin-1.

I need to sort out with the translator a valid CP936 version of that file: 
it contains a character that is not in CP936.
#
Prof Brian Ripley wrote:
<snipped>
<snipped>
Hi. There is one invalid character at line 855. Here is the patch,
the corrected UTF-8 version (against 2.4.0, without changing the
charset tag...), and the correctly converted CP936 version
(CP936 is Microsoft's GBK code page, a superset of GB2312/EUC-CN).

I have checked round-trip conversion between CP936 and UTF-8 with both
iconv and Ken Lunde's cjkvconv.pl.
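
For reference, a round-trip check of this kind can be sketched in a few lines of Python. This is only an illustration with a hypothetical function name, using Python's built-in codecs rather than iconv or cjkvconv.pl, so edge cases may differ between the tools.

```python
def survives_round_trip(data: bytes, src: str, dst: str) -> bool:
    """True if data converts from src to dst and back to identical bytes."""
    try:
        text = data.decode(src)        # interpret the original bytes
        converted = text.encode(dst)   # convert to the other encoding
        return converted.decode(dst).encode(src) == data
    except UnicodeError:
        # Covers both undecodable input and characters missing from dst.
        return False
```

A file that passes in both directions (UTF-8 to CP936 and CP936 to UTF-8) contains no character that one of the two encodings lacks.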

You can either apply the patch, or just replace what is in the R
source with the EUC-CN version. (The latter would be my preferred
solution, but since it is sort of an "unnecessarily large change",
I'll live with the former...)

Hin-Tak Leung


-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: RGui-zh_CN.po.euc-cn.corrected
Url: https://stat.ethz.ch/pipermail/r-devel/attachments/20061012/ce62e689/attachment-0006.pl 
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: RGui-zh_CN.po.utf8.corrected
Url: https://stat.ethz.ch/pipermail/r-devel/attachments/20061012/ce62e689/attachment-0007.pl