A question about the API mkchar()

2 messages · 王永智, Brian Ripley

#
Hi, Simon
 

Thanks for your detailed explanation of mkCharCE.

Concerning the UTF-8 encoding, mkCharCE(X, CE_UTF8) is the correct way to create the Unicode string.

However, I have run into another problem:

My program is meant to read a text file, r.tmp, which is encoded in UTF-8. After reading it, each line is sent to a C function, ext_show(const char** text, int* length, int* errLevel), for further handling. The text file "r.tmp" is attached.

 I tried to use the following R code to accomplish the process:

checkoutput <- scan("r.tmp",
                    what = 'character',
                    blank.lines.skip = FALSE,
                    sep = '\n',
                    skip = 0,
                    quiet = TRUE,
                    encoding = "unknown")
lines <- length(checkoutput)
print(checkoutput)
for (i in 1:lines)
{
    inputstring = checkoutput[i]
    out <- .C('ext_show', as.character(inputstring),
              as.integer(nchar(inputstring)),
              as.integer(err),
              PACKAGE = "mypkg")
}

 

 

I don't know why, but if I type the commands in the R GUI, the Japanese characters are displayed correctly. Likewise, if I sink inputstring into another text file, the content of that file is written correctly.

But if I use the above code to pass inputstring to the function ext_show, the string that arrives inside ext_show() has been changed.
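One way to see what "changed" means here is to dump the bytes that actually arrive in the C function. The following is only a minimal sketch; hex_dump is a hypothetical helper name, not part of R or of the package above. In UTF-8 each of the Japanese characters involved normally occupies three bytes (for example E6 97 A5 for 日), so two-byte sequences on the C side would suggest that the string was converted to another encoding on the way.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical debugging helper: write the bytes of the NUL-terminated
 * string s into out as space-separated uppercase hex ("E6 97 A5").
 * Returns 0 on success, -1 if out is too small to hold the result. */
int hex_dump(const char *s, char *out, size_t out_size)
{
    size_t len, i;

    assert(s != NULL && out != NULL);
    len = strlen(s);

    /* Each byte needs 3 characters: "XX " or, for the last one, "XX" + NUL. */
    if (out_size < (len == 0 ? 1 : len * 3))
        return -1;

    out[0] = '\0';
    for (i = 0; i < len; ++i) {
        snprintf(out + i * 3, out_size - i * 3,
                 (i + 1 < len) ? "%02X " : "%02X",
                 (unsigned char)s[i]);
    }
    return 0;
}
```

Calling hex_dump(*text, buf, sizeof buf) at the top of ext_show and printing buf would show the received bytes, which can then be compared with the bytes in r.tmp.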

My current environment is Windows XP with R 2.7.0; the R encoding is reported as "UTF-8":
[1] "UTF-8"
[1] "LC_COLLATE=Chinese_People's Republic of China.936;LC_CTYPE=Chinese_People's Republic of China.936;LC_MONETARY=Chinese_People's Republic of China.936;LC_NUMERIC=C;LC_TIME=Chinese_People's Republic of China.936"


Since the current encoding is UTF-8, I don't think the Chinese locale should prevent a correct result.

ext_show is defined as follows:

    void ext_show(
        const char** text,
        int* length,
        int* errLevel)
    {
        *errLevel = LoadLib();
        int real_length = strlen(*text);
        if( LOAD_SUCCESS == *errLevel )
            *errLevel = ShowInScreen(*text, real_length);
    }
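A side note on the two lengths involved: nchar() in R counts characters by default, while strlen() in C counts bytes, so for multibyte text the value passed as length and real_length will generally differ. The sketch below illustrates this for UTF-8 input; utf8_codepoints is a hypothetical helper name introduced only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count Unicode code points in a NUL-terminated
 * UTF-8 string by skipping continuation bytes (bit pattern 10xxxxxx).
 * Assumes the input is well-formed UTF-8. */
size_t utf8_codepoints(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++n;
    }
    return n;
}
```

For the two-character string whose UTF-8 bytes are E6 97 A5 E6 9C AC, strlen() gives 6 while utf8_codepoints() gives 2. If ShowInScreen expects a byte count, strlen() is the value to pass; if it expects a character count, something like the helper above would be needed.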

I am new to R programming and not very familiar with how R handles encodings. I suspect it may be necessary to convert the encoding of inputstring before passing it to ext_show().

 Many Thanks!

Joey

 
On 2008-10-28, "Simon Urbanek" <simon.urbanek at r-project.org> wrote:
#
1) 2.7.0 is rather old, and you were asked to update your R before 
posting.

2) No file was attached.  But how to handle encodings is in the 'R 
Internals' manual.  This is a tricky, advanced, topic in C-level R 
programming.  It is your responsibility, not ours, to get yourself up to 
the level of understanding required.  Sorry, but it is not reasonable to 
expect a personal tutorial in this forum.
On Mon, 3 Nov 2008, 王永智 wrote: