build package with unicode (farsi) strings

5 messages · Faridedin Cheraghi, Thierry Onkelinx, Ista Zahn +2 more

#
Hi,

I have an R script file with Persian letters in it, defined as a variable:

#' @export
letters_fa <- c('الف','ب','پ','ت','ث','ج','چ','ح','خ','ر','ز','د')

I have specified the Encoding field in my package's DESCRIPTION file:

...
Encoding: UTF-8
...

I also included Sys.setlocale(locale="Persian") in my .Rprofile, so it is
executed when R CMD is called. However, after R CMD build and R CMD INSTALL,
when I access the variable from the package, the characters are not printed
correctly:
 [1] "<d8><a7><d9><84><d9><81>" "<d8><a8>"                 "<d9><be>"                 "<d8><aa>"                 "<d8><ab>"
 [6] "<d8><ac>"                 "<da><86>"                 "<d8><ad>"                 "<d8><ae>"                 "<d8><b1>"
[11] "<d8><b2>"                 "<d8><af>"


thanks
Farid
2 days later
#
Dear Farid,

Try using ASCII-only \u escapes instead of the literal characters, for
example letters_fa <- c("\u0627", "\u0641"). The full code table is
available at https://www.utf8-chartable.de
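
As a small illustration of why this helps (a sketch, not taken from the poster's package): a string written with \u escapes keeps the source file pure ASCII, while R still stores the actual letters as UTF-8. The bytes shown for the second letter are the same ones that appear as <d8><a8> in the output quoted in the original post.

```r
# Written with ASCII-only escapes; R decodes them to the actual letters
letters_fa <- c("\u0627\u0644\u0641", "\u0628", "\u067e")

nchar(letters_fa)          # characters per element: 3 1 1
utf8ToInt(letters_fa[2])   # Unicode code point of the second letter: 1576 (0x0628)
charToRaw(letters_fa[2])   # its UTF-8 bytes: d8 a8
Encoding(letters_fa)       # how R marks the strings internally
```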

Best regards,



ir. Thierry Onkelinx
2018-08-28 7:17 GMT+02:00 Faridedin Cheraghi <faridcher at gmail.com>:

  
  
#
On Thu, Aug 30, 2018 at 3:11 AM Thierry Onkelinx
<thierry.onkelinx at inbo.be> wrote:
... as recommended in the manual:
https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Encoding-issues
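
A quick way to see which source lines would need escaping is sketched below. The `src` vector is a made-up stand-in for the lines of a package file, and the `utf8ToInt()` check is just one possible test; `tools::showNonASCII()` is the helper shipped with R for this purpose.

```r
# Hypothetical source lines: the first holds raw Persian letters,
# the second holds the equivalent ASCII-only \u escapes
src <- c(
  "letters_fa <- c('\u0628', '\u067e')",
  "letters_fa <- c('\\u0628', '\\u067e')"
)

# TRUE where a line contains at least one code point outside ASCII
has_non_ascii <- vapply(
  src,
  function(line) any(utf8ToInt(line) > 127L),
  logical(1),
  USE.NAMES = FALSE
)
print(has_non_ascii)

# Prints the offending lines, with non-ASCII bytes shown in <xx> form
tools::showNonASCII(src)
```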

Best,
Ista

#
On Thu, Aug 30, 2018 at 2:11 AM Thierry Onkelinx
<thierry.onkelinx at inbo.be> wrote:
It's a little easier to do this with code:

letters_fa <- c('الف','ب','پ','ت','ث','ج','چ','ح','خ','ر','ز','د')
writeLines(stringi::stri_escape_unicode(letters_fa))
#> \u0627\u0644\u0641
#> \u0628
#> \u067e
#> \u062a
#> \u062b
#> \u062c
#> \u0686
#> \u062d
#> \u062e
#> \u0631
#> \u0632
#> \u062f

Hadley
#
Thank you all for your valuable insights. The most viable workaround is a modification of Hadley's line of code:



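library(magrittr)  # provides the %>% pipe

stringi::stri_escape_unicode(letters_fa) %>%
  paste0("'", ., "'", collapse = ",") %>%   # quote each element, join with commas
  paste0("c(", ., ")") %>%                  # wrap in a c(...) call
  writeLines()                              # print with single backslashes, ready to paste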



The output string can then be copied and pasted without manual editing. However, imagine having to go through this process for every English string you write each day! That would not be very productive, would it?
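
For reference, the same copy-and-paste workaround can be sketched in base R without stringi or the pipe. The helper name `as_escaped_literal` is hypothetical, and this version escapes every character (including ASCII ones) for simplicity; it is an illustration under those assumptions, not the approach anyone in the thread actually used. Parsing the generated text back confirms that it reproduces the original strings.

```r
# Hypothetical helper: turn a character vector into an ASCII-only c(...) literal
as_escaped_literal <- function(x) {
  escaped <- vapply(x, function(s) {
    cp <- utf8ToInt(s)
    # \uXXXX covers code points up to 0xFFFF; larger ones need \UXXXXXXXX
    esc <- ifelse(cp > 0xFFFF, sprintf("\\U%08x", cp), sprintf("\\u%04x", cp))
    paste0(esc, collapse = "")
  }, character(1), USE.NAMES = FALSE)
  paste0("c(", paste0("'", escaped, "'", collapse = ","), ")")
}

original <- c("\u0627\u0644\u0641", "\u0628")
lit <- as_escaped_literal(original)
writeLines(lit)  # the text to paste into the package source

# Round trip: evaluating the generated literal gives back the same strings
round_trip <- eval(parse(text = lit))
identical(round_trip, original)
```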



I think R deserves better support for internationalization. I know this implies fundamental revisions to the code to avoid the unnecessary conversion to the OS's native locale, i.e. reading and writing directly as Unicode.



Farid