Good practice for database with utf-8 string in package

3 messages · Marc Girondot, Bert Gunter, Jeff Newmiller

#
Hello everyone,

I am a little bit stuck on how to include a database with UTF-8
strings in a package. When I submit it to CRAN, the checks report NOTEs
on several Unix systems, and I am trying to find a solution (if one
exists) to avoid these NOTEs.

The database contains references, and some names have non-ASCII characters.

* First, I don't agree at all with the advice given here:

https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Encoding-issues

"First, consider carefully if you really need non-ASCII text."

If a language has non-ASCII characters, it is not just to make the
writing nicer or more complex; it is because they change the pronunciation.

* Then I tried to find a solution to avoid these NOTEs.

For example, here is a reference with UTF-8 characters:
[1] Hernández-Montoya, V., Páez, V.P. & Ceballos, C.P. (2017) Effects of 
temperature on sex determination and embryonic development in the 
red-footed tortoise, Chelonoidis carbonarius. Chelonian Conservation and 
Biology 16, 164-171.

When I convert the characters into Unicode escapes, I do indeed get only
ASCII characters. Perfect.
[1] "Hern<U+00E1>ndez-Montoya, V., P<U+00E1>ez, V.P. & Ceballos, C.P. 
(2017) Effects of temperature on sex determination and embryonic 
development in the red-footed tortoise, Chelonoidis carbonarius. 
Chelonian Conservation and Biology 16, 164-171."

Then I get no NOTEs when I check the package with the database on Unix...
but how can I print the reference back with its original characters?
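
One possible way to go back from the "<U+00E1>" form to the original
characters, using base R only, is sketched below. The helper name
restore_unicode() is hypothetical, and the iconv() call is only an
assumption about how the ASCII form above was produced; the result will
still display correctly only in a locale that can show these characters.

```r
# Original string, written with \u escapes so the source file stays ASCII.
original <- "Hern\u00e1ndez-Montoya, V., P\u00e1ez, V.P."

# Assumption: the "<U+00E1>" form came from something like this call.
escaped <- iconv(original, from = "UTF-8", to = "ASCII", sub = "Unicode")

# Hypothetical helper: replace every "<U+XXXX>" token with the
# character that has that code point.
restore_unicode <- function(s) {
  m <- gregexpr("<U\\+[0-9A-Fa-f]{4,6}>", s)
  regmatches(s, m) <- lapply(regmatches(s, m), function(tokens) {
    vapply(tokens, function(tok) {
      # Drop the leading "<U+" and trailing ">", then parse the hex digits.
      code <- strtoi(substr(tok, 4, nchar(tok) - 1), base = 16L)
      intToUtf8(code)
    }, character(1), USE.NAMES = FALSE)
  })
  s
}

restored <- restore_unicode("Hern<U+00E1>ndez-Montoya, V., P<U+00E1>ez, V.P.")
print(restored)
```

Strings without any "<U+XXXX>" token pass through unchanged, so the
helper could be applied to a whole column of references.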

Thanks a lot for pointing me to best practices for including databases
with non-ASCII characters without getting NOTEs when submitting a
package to CRAN.

Marc
#
This should not be posted here. Post on the R-package-devel list instead.

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Thu, Sep 16, 2021 at 9:13 AM Marc Girondot via R-help
<r-help at r-project.org> wrote:
#
I agree with Bert regarding your stated problem, but I want to point out that you don't have control over the locale in which your users will try to display the encoded strings in your data. I am no expert in this, but you will need to become one in order to understand your own problem and any solutions you are given on R-package-devel. You will likely benefit from reading Kevin Ushey's writeup: https://kevinushey.github.io/blog/2018/02/21/string-encoding-and-r/
On September 16, 2021 9:17:05 AM PDT, Bert Gunter <bgunter.4567 at gmail.com> wrote: