Dear all,
I'm trying to use a Markdown vignette with UTF-8 encoding. It compiles well when knitting the vignette in RStudio, but it fails to recognize the UTF-8 settings when building the source package. Can someone point out what I'm doing wrong? I tried to put the relevant information below.
Best regards,
Thierry
Details:
Using 64-bit R 3.1.2 with encoding = "native.enc" on Windows 7 with RStudio 0.97.1091.
The source package is built using the devtools package. The build command is R --vanilla CMD build "myPackage" --no-manual --no-resave-data
The DESCRIPTION file has
VignetteBuilder: knitr
Suggests: knitr
Imports: rmarkdown
The markdown vignette YAML contains
vignette: >
%\VignetteEngine{knitr::rmarkdown}
%\VignetteIndexEntry{The title}
\usepackage[utf8]{inputenc}
The custom output style converts the markdown to beamer with the --latex-engine = xelatex flag.
The vignette in tar.gz passes R --vanilla CMD check --timings --as-cran
* checking files in 'vignettes' ... OK
* checking for unstated dependencies in vignettes ... OK
* checking package vignettes in 'inst/doc' ... OK
* checking running R code from vignettes ...
'markdown_intro.Rmd' using 'UTF-8' ... OK
OK
* checking re-building of vignette outputs ... [22s] OK
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium
+ 32 2 525 02 51
+ 32 54 43 61 85
Thierry.Onkelinx at inbo.be
www.inbo.be
To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of.
~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data.
~ Roger Brinner
The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey
Disclaimer Bezoek onze website / Visit our website<https://drupal.inbo.be/nl/disclaimer-mailberichten-van-het-inbo>
UTF8 markdown vignette
15 messages · Yihui Xie, Duncan Murdoch, ONKELINX, Thierry
On 09/12/2014, 4:48 AM, ONKELINX, Thierry wrote:
Dear all, I'm trying to use a Markdown vignette with UTF-8 encoding. It compiles well when knitting the vignette in RStudio, but it fails to recognize the UTF-8 settings when building the source package. Can someone point out what I'm doing wrong? I tried to put the relevant information below.
You don't describe the symptoms of "failing to recognize", but from the look of it, this is a problem with the knitr::rmarkdown engine or with the devtools packaging, so you should probably ask on an RStudio forum.
Duncan Murdoch
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Dear Duncan,
The UTF-8 characters aren't properly rendered in the pdf version of the vignette.
$?? ????? ?????? ?????? ????? ?? ?? is rendered as $????? ?????????? ???????????? ????? ?????? ? ???????? ????????
The same problem occurs when I use render("vignette.md", output_format = "mypackage::mystyle"), instead of render("vignette.md", output_format = "mypackage::mystyle", encoding = "UTF-8"). The default value of the encoding argument of rmarkdown::render() is encoding = getOption("encoding"), which is "native.enc" on my system.
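For readers following along, here is a minimal base-R sketch of how to see which encoding a session falls back on when none is given explicitly. It does not call rmarkdown; it only inspects the option that the default `encoding` argument described above reads from. The variable names are illustrative.

```r
# Inspect the session-wide default used when no encoding is passed.
# In a fresh session this is "native.enc".
default_enc <- getOption("encoding")
print(default_enc)

# l10n_info() reports whether the current locale is UTF-8 or Latin-1.
# On Windows with R 3.1.x the `UTF-8` element is typically FALSE.
loc <- l10n_info()
print(loc[["UTF-8"]])
```

If `default_enc` is "native.enc" and the locale is not UTF-8, any reader that relies on the default is likely to treat a UTF-8 file as if it were in the native code page.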
I'll post the question on an RStudio forum as well.
Best regards,
Thierry
On 09/12/2014, 5:19 AM, ONKELINX, Thierry wrote:
Dear Duncan, The UTF-8 characters aren't properly rendered in the pdf version of the vignette. $?? ????? ?????? ?????? ????? ?? ?? is rendered as $????? ?????????? ???????????? ????? ?????? ? ???????? ????????
That looks as though the UTF-8 characters are being interpreted as Latin1 characters (or whatever your native encoding is on Windows) when read from the file. It is quite tricky to work with UTF-8 in R on Windows. I think Sweave does it properly, though there may be exceptions. The issue is that many character input routines assume characters start out in the native encoding. (There's also a translation that happens by default on output, but I don't think that's your problem.) So the way to debug this is to follow all of the I/O, and see where the misinterpretation happens. For vignettes, things are complicated, because R reads the file to determine which vignette engine to use, then the vignette engine reads it (perhaps more than once).
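The misinterpretation described above can be reproduced in a few lines of base R. This is only a sketch of the general effect (UTF-8 bytes decoded as Latin-1), not a trace of what knitr or rmarkdown actually do; the variable names are made up for the illustration.

```r
# "é" (U+00E9) is stored as two bytes in UTF-8: c3 a9.
original <- "\u00e9"
utf8_bytes <- charToRaw(enc2utf8(original))

# Simulate a reader that assumes Latin-1: each byte becomes its own
# character, which is the typical "mojibake" pattern.
misread <- iconv(rawToChar(utf8_bytes), from = "latin1", to = "UTF-8")

# Declaring the correct encoding keeps the single character intact.
correct <- rawToChar(utf8_bytes)
Encoding(correct) <- "UTF-8"

nchar(misread, type = "chars")  # 2
nchar(correct, type = "chars")  # 1
```

Every non-ASCII character growing into two or more characters in the output is a strong hint that this kind of wrong decoding happened somewhere along the I/O chain.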
The same problem occurs when I use render("vignette.md", output_format = "mypackage::mystyle"), instead of render("vignette.md", output_format = "mypackage::mystyle", encoding = "UTF-8"). The default value of the encoding argument of rmarkdown::render() is
encoding = getOption("encoding"), which is "native.enc" on my system.
It sounds as though the render function needs a way to determine the
encoding from the file itself. Recent Sweave versions support the
declaration
%\VignetteEncoding{utf8}
as well as the older
\usepackage[utf8]{inputenc}
that you used. You might want to try that line as well. (You need to
keep the \usepackage line to tell LaTeX what encoding you're using.)
Duncan Murdoch
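To make the idea of determining the encoding from the file itself concrete, here is a hedged sketch of a helper that scans the source lines for either declaration. `detect_declared_encoding()` is a hypothetical function written for this illustration; it is not how R or rmarkdown implement the detection.

```r
# Hypothetical helper: look for an encoding declaration in the lines of a
# vignette source. Illustration only, not R's own parser.
detect_declared_encoding <- function(lines) {
  first_capture <- function(pattern) {
    m <- regmatches(lines, regexec(pattern, lines))
    hits <- Filter(function(x) length(x) == 2L, m)
    if (length(hits) > 0L) hits[[1L]][2L] else NA_character_
  }
  # Prefer %\VignetteEncoding{...} when present.
  enc <- first_capture("%\\\\VignetteEncoding\\{([^}]*)\\}")
  if (!is.na(enc)) return(enc)
  # Otherwise fall back to the older \usepackage[...]{inputenc} line.
  first_capture("\\\\usepackage\\[([^]]*)\\]\\{inputenc\\}")
}

header <- c("vignette: >",
            "  %\\VignetteEngine{knitr::rmarkdown}",
            "  %\\VignetteEncoding{UTF-8}")
detect_declared_encoding(header)  # "UTF-8"
```

The helper returns the name exactly as written in the file, so a caller would still need to map spellings such as utf8 to a name that iconv accepts.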
A few things to clarify:
1. You do not necessarily have to keep the \usepackage{} line if you
use %\VignetteEncoding{UTF-8}, because Pandoc will use UTF-8 anyway in
its LaTeX template.
2. Perhaps the vignette engine in R has done something clever to
convert utf8 to UTF-8, but I'd recommend %\VignetteEncoding{UTF-8}
instead of %\VignetteEncoding{utf8} to make sure it is a valid
encoding name, e.g.
'utf8' %in% iconvlist()
[1] FALSE
'UTF-8' %in% iconvlist()
[1] TRUE
'UTF8' %in% iconvlist()
[1] TRUE
BTW, %\VignetteEncoding is not documented anywhere (Cc'ing Kurt), and
I think it needs to be documented, since the old approach
\usepackage[enc]{inputenc} was basically a hack, which looks really
odd in non-LaTeX vignettes (e.g. HTML vignettes).
3. The default `encoding` argument of rmarkdown::render() is not
relevant here, even if its value is native.enc. When R builds a
vignette, it tries to detect the vignette's encoding and passes it to
the vignette engine, so the value the engine receives may not be
native.enc.
Lastly, the most important piece of information is missing in this
post: library(rmarkdown); sessionInfo(). There is no minimal
reproducible example, either. Without this information, I can only
guess blindly.
BTW, you may also try HTML vignettes instead, which are much
easier to get right than LaTeX in terms of character encodings.
Regards,
Yihui
--
Yihui Xie <xieyihui at gmail.com>
Web: http://yihui.name
On 09/12/2014 11:13 AM, Yihui Xie wrote:
A few things to clarify:
1. You do not necessarily have to keep the \usepackage{} line if you
use %\VignetteEncoding{UTF-8}, because Pandoc will use UTF-8 anyway in
its LaTeX template.
2. Perhaps the vignette engine in R has done something clever to
convert utf8 to UTF-8, but I'd recommend %\VignetteEncoding{UTF-8}
instead of %\VignetteEncoding{utf8} to make sure it is a valid
encoding name, e.g.
'utf8' %in% iconvlist()
[1] FALSE
'UTF-8' %in% iconvlist()
[1] TRUE
'UTF8' %in% iconvlist()
[1] TRUE
BTW, %\VignetteEncoding is not documented anywhere (Cc'ing Kurt), and
I think it needs to be documented, since the old approach
\usepackage[enc]{inputenc} was basically a hack, which looks really
odd in non-LaTeX vignettes (e.g. HTML vignettes).
Yes, "utf8" works; it will be sent to the vignette engine as "UTF-8".
I was surprised about the missing docs. The documented way to do this
is to use
%\SweaveUTF8
but the source says the recommended way is to use
%\VignetteEncoding{}
and it's certainly a little more engine-agnostic. I'll add something to the docs if Kurt doesn't get there first.
3. The default `encoding` argument of rmarkdown::render() is not relevant here, even if its value is native.enc. When R builds a vignette, it tries to detect the vignette's encoding and passes it to the vignette engine, so the value the engine receives may not be native.enc.
Lastly, the most important piece of information is missing in this post: library(rmarkdown); sessionInfo(). There is no minimal reproducible example, either. Without this information, I can only guess blindly.
BTW, you may also try HTML vignettes instead, which are much easier to get right than LaTeX in terms of character encodings.
Over the last while I've been writing an HTML vignette, and I really want to compliment Yihui and the other rmarkdown folks for doing a fantastic job with them. I haven't had to deal with encoding issues, but overall markdown + R + HTML is a very pleasant way to work. I just wish someone would implement reverse search ... :-).
Duncan Murdoch
Thanks for the kind words. Actually we have more ambitious plans than just reverse search :)
Regards,
Yihui
--
Yihui Xie <xieyihui at gmail.com>
Web: http://yihui.name
Dear Yihui,
I have created a reproducible example at https://github.com/ThierryO/utf8vignette
The \usepackage{} line is needed, otherwise R CMD check --as-cran will give a warning. %\VignetteEncoding{UTF-8} did not solve the problem.
I use rmarkdown_0.3.11
An HTML vignette is not an option, as the vignette demonstrates the use of a custom beamer output format.
Best regards,
Thierry
On 09/12/2014, 4:38 PM, ONKELINX, Thierry wrote:
Dear Yihui,
I have created a reproducible example at https://github.com/ThierryO/utf8vignette
The \usepackage{} line is needed, otherwise R CMD check --as-cran will give a warning. %\VignetteEncoding{UTF-8} did not solve the problem.
I've just taken a look at the sources, and that's only in R-devel; it never got backported to R-patched, so it isn't in the release R. You would need to use %\SweaveUTF8 instead. (It was introduced in 3.1.0, and should be kept until at least 3.2.0, but \VignetteEncoding will be preferred in the long run. It should make it into 3.1.3 unless we drop the ball again.)
Duncan Murdoch
I use rmarkdown_0.3.11. An HTML vignette is not an option, as the vignette demonstrates the use of a custom beamer output format.
Best regards,
Thierry
________________________________________
From: xieyihui at gmail.com [xieyihui at gmail.com] on behalf of Yihui Xie [xie at yihui.name]
Sent: Tuesday 9 December 2014 17:13
To: ONKELINX, Thierry
CC: r-devel at r-project.org; Duncan Murdoch; Kurt Hornik
Subject: Re: [Rd] UTF8 markdown vignette
A few things to clarify:
1. You do not necessarily have to keep the \usepackage{} line if you
use %\VignetteEncoding{UTF-8}, because Pandoc will use UTF-8 anyway in
its LaTeX template.
2. Perhaps the vignette engine in R has done something clever to
convert utf8 to UTF-8, but I'd recommend %\VignetteEncoding{UTF-8}
instead of %\VignetteEncoding{utf8} to make sure it is a valid
encoding name, e.g.
'utf8' %in% iconvlist()
[1] FALSE
'UTF-8' %in% iconvlist()
[1] TRUE
'UTF8' %in% iconvlist()
[1] TRUE
BTW, %\VignetteEncoding is not documented anywhere (Cc'ing Kurt), and
I think it needs to be documented, since the old approach
\usepackage[enc]{inputenc} was basically a hack, which looks really
odd in non-LaTeX vignettes (e.g. HTML vignettes).
3. The default `encoding` argument of rmarkdown::render() is not
relevant here, even if its value is native.enc. When R builds a
vignette, it tries to detect its encoding and passes it to the vignette
engine, so the default argument value may not be native.enc.
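Regarding point 2, the check against iconvlist() can be wrapped in a small helper. This is only a sketch: valid_encoding_name() is a hypothetical function, not something provided by R, knitr, or rmarkdown, and the list of known names is passed in explicitly here because the real iconvlist() output differs between platforms.

```r
# Hypothetical helper (an assumption for illustration, not an R or knitr API):
# return the spelling of an encoding name that appears in `known`, or NA.
valid_encoding_name <- function(enc, known = iconvlist()) {
  # An exact match is already a valid name.
  if (enc %in% known) return(enc)
  # Otherwise compare case-insensitively, ignoring '-' and '_',
  # so that e.g. "utf8" can be mapped to "UTF-8".
  squash <- function(x) toupper(gsub("[-_]", "", x))
  hits <- known[squash(known) == squash(enc)]
  if (length(hits) > 0L) hits[1L] else NA_character_
}

# A fixed list keeps the example independent of the local iconv.
example_names <- c("latin1", "UTF-8", "UTF8")
valid_encoding_name("utf8", known = example_names)
valid_encoding_name("no-such-encoding", known = example_names)
```

With the default argument the helper would consult the platform's own iconvlist(), so the result may vary from one machine to another.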
Lastly, the most important piece of information is missing in this
post: library(rmarkdown); sessionInfo(). There is no minimal
reproducible example, either. Without this information, I can only
guess blindly.
BTW, you may also try HTML vignettes instead, which is much much
easier to get right than LaTeX in terms of character encodings.
Regards,
Yihui
--
Yihui Xie <xieyihui at gmail.com>
Web: http://yihui.name
On Tue, Dec 9, 2014 at 7:05 AM, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
On 09/12/2014, 5:19 AM, ONKELINX, Thierry wrote:
Dear Duncan,
The UTF-8 characters aren't properly rendered in the pdf version of the vignette.
$?? ????? ?????? ?????? ????? ?? ?? is rendered as $????? ?????????? ???????????? ????? ?????? ? ???????? ????????
That looks as though the UTF-8 characters are being interpreted as
Latin1 characters (or whatever your native encoding is on Windows) when
read from the file.
It is quite tricky to work with UTF-8 in R in Windows. I think Sweave
does it properly, though there may be exceptions. The issue is that
many character input routines assume characters start out in the native
encoding. (There's also a translation that happens by default on
output, but I don't think that's your problem.) So the way to debug
this is to follow all of the I/O, and see where the misinterpretation
happens. For vignettes, things are complicated, because R reads the
file to determine which vignette engine to use, then the vignette engine
reads it (perhaps more than once).
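The kind of misinterpretation described above can be reproduced in base R without any vignette machinery. The sketch below is an illustration only: it uses a made-up four-character string rather than the text from the actual vignette, and it simulates a Latin-1 reader with iconv() instead of depending on the Windows locale.

```r
# A UTF-8 string of 4 characters; the accented letter occupies two bytes.
original <- enc2utf8("caf\u00e9")
bytes <- charToRaw(original)
length(bytes)     # 5 bytes for 4 characters

# A reader that assumes Latin-1 turns every byte into one character.
misread <- iconv(rawToChar(bytes), from = "latin1", to = "UTF-8")

# A reader that is told the right encoding leaves the bytes alone and
# only labels them as UTF-8.
declared <- rawToChar(bytes)
Encoding(declared) <- "UTF-8"

nchar(original)   # 4
nchar(misread)    # 5: the two-byte character became two characters
nchar(declared)   # 4
```

Comparing character counts like this at each read step is one way to follow the I/O and find where the wrong encoding is assumed.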
The same problem occurs when I use render("vignette.md", output_format = "mypackage::mystyle"), instead of render("vignette.md", output_format = "mypackage::mystyle", encoding = "UTF-8"). The default value of the encoding argument of rmarkdown::render() is
encoding = getOption("encoding"), which is "native.enc" on my system.
It sounds as though the render function needs a way to determine the
encoding from the file itself. Recent Sweave versions support the
declaration
%\VignetteEncoding{utf8}
as well as the older
\usepackage[utf8]{inputenc}
that you used. You might want to try that line as well. (You need to
keep the \usepackage line to tell LaTeX what encoding you're using.)
Duncan Murdoch
I took a look at the R source and I realized that the encoding was actually never passed to the vignette engine:
https://github.com/wch/r-source/blob/e721ef5f4/src/library/tools/R/Vignettes.R#L507
Apparently only the file and quiet arguments are passed to the vignette engine. Did I miss anything?
To Thierry: I explicitly asked for library(rmarkdown); sessionInfo(), but you only told me the version of rmarkdown, which is not the only thing I was asking for. It is extremely important in this case to know the versions of other packages as well as your system locale information.
Regards,
Yihui
--
Yihui Xie <xieyihui at gmail.com>
Web: http://yihui.name
Dear Yihui and Duncan,
Using %\SweaveUTF8 does not solve the problem. The nice thing about it is that it no longer requires \usepackage[utf8]{inputenc} to make R CMD check --as-cran happy.
To Yihui: here is the sessionInfo().
library(rmarkdown)
sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=Dutch_Belgium.1252 LC_CTYPE=Dutch_Belgium.1252 LC_MONETARY=Dutch_Belgium.1252 LC_NUMERIC=C
[5] LC_TIME=Dutch_Belgium.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] rmarkdown_0.3.11
loaded via a namespace (and not attached):
[1] digest_0.6.4 fortunes_1.5-2 htmltools_0.2.6 tools_3.1.2
Best regards,
Thierry
On 09/12/2014, 10:36 PM, Yihui Xie wrote:
I took a look at the R source and I realized that the encoding was actually never passed to the vignette engine: https://github.com/wch/r-source/blob/e721ef5f4/src/library/tools/R/Vignettes.R#L507 Apparently only the file and quiet arguments are passed to the vignette engine. Did I miss anything?
I think it's actually a little messier than that: sometimes the encoding is passed (e.g. by tools:::.run_one_vignette, used in R CMD check), but not always.
Here's what I think should happen instead:
When building a vignette in a package, R knows the encoding declared for the package, so it should assume this as the default for the vignette. If nothing is declared, it should assume "native.enc", i.e. whatever is the native encoding on the machine it's running on.
For each vignette, at the same time as it determines the vignette engine, it should see whether there is a declared encoding within the vignette.
When it calls the engine, it should pass an encoding (and it should be a legal one, e.g. UTF-8, not utf8).
Unless I notice something missing when I do this, or someone else tells me something that's missing, I'll try to make the changes above in R-devel and R-patched sometime before 3.1.3 is released.
In the meantime, unless declaring a dependence on R >= 3.1.3, vignette engines should determine the encoding themselves whenever they are called without an "encoding" argument.
Duncan Murdoch
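The lookup order proposed above (a declaration in the vignette, then the package encoding, then "native.enc") could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in R's tools package or in knitr: detect_vignette_encoding() is a hypothetical name, it only looks for %\VignetteEncoding{} and %\SweaveUTF8 lines, and it normalises only the utf8 spelling.

```r
# Hypothetical helper (an assumption for illustration): choose an encoding
# for a vignette from its lines and the encoding declared for the package.
detect_vignette_encoding <- function(lines, package_encoding = NA_character_) {
  # 1. An explicit %\VignetteEncoding{...} line in the vignette wins.
  pat <- "^[[:space:]]*%+[[:space:]]*\\\\VignetteEncoding\\{([^}]*)\\}.*$"
  hit <- grep(pat, lines, value = TRUE)
  if (length(hit) > 0L) {
    enc <- trimws(sub(pat, "\\1", hit[1L]))
    # Pass on a legal name: map the utf8 spelling to UTF-8.
    if (tolower(enc) %in% c("utf8", "utf-8")) enc <- "UTF-8"
    return(enc)
  }
  # 2. %\SweaveUTF8 also declares UTF-8.
  if (any(grepl("^[[:space:]]*%+[[:space:]]*\\\\SweaveUTF8", lines))) {
    return("UTF-8")
  }
  # 3. Otherwise fall back to the encoding declared for the package.
  if (!is.na(package_encoding) && nzchar(package_encoding)) {
    return(package_encoding)
  }
  # 4. Nothing declared anywhere: the native encoding of the machine.
  "native.enc"
}

yaml_lines <- c("vignette: >",
                "  %\\VignetteEngine{knitr::rmarkdown}",
                "  %\\VignetteIndexEntry{The title}",
                "  %\\VignetteEncoding{utf8}")
detect_vignette_encoding(yaml_lines)
detect_vignette_encoding("No declaration here", package_encoding = "latin1")
detect_vignette_encoding("No declaration here")
```

A vignette engine called without an "encoding" argument could apply a check of this kind to the file it is given before reading the content.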
7 days later
For the record, I saw a change had been made in R-devel:
https://github.com/wch/r-source/commit/d53b098 (Thanks, Duncan)
Meanwhile, I also made a change in knitr to assume UTF-8 unless R passes an encoding to the vignette engine:
https://github.com/yihui/knitr/commit/23c6c8e2
Both will solve the original problem, but apparently the former one is the ideal fix.
Regards,
Yihui
--
Yihui Xie <xieyihui at gmail.com>
Web: http://yihui.name
1 day later
On 18/12/2014, 12:17 AM, Yihui Xie wrote:
For the record, I saw a change had been made in R-devel: https://github.com/wch/r-source/commit/d53b098 (Thanks, Duncan) Meanwhile, I also made a change in knitr to assume UTF-8 unless R passes an encoding to the vignette engine: https://github.com/yihui/knitr/commit/23c6c8e2 Both will solve the original problem, but apparently the former one is the ideal fix.
The Windows builds of R-devel were stalled for a few days, but I've given them a kick now, so this should appear in the Windows binaries on CRAN soon.
Duncan Murdoch
Dear Duncan and Yihui,
I was able to test it with the new R-devel version. Adding only %\SweaveUTF8 to the vignette works (= passes R CMD check --as-cran and UTF-8 characters render as they should). Adding only Encoding: UTF-8 to the DESCRIPTION instead of %\SweaveUTF8 works too.
I have tested the same things with the GitHub version of knitr on R-3.1.2-patched. Adding Encoding: UTF-8 to the DESCRIPTION gives an R CMD check --as-cran warning:
* checking package vignettes in 'inst/doc' ... WARNING
Non-ASCII package vignette without specified encoding:
'utf8vignette.Rmd'
The UTF-8 characters in the vignette are nonetheless rendered correctly. Adding only %\SweaveUTF8 to the vignette makes it pass R CMD check --as-cran, and the UTF-8 characters are rendered correctly.
So both the changes to R-devel and knitr seem to work fine. Thanks a lot.
Thierry
PS I've added the sessionInfo() of both configurations.
#sessionInfo() of R-devel
library(rmarkdown)
library(knitr)
sessionInfo()
R Under development (unstable) (2014-12-18 r67185)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=Dutch_Belgium.1252 LC_CTYPE=Dutch_Belgium.1252
[3] LC_MONETARY=Dutch_Belgium.1252 LC_NUMERIC=C
[5] LC_TIME=Dutch_Belgium.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] knitr_1.8 rmarkdown_0.3.11
loaded via a namespace (and not attached):
[1] digest_0.6.4 evaluate_0.5.5 formatR_1.0 htmltools_0.2.6 stringr_0.6.2
[6] tools_3.2.0
library(knitr)
library(rmarkdown)
sessionInfo()
R version 3.1.2 Patched (2014-12-11 r67166)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=Dutch_Belgium.1252 LC_CTYPE=Dutch_Belgium.1252
[3] LC_MONETARY=Dutch_Belgium.1252 LC_NUMERIC=C
[5] LC_TIME=Dutch_Belgium.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] rmarkdown_0.3.11 knitr_1.8.6
loaded via a namespace (and not attached):
[1] bitops_1.0-6 devtools_1.6.1 digest_0.6.6 evaluate_0.5.5 formatR_1.0
[6] htmltools_0.2.6 httr_0.6.0 RCurl_1.95-4.5 stringr_0.6.2 tools_3.1.2
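For the DESCRIPTION route tested in this thread, the declared package encoding can be read back with read.dcf() from base R. The sketch below writes a throwaway DESCRIPTION to a temporary file; the field values are placeholders chosen for illustration, not the contents of the real package.

```r
# Write a minimal DESCRIPTION with placeholder values to a temporary file.
desc_file <- tempfile("DESCRIPTION")
writeLines(c("Package: utf8vignette",
             "Version: 0.1",
             "Encoding: UTF-8"),
           desc_file)

# read.dcf() returns a character matrix with one column per requested field;
# a field that is absent from the file comes back as NA.
fields <- read.dcf(desc_file, fields = "Encoding")
pkg_encoding <- unname(fields[1L, "Encoding"])

# Fall back to the native encoding when no Encoding field is declared.
if (is.na(pkg_encoding)) pkg_encoding <- "native.enc"
pkg_encoding

unlink(desc_file)
```

Running the same lines on a package's own DESCRIPTION is a quick way to confirm which default encoding the build would see.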