Hi Yihui,
list.files() returns file names converted to the native encoding by
Windows, so one needs to use only characters representable in the
current native encoding for file names. If one wants to be safe, it
makes sense to be much stricter than that (only ASCII, and only a subset
of it; a number of recommendations can be found online). Using more than
that is asking for trouble.
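As a rough illustration of the stricter rule, the sketch below checks names against one conservative subset of ASCII (letters, digits, dot, underscore, hyphen). The helper name is_portable_name() and the exact character set are assumptions made for illustration, not an official recommendation.

```r
# Hypothetical helper: TRUE for names made only of ASCII letters, digits,
# '.', '_' and '-'. The character set is an assumed, conservative choice.
# perl = TRUE keeps the ranges from being interpreted per locale.
is_portable_name <- function(x) {
  grepl("^[A-Za-z0-9._-]+$", x, perl = TRUE)
}

# The second name contains U+00E4 and is therefore rejected.
is_portable_name(c("report_2020.txt", "K\u00e4sch.txt"))
```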
Unicode "\u00e4" is a Latin-1 character, so representable in CP1252. On
my Windows running in CP1252 as C locale and system code page, your
example works fine, file.exists() returns TRUE, and this is the expected
behavior (tested in R-devel and R 4.0).
Your example was run in CP1252 as C locale but CP936 as the system code
page (see the sessionInfo() output). On Windows, unfortunately, there
are two different "current locales" at a time. With your settings
(CP1252 as C locale and CP936 as system code page), I get the same
results as you, file.exists() returns FALSE. enc2native(z) works fine
and returns a valid Latin-1 string, but that is because here "native" is
CP1252. Windows API functions and consequently some C library functions
that return strings from the OS, however, convert to the encoding from
the system code page, which is CP936, and it cannot represent "ä". So,
currently the behavior you are reporting is expected for R 4.0 and
earlier. I don't think this is a regression, it couldn't have worked
before, either - and I've tested in 3.6.3 and 3.4.3 on my system.
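To see whether a session is in this mixed state, one can compare the two code pages R reports and test whether a given name survives conversion to a target encoding. This is only a sketch: representable_in() is a hypothetical helper, the code page fields of l10n_info() exist only on Windows (and only where the R version provides them), and the encoding name "CP936" is assumed to be known to the platform's iconv.

```r
# Both fields are Windows-only; absent fields come back as NULL.
info <- l10n_info()
c_locale_cp <- info[["codepage"]]         # code page of the C locale
system_cp   <- info[["system.codepage"]]  # system (ANSI) code page
if (!is.null(c_locale_cp) && !is.null(system_cp) &&
    c_locale_cp != system_cp) {
  message("C locale code page ", c_locale_cp,
          " differs from system code page ", system_cp)
}

# Hypothetical helper: can each string in x be converted to `encoding`?
# iconv() yields NA for a string it cannot convert; if the platform does
# not know the encoding name at all, the helper returns NA instead.
representable_in <- function(x, encoding) {
  tryCatch(
    !is.na(iconv(enc2utf8(x), from = "UTF-8", to = encoding)),
    error = function(e) NA
  )
}

z <- "K\u00e4sch.txt"
representable_in(z, "latin1")  # Latin-1 covers U+00E4
representable_in(z, "CP936")   # the system code page from the report
```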
These problems will go away when UTF-8 is both the current native
encoding for the C locale and the system code page. This is possible in
recent Windows 10, but requires UCRT and hence a new toolchain to build
R, and requires all packages and libraries to be rebuilt from source.
More details are on my blog; an experimental build of R (installer) and
an experimental toolchain are also available:
https://developer.r-project.org/Blog/public/2020/05/02/utf-8-support-on-windows/index.html
Best
Tomas
On 6/22/20 6:11 AM, Yihui Xie wrote:
Hi Tomas,
I received a report about R 4.0.0 in the knitr package
(https://github.com/yihui/knitr/issues/1840), and I think it is
related to the issue here. I created a minimal reproducible example
below:
owd = setwd(tempdir())
z = 'K\u00e4sch.txt'
file.create(z)
list.files()
file.exists(list.files())
setwd(owd)
Output:
owd = setwd(tempdir())
z = 'K\u00e4sch.txt'
file.create(z)
file.exists(list.files())
I wonder if it is expected that file.exists() returns FALSE here.
R version 4.0.1 (2020-06-06)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
system code page: 936
FWIW, I also tested Chinese characters in the variable `z` above, and
file.exists() returns TRUE only after I Sys.setlocale(, "Chinese").
Regards,
Yihui
On Thu, Jun 11, 2020 at 3:11 AM Tomas Kalibera <tomas.kalibera at gmail.com> wrote:
Dear Juan,
I can't tell from your report what the problem is. Please try to create a
minimal but complete reproducible example that does not use the renv
package. Perhaps you could use the R debugger (e.g. via
options(error=recover)) to find out what is the argument that
file.exists() has been called with. And then you could try just to call
file.exists() directly with that argument to trigger the problem.
It may be that the argument has been corrupted/is invalid in the current
native encoding. If that is the case, the next step would be to find out
who corrupted it (renv, R, something else). The error is displayed when
a path name cannot be converted from the current native encoding to
UTF-16LE.
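One way to narrow this down, sketched below under assumptions, is to call file.exists() on each candidate path separately and record which ones raise an error, alongside their declared encoding and whether they are valid in it. The helper name check_paths() is hypothetical; Encoding() and validEnc() are base R functions.

```r
# Hypothetical helper: probe each path on its own so that one bad element
# does not hide the others.
check_paths <- function(paths) {
  probe <- function(p) {
    tryCatch({
      file.exists(p)  # result ignored; we only care whether it errors
      TRUE
    }, error = function(e) FALSE)
  }
  data.frame(
    path     = paths,
    encoding = Encoding(paths),  # declared encoding mark of each string
    valid    = validEnc(paths),  # FALSE if invalid in its declared encoding
    no_error = vapply(paths, probe, logical(1), USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}

res <- check_paths(c(tempdir(), "no-such-file.txt"))
res[!res$valid | !res$no_error, ]  # rows worth a closer look
```

In a session entered via options(error = recover), the same call could be made on the argument seen in the file.exists() frame.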
The experimental support for UTF-8 as native encoding on Windows 10 is
only available in a custom build of R, like the one I linked from my
blog post.
Thanks
Tomas
On 6/10/20 1:06 PM, Juan Telleria Ruiz de Aguirre wrote:
Error in file.exists(children) :
file name conversion problem -- name too long?
14: file.exists(children)
13: renv_dependencies_find_dir_children(path, root)
12: renv_dependencies_find_dir(path, root)
11: FUN(X[[i]], ...)
10: lapply(path, renv_dependencies_find_impl, root = root)
9: renv_dependencies_find(path, root)
8: (function (path = getwd(), root = NULL, ..., progress = TRUE,
errors = c("reported", "fatal", "ignored"), dev = FALSE)
{
path <- renv_path_normalize(path, winslash = "/", mustWork = TRUE)
root <- root %||% renv_dependencies_root(path)
if (exists(path, envir = `_renv_dependencies`))
return(get(path, envir = `_renv_dependencies`))
renv_dependencies_begin(root = root)
on.exit(renv_dependencies_end(), add = TRUE)
dots <- list(...)
if (identical(dots[["quiet"]], TRUE)) {
progress <- FALSE
errors <- "ignored"
}
files <- renv_dependencies_find(path, root)
deps <- renv_dependencies_discover(files, progress, errors)
renv_dependencies_report(errors)
deps
})(path, progress = FALSE, errors = errors, dev = TRUE)
7: eval(call, envir = parent.frame(2))
6: eval(call, envir = parent.frame(2))
5: delegate(renv_dependencies_impl)
4: dependencies(path, progress = FALSE, errors = errors, dev = TRUE)
3: withCallingHandlers(dependencies(path, progress = FALSE, errors = errors,
dev = TRUE), renv.dependencies.error =
renv_dependencies_error_handler(message,
errors))
2: renv_dependencies_scope(project, action = "init")
1: renv::init()
Diagnostics Report -- renv [0.10.0]
===================================
# Session Info =======================
R version 4.0.1 (2020-06-06)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18362)
Matrix products: default
locale:
[1] LC_COLLATE=Spanish_Spain.1252 LC_CTYPE=Spanish_Spain.1252
[3] LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C
[5] LC_TIME=Spanish_Spain.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] renv_0.10.0
loaded via a namespace (and not attached):
[1] compiler_4.0.1 rsconnect_0.8.16 htmltools_0.4.0 tools_4.0.1
[5] yaml_2.2.1 Rcpp_1.0.4.6 rmarkdown_2.2 knitr_1.28
[9] xfun_0.14 digest_0.6.25 packrat_0.5.0 rlang_0.4.6
[13] evaluate_0.14
# Project ============================
Project path: "~/Test2"
# Status =============================
# Lockfile ===========================
This project has not yet been snapshotted: 'renv.lock' does not exist.
# Library ============================
The project library "~/Test2/renv/library/R-4.0/x86_64-w64-mingw32"
does not exist.
# Dependencies =======================
# User Profile =======================
[no user profile detected]
# Settings ===========================
List of 6
$ external.libraries : chr(0)
$ ignored.packages : chr(0)
$ package.dependency.fields: chr [1:3] "Imports" "Depends" "LinkingTo"
$ snapshot.type : chr "implicit"
$ use.cache : logi TRUE
$ vcs.ignore.library : logi TRUE
# Options ============================
List of 1
$ renv.verbose: logi TRUE
# Environment Variables ==============
HOME = C:\Users\J-tel\OneDrive\Documents
LANG = <NA>
R_LIBS = <NA>
R_LIBS_SITE = <NA>
R_LIBS_USER = C:/Users/J-tel/OneDrive/Documents/R/win-library/4.0
# PATH ===============================
- C:\rtools40\usr\bin
- C:\Program Files\R\R-4.0.1\bin\x64
- C:\ProgramData\Miniconda3
- C:\ProgramData\Miniconda3\Library\mingw-w64\bin
- C:\ProgramData\Miniconda3\Library\usr\bin
- C:\ProgramData\Miniconda3\Library\bin
- C:\ProgramData\Miniconda3\Scripts
- C:\ProgramData\Oracle\Java\javapath
- C:\WINDOWS\system32
- C:\WINDOWS
- C:\WINDOWS\System32\Wbem
- C:\WINDOWS\System32\WindowsPowerShell\v1.0\
- C:\WINDOWS\System32\OpenSSH\
- C:\Program Files\MiKTeX 2.9\miktex\bin\x64\
- C:\ProgramData\Miniconda3\Scripts\conda.exe
# Cache ==============================
There are a total of 0 package(s) installed in the renv cache.
Cache path: "C:/Users/J-tel/AppData/Local/renv/cache/v5/R-4.0/x86_64-w64-mingw32"
System Information:
$platform
[1] "x86_64-w64-mingw32"
$arch
[1] "x86_64"
$os
[1] "mingw32"
$system
[1] "x86_64, mingw32"
$status
[1] ""
$major
[1] "4"
$minor
[1] "0.1"
$year
[1] "2020"
$month
[1] "06"
$day
[1] "06"
$`svn rev`
[1] "78648"
$language
[1] "R"
$version.string
[1] "R version 4.0.1 (2020-06-06)"
$nickname
[1] "See Things Now"
Thank you,
Juan