list.files() not compatible with all Unicode characters; file.exists() is compatible.

4 messages · Nissim Kaufmann, MacQueen, Don, Brian Ripley

#
Hello,

I have some files with strange Unicode characters in their names that I am trying to remove. But list.files() does not return their names faithfully, so I cannot deal with them.
The documentation for file.exists() and file.access() does not seem to discuss this.
The file name has these Unicode characters:
U+23EF BLACK RIGHT-POINTING TRIANGLE WITH DOUBLE VERTICAL BAR
U+25B6 BLACK RIGHT-POINTING TRIANGLE
Thank you!
Cheers,
Nissim Kaufmann
NSOL.altervista.org
#
Sorry, your email was undecipherable because you sent HTML-formatted email.
Please send plain text.
#
On 25/11/2014 01:25, MacQueen, Don wrote:
Also, the 'at a minimum' information requested by the posting guide is 
essential here (which OS and locale, in particular).  In general file 
names not in the locale's encoding are unsupported.
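
For readers who want to collect the OS and locale details mentioned above, a minimal sketch in base R follows. The variable names are illustrative only, and the exact fields returned by l10n_info() vary by platform (on Windows it also reports the codepage).

```r
# Gather the basic platform and locale facts a list reader would need.
os  <- Sys.info()[["sysname"]]   # e.g. "Windows", "Linux", "Darwin"
loc <- Sys.getlocale()           # full locale string for the session
enc <- l10n_info()               # flags such as MBCS and UTF-8

print(os)
print(loc)
str(enc)
```

Including the output of sessionInfo() in a post covers much of the same ground.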
1 day later
#
On 25/11/2014 06:53, Prof Brian Ripley wrote:
An off-list reply indicated this was Windows XP.  Although the message 
body was unreadable, the gist is in the subject line.

From ?list.files under Windows:

   path must specify paths which can be represented in the current
   codepage.

whereas ?file.exists says

   Most of these functions accept UTF-8 filepaths not valid in the
   current locale.

So this is documented behaviour.

[For anyone curious as to why list.files is different: note that it does 
regexp pattern matching.  Adding support for Unicode file paths would 
not be impossible but it would require hundreds of lines of Windows-only 
code.]
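
Given the documented difference, one possible workaround is to skip list.files() and pass a known UTF-8 path directly to the functions documented under ?file.exists. The sketch below is only an illustration under stated assumptions: the file name is hypothetical (built from the two code points named in the thread), it lives in a temporary directory, and behaviour on a non-UTF-8 Windows codepage has not been verified here.

```r
# Hypothetical file name built from the two code points named in the thread.
# \u escapes give a string marked as UTF-8 whatever the session locale is.
name <- "\u23ef\u25b6.txt"
path <- file.path(tempdir(), name)

# Create a throwaway file so the sketch is self-contained.
created <- file.create(path)

# Check whether the name survives a round trip through the native encoding.
# If it does not, list.files() on Windows cannot be expected to return it.
native <- iconv(name, from = "UTF-8", to = "")  # NA if not convertible
representable <- !is.na(native) && identical(enc2utf8(native), name)

# file.exists() and file.remove() are given the UTF-8 path directly.
existed <- file.exists(path)
removed <- file.remove(path)
```

This only helps when the problematic name is already known or can be constructed; it does not make list.files() discover such files.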