[R-pkg-devel] handling of byte-order-mark on r-devel-linux-x86_64-debian-clang machine

4 messages · Jeff Newmiller, Tomas Kalibera, Ivan Krylov

#
On 3/28/22 13:16, Ivan Krylov wrote:
Thanks for the suggestions. I've rewritten the paragraphs, biasing them
towards users who have UTF-8 as the native encoding, as these are going
to be the majority. These users should no longer have to worry much
about the encoding marks, nor about the internal UTF-8 mode of the
connections code. But I think the level of detail needs to remain as
long as these features are supported: it is based on numerous questions
and bug reports.

Best
Tomas
#
Thanks to the ubiquity of Excel and its misguided inclusion of a BOM in its UTF-8 CSV format, this optimism about encoding being a corner case seems premature. Excel actually offers multiple options for writing CSV files, and only one of them (fortunately not the first one) has this "feature", but I (and the various beginners I end up helping) seem to encounter these silly files far more often than seems reasonable.
On April 5, 2022 11:20:37 AM PDT, Tomas Kalibera <tomas.kalibera at gmail.com> wrote:
#
On 4/6/22 00:02, Jeff Newmiller wrote:
I was rather referring to the encoding marks in R, which declare the
encoding of an R string; that is what you see with Encoding(). And to
other measures to avoid the problem when the native encoding in R cannot
represent all characters users need to work with (when the native
encoding cannot be UTF-8). From R 4.2, the native encoding will be UTF-8
also on (recent) Windows systems; on most Unix systems, it has been
UTF-8 for years. But this change will not impact the handling of BOMs in
input.
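
As a minimal sketch of what an encoding mark is (the sample word and the variable names are illustrative assumptions, not taken from this thread), the following builds a Latin-1 string from raw bytes, marks it, and converts it to UTF-8:

```r
# Build the Latin-1 bytes for "facile" with a c-cedilla (0xE7) directly,
# so the example does not depend on how this script file is encoded.
x <- rawToChar(as.raw(c(0x66, 0x61, 0xE7, 0x69, 0x6C, 0x65)))

# Declare (mark) the encoding; the underlying bytes are unchanged.
Encoding(x) <- "latin1"
mark_before <- Encoding(x)

# Convert to UTF-8: the bytes change and the mark becomes "UTF-8".
y <- enc2utf8(x)
mark_after <- Encoding(y)

# Pure ASCII strings are never marked.
mark_ascii <- Encoding("plain")

print(c(mark_before, mark_after, mark_ascii))
print(charToRaw(y))
```

The mark only declares how the bytes should be interpreted; it is enc2utf8() (or iconv()) that actually re-encodes them.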

Has the problem of reading CSV files from Excel (even when Excel is at
fault) been reported anywhere? If not, please report it; maybe there is
something that could be done to help process those files on the R
side. R handles BOMs in the "connections" code (see ?connections), and
it uses iconv for input conversion.
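
One way to read such a file today is sketched below, assuming a small hypothetical CSV written with a UTF-8 BOM (the file contents and variable names are made up for illustration). It relies on the "UTF-8-BOM" encoding name described in ?file, passed to read.csv() through fileEncoding:

```r
# Write a tiny CSV that starts with the UTF-8 byte-order mark (EF BB BF),
# mimicking what Excel's "CSV UTF-8" export produces.
path <- tempfile(fileext = ".csv")
bom <- as.raw(c(0xEF, 0xBB, 0xBF))
writeBin(c(bom, charToRaw("id,name\n1,a\n2,b\n")), path)

# Ask the connection to strip the BOM while converting the input.
dat <- read.csv(path, fileEncoding = "UTF-8-BOM")

print(names(dat))
print(dat)

unlink(path)
```

Without fileEncoding, whether the BOM ends up in the first column name can depend on the locale and the R version, so stating the encoding explicitly is the more predictable choice.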

Thanks
Tomas
#
On Tue, 5 Apr 2022 20:20:37 +0200
Tomas Kalibera <tomas.kalibera at gmail.com> wrote:

            
Thank you!
Of course, all these features have their use cases and it's important
to stay backwards compatible, including the documentation.

I would also like to apologise to Dan for leading him on a wild goose
chase that didn't bring him passing read.csv-related tests in the end.