
Bug in reading UTF-16LE file?

That was not clear (to me?) in your previous summary. Thanks for clarifying.
The Unicode FAQ does. If you specify endianness and a BOM is present and the two agree, then no harm, no foul. The problem is that if they conflict, there is no clearly correct behavior. If the BOM is valid, then the user specification must be wrong, and favoring it forces incorrect decoding. If the BOM is erroneous, then you would want the user to be able to override it. But between them these two cases amount to defeating the BOM's purpose: it might as well not be there.

So the compliant way to handle data with a BOM is for the user to make a standard practice of not specifying endianness _unless they must override an invalid BOM_ (which ought to be highly unusual). Save the sledgehammer for unusual cases, and let the BOM be the "only" specification if it is present. This lets the BOM serve its intended purpose of reducing how often users have to guess.
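To make the policy concrete, here is a minimal sketch in Python (not R, and not how R's connections actually behave). The function name `decode_utf16` and the `endianness` and `override_bom` parameters are hypothetical, invented for illustration. Refusing to guess when there is neither a BOM nor a user specification is a choice of this sketch, not something argued above.

```python
import codecs


def decode_utf16(data: bytes, endianness=None, override_bom=False) -> str:
    """Decode UTF-16 bytes, letting a BOM decide the byte order when present.

    endianness   -- hypothetical user specification: None, "le", or "be"
    override_bom -- hypothetical "sledgehammer": trust the user over the BOM
    """
    if endianness not in (None, "le", "be"):
        raise ValueError("endianness must be None, 'le', or 'be'")

    # Detect a leading byte-order mark, if any.
    if data.startswith(codecs.BOM_UTF16_LE):
        bom_order = "le"
    elif data.startswith(codecs.BOM_UTF16_BE):
        bom_order = "be"
    else:
        bom_order = None

    if bom_order is None:
        # No BOM: the user specification is the only information available.
        if endianness is None:
            raise ValueError("no BOM and no endianness given; refusing to guess")
        order, payload = endianness, data
    else:
        payload = data[2:]  # drop the two BOM bytes
        if endianness is None or endianness == bom_order:
            # Normal case: the BOM is the specification (or both agree).
            order = bom_order
        elif override_bom:
            # Unusual case: the user asserts that the BOM is invalid.
            order = endianness
        else:
            # Conflict with no explicit override: no clearly correct answer.
            raise ValueError(
                f"BOM says {bom_order!r} but caller specified {endianness!r}"
            )

    return payload.decode("utf-16-" + order)


sample = codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")
print(decode_utf16(sample))                   # BOM alone decides
print(decode_utf16(sample, endianness="le"))  # agreement: no harm, no foul
```

With this shape, a conflicting call such as `decode_utf16(sample, endianness="be")` raises an error instead of silently picking a side, and the caller has to pass `override_bom=True` to force the issue.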
On October 1, 2024 1:50:25 PM MST, Tomas Kalibera <tomas.kalibera at gmail.com> wrote: