
[PATCH] Improve utf8clen and remove utf8_table4

Some of the code that uses utf8clen checks the validity of the UTF-8
string before making the call.
However, there were some hairy areas where I felt that the new semantics
might cause issues (if not now, then in future changes).

I've attached two patches:
     * new_semantics.diff keeps the new semantics and updates the hairy
       areas mentioned above.
     * old_semantics.diff maintains the old semantics (return 1 even for
       continuation bytes).

I don't think the new semantics will cause issues, especially with the 
updates, but we can err on the side of caution and keep the old 
semantics. I feel that the new semantics provide a clearer interface 
though (the function expects a start byte and should return an error if 
a start byte is not supplied).
In either case, the utf8_table4 array has been removed.
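
To illustrate the difference between the two semantics, here is a rough
sketch in C. It is not the code from either attached patch: the function
names (start_byte_len, clen_old_sketch, clen_new_sketch) and the choice of
-1 as the error value are assumptions made for illustration, and the sketch
only recognizes start bytes for sequences of up to 4 bytes. It does show
how the length can be derived from the leading bits of the start byte
without a lookup table.

```c
/* Hypothetical helper: length implied by a UTF-8 start byte,
 * or 0 if the byte is not a start byte this sketch recognizes. */
int start_byte_len(unsigned char c)
{
    if (c < 0x80) return 1;            /* 0xxxxxxx: ASCII */
    if ((c & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
    if ((c & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
    if ((c & 0xF8) == 0xF0) return 4;  /* 11110xxx */
    return 0;                          /* 10xxxxxx or other non-start byte */
}

/* "Old" semantics as described in this mail: return 1 even for
 * continuation bytes, so a caller always advances by at least 1. */
int clen_old_sketch(unsigned char c)
{
    int len = start_byte_len(c);
    return len ? len : 1;
}

/* "New" semantics as described in this mail: a start byte is expected,
 * and anything else is reported as an error (-1 is an assumed value). */
int clen_new_sketch(unsigned char c)
{
    int len = start_byte_len(c);
    return len ? len : -1;
}
```

With the old semantics a caller can step through unvalidated input without
checking the result; with the new semantics any caller that has not already
validated the string has to handle the error value, which is why the hairy
areas needed updating.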

Sahil
On 03/19/2017 05:38 AM, Duncan Murdoch wrote:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: old_semantics.diff
Type: text/x-patch
Size: 1707 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-devel/attachments/20170319/477b61e9/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: new_semantics.diff
Type: text/x-patch
Size: 6332 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-devel/attachments/20170319/477b61e9/attachment-0001.bin>