Hi
I think the following behavior is a regression from R 3.2.5:
> match(iconv( c("\u00f8", "A"), from = "UTF8", to = "latin1" ), "\u00f8")
[1] 1 NA
> match(iconv( c("\u00f8"), from = "UTF8", to = "latin1" ), "\u00f8")
[1] NA
> match(iconv( c("\u00f8"), from = "UTF8", to = "latin1" ), "\u00f8", incomparables = NA)
[1] 1
I'm seeing this in R 3.3.0 on both Windows and Ubuntu 15.10.
The specific behavior makes me think this is related to the following
NEWS entry:
match(x, table) is faster (sometimes by an order of magnitude) when x is
of length one and incomparables is unchanged (PR#16491).
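A possible workaround until a fix is available, sketched under the assumption that the length-one code path only goes wrong when the two strings carry different declared encodings: convert both sides to UTF-8 with enc2utf8() before calling match(). The names x_utf8 and x_latin1 are illustrative only, and "UTF-8" is used in place of "UTF8" because it is the more portable spelling for iconv().

```r
# Build the same character in two different declared encodings.
x_utf8   <- "\u00f8"
x_latin1 <- iconv(x_utf8, from = "UTF-8", to = "latin1")

# The declared encodings are expected to differ between the two strings.
Encoding(x_utf8)
Encoding(x_latin1)

# Workaround sketch: bring both arguments to UTF-8 first, so that match()
# compares strings that share one encoding.
pos <- match(enc2utf8(x_latin1), enc2utf8(x_utf8))
pos
```

This costs an extra conversion per call, so it is probably only worth doing where mixed encodings can actually occur.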
Best regards
Kirill
Regression in match() in R 3.3.0 when matching strings with different character encodings
3 messages · Kirill Müller, Peter Haverty, Martin Maechler
Dear Kirill,
You are correct, that is a new bug introduced in PR16491. The appropriate
fix and regression tests have been added via PR16885, which has been merged
into trunk. I believe that means the fix will be released with R 3.3.1.
I checked your example and the second "match" now properly returns 1 with
the patched code.
Please have a look at
https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=16885
http://developer.r-project.org/blosxom.cgi/R-devel/NEWS
Thank you for your report. I hope the benefits of this speedup will
eventually outweigh this unfortunate bug in my PR16491.
Regards,
Pete
____________________
Peter M. Haverty, Ph.D.
Peter Haverty <haverty.peter at gene.com>
on Mon, 9 May 2016 09:47:48 -0700 writes:
> Dear Kirill,
> You are correct, that is a new bug introduced in PR16491. The appropriate
> fix and regression tests have been added via PR16885, which has been merged
> into trunk. I believe that means the fix will be released with R 3.3.1.
Yes, definitely.
Kirill, as you seem to use code which does trigger the bug, you may want to
switch to using 'R-patched', i.e.,
> R.version.string
[1] "R version 3.3.0 Patched (2016-05-09 r70591)"
( where the subversion revision must be >= 70591 )
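For scripts that need to know whether they are running on an affected build, something along these lines might do. It is only a sketch: is_affected() is a made-up helper, not part of R, and it assumes that the only affected builds are those reporting version 3.3.0 with a subversion revision below 70591. Development builds of that period are not covered.

```r
# Hypothetical helper: TRUE if the given version record looks like an
# R 3.3.0 build from before revision 70591.
is_affected <- function(version = R.version) {
  # "svn rev" is stored as a string and may be non-numeric ("unknown").
  rev <- suppressWarnings(as.integer(version[["svn rev"]]))
  v <- package_version(paste(version[["major"]], version[["minor"]], sep = "."))
  v == "3.3.0" && !is.na(rev) && rev < 70591L
}

# Check the running session.
is_affected()
```

Running the match() example from the original report is the more direct test, since it does not depend on how the build reports its revision.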
> I checked your example and the second "match" now properly returns 1 with
> the patched code.
> Please have a look at
> https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=16885
> http://developer.r-project.org/blosxom.cgi/R-devel/NEWS
> Thank you for your report. I hope the benefits of this speedup will
> eventually outweigh this unfortunate bug in my PR16491.
I'm pretty sure that your hope will be fulfilled.
> Regards,
> Pete
> ____________________
> Peter M. Haverty, Ph.D.
Martin Maechler, ETH Zurich