Regression in match() in R 3.3.0 when matching strings with different character encodings

3 messages · Kirill Müller, Peter Haverty, Martin Maechler

#
Hi


I think the following behavior is a regression from R 3.2.5:

 > match(iconv(c("\u00f8", "A"), from = "UTF8", to = "latin1"), "\u00f8")
[1]  1 NA
 > match(iconv(c("\u00f8"), from = "UTF8", to = "latin1"), "\u00f8")
[1] NA
 > match(iconv(c("\u00f8"), from = "UTF8", to = "latin1"), "\u00f8", incomparables = NA)
[1] 1

I'm seeing this in R 3.3.0 on both Windows and Ubuntu 15.10.

The specific behavior makes me think this is related to the following 
NEWS entry:

match(x, table) is faster (sometimes by an order of magnitude) when x is 
of length one and incomparables is unchanged (PR#16491).
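If the length-one fast path is indeed the cause, a length-one call should disagree with the same lookup done as part of a longer vector. The following is only a sketch of such a check; the helper name scalar_matches_vector is hypothetical and not part of R, and I am assuming "UTF-8" is accepted as an encoding name by the platform's iconv.

```r
# Hypothetical helper (not part of R): compare the result of a length-one
# match() call with the same element looked up inside a longer vector.
scalar_matches_vector <- function(x1, table) {
  stopifnot(length(x1) == 1L)
  scalar <- match(x1, table)              # length-one x
  vector <- match(c(x1, x1), table)[1L]   # same element, length-two x
  identical(scalar, vector)
}

# Same character, two different declared encodings.
x <- iconv("\u00f8", from = "UTF-8", to = "latin1")
Encoding(x)         # "latin1"
Encoding("\u00f8")  # "UTF-8"

# Expected to be FALSE on an affected build and TRUE otherwise.
scalar_matches_vector(x, "\u00f8")
```

Given the output above, I would expect this to return FALSE on R 3.3.0, since the longer call finds the match and the length-one call does not.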


Best regards

Kirill
#
Dear Kirill,

You are correct, that is a new bug introduced in PR16491. The appropriate
fix and regression tests have been added via PR16885, which has been merged
into trunk. I believe that means the fix will be released with R 3.3.1.

I checked your example and the second "match" now properly returns 1 with
the patched code.

Please have a look at
https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=16885
http://developer.r-project.org/blosxom.cgi/R-devel/NEWS

Thank you for your report. I hope the benefits of this speedup will
eventually outweigh this unfortunate bug in my PR16491.

Regards,

Pete

____________________
Peter M. Haverty, Ph.D.
#
    > Dear Kirill,
    > You are correct, that is a new bug introduced in PR16491. The appropriate
    > fix and regression tests have been added via PR16885, which has been merged
    > into trunk. I believe that means the fix will be released with R 3.3.1.

Yes, definitely.
Kirill, as you seem to use code which does trigger the bug, you may want to
switch to using 'R-patched', i.e., 

  > R.version.string
  [1] "R version 3.3.0 Patched (2016-05-09 r70591)"

   ( where the subversion revision must be >= 70591 )
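A script can make the same check programmatically. This is a minimal sketch, assuming the running build reports its Subversion revision in R.version[["svn rev"]] (some builds report "unknown" there); the names min_rev and has_fix are only illustrative.

```r
# Revision quoted above as the first one containing the fix.
min_rev <- 70591L

# "svn rev" is stored as a string; a non-numeric value becomes NA.
rev <- suppressWarnings(as.integer(R.version[["svn rev"]]))

# TRUE only when the revision is known and recent enough.
has_fix <- !is.na(rev) && rev >= min_rev
has_fix
```

A build with an unknown revision yields FALSE here, so this errs on the side of treating the fix as absent.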

    > I checked your example and the second "match" now properly returns 1 with
    > the patched code.

    > Please have a look at
    > https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=16885
    > http://developer.r-project.org/blosxom.cgi/R-devel/NEWS

    > Thank you for your report. I hope the benefits of this speedup will
    > eventually outweigh this unfortunate bug in my PR16491.

I'm pretty sure that your hope will be fulfilled.

    > Regards,
    > Pete
    > ____________________
    > Peter M. Haverty, Ph.D.

Martin Maechler, ETH Zurich