sub and gsub treat \\ incorrectly (PR#13454)

4 messages · amiransk at uwo.ca, William Dunlap, Andriy Miranskyy +1 more

#
Sub and gsub treat \\ replacement pattern incorrectly

I expect
  sub("a","\\", "a", perl=T)
to produce
  [1] "\"
instead it generates
  [1] ""

On the other hand, if I run
  sub("a","\\\\", "a", perl=T)
it correctly outputs
  [1] "\\"

The same issue applies to gsub.
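
One way to check what the second call really returns is to compare print(), which shows strings in escaped form, with cat() and nchar(), which show the stored characters. The following is a minimal sketch, assuming a current R session; the variable name `res` is illustrative and not part of the report.

```r
# `res` is a hypothetical name used only for this illustration.
res <- sub("a", "\\\\", "a", perl = TRUE)

print(res)      # print() escapes backslashes for display: [1] "\\"
cat(res, "\n")  # cat() writes the raw characters: \
nchar(res)      # the string holds exactly one character: 1
```

So the "\\" printed at the console is already a single backslash; only its display is doubled.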

--please do not edit the information below--

Version:
 platform = i386-pc-mingw32
 arch = i386
 os = mingw32
 system = i386, mingw32
 status = 
 major = 2
 minor = 8.1
 year = 2008
 month = 12
 day = 22
 svn rev = 47281
 language = R
 version.string = R version 2.8.1 (2008-12-22)

Windows XP (build 2600) Service Pack 2

Locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

Search Path:
 .GlobalEnv, package:stats, package:graphics, package:grDevices, package:utils, package:datasets, package:methods, Autoloads, package:base
#
The replacement pattern may include \\digit, which means
to put the digit'th parenthesized subexpression into the
replacement.  E.g.
   > sub("([[:alpha:]]+) +([[:alpha:]]+)", "\\2 \\1", "One two three four five")
   [1] "two One three four five"
   > gsub("([[:alpha:]]+) +([[:alpha:]]+)", "\\2 \\1", "One two three four five")
   [1] "two One four three five"
To support this without ambiguity or surprises, \\ is expected
to be followed by a digit (or by L, U or E when perl=TRUE).

When fixed=TRUE then there is no possibility of a parenthesized
subexpression so \\2 is taken literally.
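
The three cases above (a backreference, a perl=TRUE case conversion, and a literal replacement under fixed=TRUE) can be compared side by side. This is only a sketch, assuming a current R; the variable names `swapped`, `upper` and `literal` are illustrative.

```r
# Backreference: "\\2 \\1" in R source is the character sequence \2 \1,
# which swaps the first two parenthesized subexpressions.
swapped <- sub("([[:alpha:]]+) +([[:alpha:]]+)", "\\2 \\1",
               "One two three four five")

# With perl = TRUE, \U upper-cases the rest of the replacement.
upper <- sub("(one)", "\\U\\1", "one two", perl = TRUE)

# With fixed = TRUE the replacement is copied as-is, so "\\2" stays
# as the two characters backslash and 2.
literal <- sub("a", "\\2", "a", fixed = TRUE)

cat(swapped, upper, literal, sep = "\n")
nchar(literal)  # 2
```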

help(gsub) is not explicit about this behavior.

Because I initially made the same mistake, when I wrote the S+
versions of gsub and sub I included a warning when the replacement
included a \\ not followed by a digit:

  > gsub("([[:alpha:]]+) +([[:alpha:]]+)", "\\ \\", "One two three four five")
  [1] "    five"
  Warning messages:
    backslash in replacement argument of substituteString(fixed=F) is not
    followed by backslash or digit, hence backslash is omitted in:
    substituteString(pattern = pattern, replacement = replacement, x = x, extended ....
#
Thank you, William! This makes things clearer.

I am trying to write a small converter from free text to TeX. To do
that I need to replace every "_" with "\_" and every "&" with "\&".
Could you please tell me whether there is a way of doing this with
gsub?

Regards,
Andriy
Monday, January 19, 2009, 6:24:56 PM, you wrote:

            
#
On Tue, 20 Jan 2009, Andriy Miranskyy wrote:

            
Yes.  gsub("([_&])", "\\\\\\1", x) -- but you will also need to escape 
$ and %.  There are examples in the experimental Rd2tex function in 
the development version of R.
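
As a rough sketch of that suggestion extended to $ and %: the helper name `escape_tex` below is hypothetical, it is not the Rd2tex implementation, and it leaves other TeX special characters (such as #, {, } and the backslash itself) untouched.

```r
# Hypothetical helper: put a backslash before each of _ & $ %.
# In the replacement, "\\\\" yields one literal backslash and "\\1"
# is the matched character. Inside [...] the $ is an ordinary character.
escape_tex <- function(x) {
  gsub("([_&$%])", "\\\\\\1", x)
}

escaped <- escape_tex("a_b & 50% of $5")
cat(escaped, "\n")  # a\_b \& 50\% of \$5

# For a single character, fixed = TRUE avoids the regex escaping rules:
fixed_version <- gsub("_", "\\_", "a_b", fixed = TRUE)
cat(fixed_version, "\n")  # a\_b
```

Using cat() rather than print() here shows the text as it would be written to a .tex file.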

Please stop misusing the bug repository to ask questions (and to 
report your non-reading of the documentation: this is in ?regex 
referenced from ?sub, 'explicitly', pace Bill).