sub and gsub treat \\ incorrectly (PR#13454)
On Tue, 20 Jan 2009, Andriy Miranskyy wrote:
Thank you, William! This makes things clearer. I am trying to create a tiny converter of free text to TeX format. In order to do that I need to replace all "_" with "\_" and all "&" with "\&". Could you please tell me whether there is a way of doing it using gsub?
Yes. gsub("([_&])", "\\\\\\1", x) -- but you will also need to escape
$ and %. There are examples in the experimental Rd2tex function in
the development version of R.
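
A minimal sketch of the suggestion above, extended to the four characters named in this reply. The helper name escape_tex is hypothetical and is not taken from Rd2tex; other TeX specials (such as #, {, } or the backslash itself) are deliberately left out.

```r
# Hypothetical helper, not part of base R: prefix _, &, $ and % with a backslash.
# In the replacement, "\\\\" (two characters after R string parsing) yields one
# literal backslash and "\\1" inserts the matched character.
escape_tex <- function(x) {
  gsub("([_&$%])", "\\\\\\1", x)
}

x <- "R&D_cost: 50% of $100"
cat(escape_tex(x), "\n")
```

Printing with cat() rather than print() shows the string as TeX would see it, without R's own escaping of backslashes.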
Please stop misusing the bug repository to ask questions (and to
report your non-reading of the documentation: this is in ?regex
referenced from ?sub, 'explicitly', pace Bill).
Regards,
Andriy

Monday, January 19, 2009, 6:24:56 PM, you wrote:
-----Original Message-----
From: r-devel-bounces at r-project.org
[mailto:r-devel-bounces at r-project.org] On Behalf Of amiransk at uwo.ca
Sent: Monday, January 19, 2009 10:25 AM
To: r-devel at stat.math.ethz.ch
Cc: R-bugs at r-project.org
Subject: [Rd] sub and gsub treat \\ incorrectly (PR#13454)
Sub and gsub treat \\ replacement pattern incorrectly
I expect
sub("a","\\", "a", perl=T)
to produce
[1] "\"
instead it generates
[1] ""
On the other hand, if I run
sub("a","\\\\", "a", perl=T)
it correctly outputs
[1] "\\"
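
As a small illustration of why the second result is the single backslash that was wanted: print() shows strings with R's escapes, so one backslash is displayed as "\\". The variable name res is only for this sketch.

```r
# The replacement "\\\\" is two characters after R string parsing;
# sub() turns that pair into one literal backslash.
res <- sub("a", "\\\\", "a", perl = TRUE)

print(res)        # escaped display of the string
cat(res, "\n")    # the raw characters
print(nchar(res)) # number of characters in the result
```

nchar() reports 1 here, confirming the result holds exactly one character.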
The replacement pattern may include \\digit, which means to put the digit'th parenthesized subexpression into the replacement. E.g.
> sub("([[:alpha:]]+) +([[:alpha:]]+)", "\\2 \\1", "One two three four five")
[1] "two One three four five"
> gsub("([[:alpha:]]+) +([[:alpha:]]+)", "\\2 \\1", "One two three four five")
[1] "two One four three five"
To support this without ambiguity or surprises, \\ is expected to be followed by a digit (or L or U when perl=TRUE).
When fixed=TRUE then there is no possibility of a parenthesized subexpression so \\2 is taken literally.
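
The three cases described above can be put side by side in a short sketch. The variable names are illustrative only, and the fixed=TRUE line reflects the behavior as described in this message.

```r
x <- "One two three four five"
pat <- "([[:alpha:]]+) +([[:alpha:]]+)"

# Backreferences: \\2 and \\1 swap the two captured words.
first_swapped <- sub(pat, "\\2 \\1", x)
all_swapped   <- gsub(pat, "\\2 \\1", x)

# fixed = TRUE: no subexpressions exist, so the replacement is used literally.
literal <- sub("two", "\\2", x, fixed = TRUE)

cat(first_swapped, "\n")
cat(all_swapped, "\n")
cat(literal, "\n")
```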
help(gsub) is not explicit about this behavior.
Because I initially made the same mistake, when I wrote the S+ versions of gsub and sub I included a warning when the replacement included a \\ not followed by a digit:
> gsub("([[:alpha:]]+) +([[:alpha:]]+)", "\\ \\", "One two three four five")
[1] " five"
Warning messages:
  backslash in replacement argument of substituteString(fixed=F) is not followed by backslash or digit, hence backslash is omitted in: substituteString(pattern = pattern, replacement = replacement, x = x, extended ....
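
A rough sketch of how such a check could be written in R. This is not the S+ implementation: the function name check_replacement and the warning text are assumptions made for illustration, and the perl=TRUE escapes (\\U, \\L, \\E) are not handled.

```r
# Hypothetical helper: warn when a replacement contains a backslash that is
# not followed by another backslash or a digit.
check_replacement <- function(replacement) {
  # Remove every valid two-character escape (backslash + backslash or digit).
  stripped <- gsub("\\\\[\\\\0-9]", "", replacement, perl = TRUE)
  # Any backslash still present is a stray one.
  bad <- grepl("\\", stripped, fixed = TRUE)
  if (any(bad)) {
    warning("backslash in replacement is not followed by backslash or digit")
  }
  invisible(!bad)
}

check_replacement("\\2 \\1")    # valid: silent
tryCatch(check_replacement("\\ \\"),
         warning = function(w) message("warning: ", conditionMessage(w)))
```

Stripping the valid pairs first means an escaped backslash followed by a digit, as in "\\\\\\1", is not flagged by mistake.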
The same issue applies to gsub.

--please do not edit the information below--

Version:
 platform = i386-pc-mingw32
 arch = i386
 os = mingw32
 system = i386, mingw32
 status =
 major = 2
 minor = 8.1
 year = 2008
 month = 12
 day = 22
 svn rev = 47281
 language = R
 version.string = R version 2.8.1 (2008-12-22)

Windows XP (build 2600) Service Pack 2

Locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

Search Path:
 .GlobalEnv, package:stats, package:graphics, package:grDevices, package:utils, package:datasets, package:methods, Autoloads, package:base

--
Sincerely,
Andriy
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595