I have a problem with the str_replace() function in the stringr package. Please refer to my reprex below.

I start with a vector of strings, called x. Some of the strings contain apostrophes and brackets. I make a simple replacement as with x1, and there is no problem. I make another simple replacement, x2, where the pattern string has an apostrophe. Again no problem. Then I make a third replacement, x3, where the pattern has opening and closing brackets and the function still works fine. Finally I make a replacement where the pattern has both an apostrophe and opening and closing brackets and the replacement does not work. I tried to solve this by putting backslashes before the apostrophe and/or the brackets, but that accomplished nothing. I am stumped.

# Reprex for str_replace problem
library(stringr)
x <- c(
  "Clothing and footwear",
  "Women's clothing",
  "Women's footwear (excluding athletic)",
  "Clothing accessories (belts and so on)",
  "Clothing and footwear",
  "Women's clothing",
  "Women's footwear (excluding athletic)",
  "Clothing accessories (belts and so on)"
)
x
x1 <- str_replace(x, "Clothing and footwear", "Clothing and shoes")
x1
x2 <- str_replace(x, "Women's clothing", "Women's clothing goods")
x2
x3 <- str_replace(x, "Clothing accessories (belts and so on)", "Clothing accessories")
x3
x4 <- str_replace(x, "Women's footwear (excluding athletic)", "Women's footwear")
x4
Problem with the str_replace function
4 messages · Bert Gunter, Hervé Pagès, phil at philipsmith.ca
I prefer using regular expressions directly, so this may not satisfy you:
a <- "Women's footwear (excluding athletic)"
b <- gsub("(.*) \\(.*$", "\\1", a)
b
[1] "Women's footwear"

There are, of course, other ways to do this with regexes or even substring().

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
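As a rough sketch of how the gsub() pattern above behaves on the whole vector from the original post (base R only; the name `stripped` is just for illustration), note that it removes every trailing parenthetical, not only the one targeted by x4:

```r
x <- c(
  "Clothing and footwear",
  "Women's clothing",
  "Women's footwear (excluding athletic)",
  "Clothing accessories (belts and so on)"
)

# "(.*)" captures everything before the last " (" in each string;
# " \\(.*$" matches the space, the literal "(" and the rest of the string.
# "\\1" keeps only the captured prefix. Strings without " (" are unchanged.
stripped <- sub("(.*) \\(.*$", "\\1", x)
stripped
```

If only one specific string should change, matching that string literally is the safer choice.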
On Tue, Mar 16, 2021 at 5:36 PM <phil at philipsmith.ca> wrote:
______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Hi,
stringr::str_replace() treats the 2nd argument ('pattern') as a regular
expression and some characters have a special meaning when they are used
in a regular expression. For example the dot plays the role of a
wildcard (i.e. it means "any character"):
> str_replace("aaXcc", "a.c", "ZZ")
[1] "aZZc"
If you want to treat a special character literally, you need to escape
it with a double backslash '\\':
> str_replace(c("aaXcc", "aa.cc"), "a.c", "ZZ")
[1] "aZZc" "aZZc"
> str_replace(c("aaXcc", "aa.cc"), "a\\.c", "ZZ")
[1] "aaXcc" "aZZc"
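A small sketch of why the backslash is doubled, using base R only. It assumes R 4.0.0 or later for the raw-string syntax, and the variable names are purely illustrative. R itself consumes one backslash when it parses the string literal, so the regex engine only ever sees a single one:

```r
# The literal "a\\.c" is stored as four characters: a, \, ., c
pattern <- "a\\.c"
nchar(pattern)
writeLines(pattern)   # shows the string as the regex engine receives it

# A raw string (R >= 4.0.0) spells the same pattern without doubling
raw_pattern <- r"(a\.c)"
identical(pattern, raw_pattern)

# The escaped dot matches a literal dot only
grepl(pattern, c("aaXcc", "aa.cc"))
```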
It turns out that parentheses are also special characters, so you also need
to escape them:
> str_replace("aa(X)cc", "a(X)c", "ZZ")
[1] "aa(X)cc"
> str_replace("aa(X)cc", "a\\(X\\)c", "ZZ")
[1] "aZZc"
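Applied to the vector from the original post, this could look like the sketch below. It shows two options: escaping the parentheses by hand, or wrapping the pattern in stringr's fixed() so that it is matched literally rather than as a regular expression. The result names (x4_escaped, x4_fixed, x4_unescaped) are only for illustration.

```r
library(stringr)

x <- c(
  "Clothing and footwear",
  "Women's clothing",
  "Women's footwear (excluding athletic)",
  "Clothing accessories (belts and so on)"
)

# Option 1: escape the parentheses so they are matched literally
x4_escaped <- str_replace(
  x,
  "Women's footwear \\(excluding athletic\\)",
  "Women's footwear"
)

# Option 2: fixed() turns off regex interpretation for the whole pattern
x4_fixed <- str_replace(
  x,
  fixed("Women's footwear (excluding athletic)"),
  "Women's footwear"
)

# For comparison: unescaped, the parentheses form a regex group, so the
# pattern looks for "Women's footwear excluding athletic" and finds nothing
x4_unescaped <- str_replace(
  x,
  "Women's footwear (excluding athletic)",
  "Women's footwear"
)

x4_escaped
```

The apostrophe needs no escaping in either option, since it has no special meaning in a regular expression. fixed() is probably the simpler choice when the whole pattern is meant literally.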
There are plenty of examples in the man page for str_replace() (see
'?str_replace'), including examples showing the use of parentheses in the
pattern.
Hope this helps,
H.
On 3/16/21 5:34 PM, phil at philipsmith.ca wrote:
Hervé Pagès
Bioconductor Core Team
hpages.on.github at gmail.com
Your help is much appreciated. I now understand what my problem was and can move forward. Philip
On 2021-03-17 01:19, Hervé Pagès wrote: