regexpr - ignore all special characters and punctuation in a string
On 20/04/2015 9:59 AM, Dimitri Liakhovitski wrote:
Hello! Please point me in the right direction. I need to match 2 strings, but focusing ONLY on characters, ignoring all special characters and punctuation signs, including (), "", etc. For example, I want the following to return TRUE:
"What a nice day today! - Story of happiness: Part 2." == "What a nice day today: Story of happiness (Part 2)"
I would transform both strings using gsub(), then compare.
e.g.
clean <- function(s)
  gsub("[[:punct:][:blank:]]", "", s)
clean("What a nice day today! - Story of happiness: Part 2.") ==
  clean("What a nice day today: Story of happiness (Part 2)")
This completely ignores spaces; you might want something more
sophisticated if you consider "today" and "to day" to be different, e.g.
clean <- function(s) {
  s <- gsub("[[:punct:]]", "", s)
  gsub("[[:blank:]]+", " ", s)
}
which converts multiple blanks into single spaces.
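To see how the two versions differ, here is a small sketch that runs both on the strings from the question and on the "today" / "to day" case. The names clean_no_spaces and clean_keep_spaces are made up for this illustration so that both definitions can coexist; they are otherwise the same two functions as above.

```r
# Hypothetical names, used only so both variants can be defined side by side.
clean_no_spaces <- function(s)
  gsub("[[:punct:][:blank:]]", "", s)   # drop punctuation and all blanks

clean_keep_spaces <- function(s) {
  s <- gsub("[[:punct:]]", "", s)       # drop punctuation only
  gsub("[[:blank:]]+", " ", s)          # collapse runs of blanks to one space
}

a <- "What a nice day today! - Story of happiness: Part 2."
b <- "What a nice day today: Story of happiness (Part 2)"

# Both variants treat the two titles from the question as equal.
titles_equal_no_spaces   <- clean_no_spaces(a) == clean_no_spaces(b)
titles_equal_keep_spaces <- clean_keep_spaces(a) == clean_keep_spaces(b)

# Only the space-preserving variant tells "today" and "to day" apart.
today_equal_no_spaces   <- clean_no_spaces("today") == clean_no_spaces("to day")
today_equal_keep_spaces <- clean_keep_spaces("today") == clean_keep_spaces("to day")

print(c(titles_equal_no_spaces, titles_equal_keep_spaces))
print(c(today_equal_no_spaces, today_equal_keep_spaces))
```

Neither version touches letter case or leading/trailing blanks; if those matter for your data, tolower() and trimws() could be applied as well, but that goes beyond what was asked.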
Duncan Murdoch