
regexpr - ignore all special characters and punctuation in a string

On 20/04/2015 9:59 AM, Dimitri Liakhovitski wrote:
I would transform both strings using gsub(), then compare.

e.g.

clean <- function(s)
  # remove every punctuation and blank (space or tab) character
  gsub("[[:punct:][:blank:]]", "", s)

clean("What a nice day today! - Story of happiness: Part 2.") ==
clean("What a nice day today: Story of happiness (Part 2)")

This completely ignores spaces; you might want something more
sophisticated if you consider "today" and "to day" to be different, e.g.

clean <- function(s) {
  s <- gsub("[[:punct:]]", "", s)   # drop punctuation only
  gsub("[[:blank:]]+", " ", s)      # collapse each run of blanks to one space
}

This drops punctuation but converts each run of blanks into a single space, so word boundaries survive the comparison.
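A quick side-by-side check of the two approaches on the strings from the thread. The functions are renamed here (clean_nospace and clean_spaces are hypothetical names, used only so both can coexist); their bodies are the two versions of clean() above. This is a sketch to illustrate the difference, not a complete normalizer.

```r
# Hypothetical names: both bodies are the two clean() versions from this reply.
clean_nospace <- function(s)
  gsub("[[:punct:][:blank:]]", "", s)   # drop punctuation and all blanks

clean_spaces <- function(s) {
  s <- gsub("[[:punct:]]", "", s)       # drop punctuation only
  gsub("[[:blank:]]+", " ", s)          # collapse each run of blanks to one space
}

a <- "What a nice day today! - Story of happiness: Part 2."
b <- "What a nice day today: Story of happiness (Part 2)"

clean_nospace(a) == clean_nospace(b)               # TRUE
clean_spaces(a) == clean_spaces(b)                 # TRUE
clean_nospace("today") == clean_nospace("to day")  # TRUE: spacing is lost
clean_spaces("today") == clean_spaces("to day")    # FALSE: word boundary kept
```

Note that neither version trims leading or trailing blanks, and [:blank:] matches only spaces and tabs, not newlines; wrapping the result in trimws(), or using [[:space:]] instead, may be needed if your strings contain those.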

Duncan Murdoch