Message-ID: <5FBCD1CB-E85C-4D2A-882B-CB81A3A5C6D3@me.com>
Date: 2015-04-20T14:08:59Z
From: Marc Schwartz
Subject: regexpr - ignore all special characters and punctuation in a string
In-Reply-To: <CAN2xGJb=n+jVyQGih_mgrjiHXEBiDtm0xeCUEDrVTrzhYu9w+g@mail.gmail.com>
> On Apr 20, 2015, at 8:59 AM, Dimitri Liakhovitski <dimitri.liakhovitski at gmail.com> wrote:
>
> Hello!
>
> Please point me in the right direction.
> I need to match 2 strings, but focusing ONLY on characters, ignoring
> all special characters and punctuation signs, including (), "", etc..
>
> For example:
> I want the following to return: TRUE
>
> "What a nice day today! - Story of happiness: Part 2." ==
> "What a nice day today: Story of happiness (Part 2)"
>
>
> --
> Thank you!
> Dimitri Liakhovitski
Look at ?agrep, which does approximate (fuzzy) string matching:
Vec1 <- "What a nice day today! - Story of happiness: Part 2."
Vec2 <- "What a nice day today: Story of happiness (Part 2)"
# Match the words, not the punctuation.
# Not fully tested
> agrep("What a nice day today Story of happiness Part 2", c(Vec1, Vec2))
[1] 1 2
> agrep("What a nice day today Story of happiness Part 2", c(Vec1, Vec2),
+       value = TRUE)
[1] "What a nice day today! - Story of happiness: Part 2."
[2] "What a nice day today: Story of happiness (Part 2)"
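If an exact TRUE/FALSE comparison is wanted rather than a fuzzy match, one option is to strip the punctuation from both strings first and then compare with ==. This is only a minimal sketch and not fully tested; strip_punct is a hypothetical helper name, and it assumes that the POSIX class [[:punct:]] covers every character you want ignored and that runs of whitespace should count as a single space.

```r
# Hypothetical helper (not part of base R): drop punctuation, then
# collapse the leftover runs of whitespace so only the words remain.
strip_punct <- function(x) {
  x <- gsub("[[:punct:]]", "", x)    # remove !, -, :, ., (, ), quotes, etc.
  x <- gsub("[[:space:]]+", " ", x)  # collapse repeated whitespace
  trimws(x)                          # drop leading/trailing whitespace (R >= 3.2.0)
}

Vec1 <- "What a nice day today! - Story of happiness: Part 2."
Vec2 <- "What a nice day today: Story of happiness (Part 2)"

# Compare the cleaned strings
strip_punct(Vec1) == strip_punct(Vec2)
```

Wrap both sides in tolower() as well if case should also be ignored.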
Also, possibly:
http://cran.r-project.org/web/packages/stringdist
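A rough sketch of how that package might be used here, assuming it is installed from CRAN. The choice of method = "lv" (Levenshtein distance) is my assumption, not a recommendation; see ?stringdist for the other metrics. Any cutoff for deciding "same string" is something you would have to pick yourself.

```r
Vec1 <- "What a nice day today! - Story of happiness: Part 2."
Vec2 <- "What a nice day today: Story of happiness (Part 2)"

# Only run if the stringdist package is available
if (requireNamespace("stringdist", quietly = TRUE)) {
  # Number of single-character edits needed to turn Vec1 into Vec2
  d <- stringdist::stringdist(Vec1, Vec2, method = "lv")
  # The same comparison rescaled to a similarity between 0 and 1
  s <- stringdist::stringsim(Vec1, Vec2, method = "lv")
  print(d)
  print(s)
}
```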
Regards,
Marc Schwartz