library/function to compare two phrases?

4 messages · R. Michael Weylandt, David Winsemius, Brian Feeny

#
I am looking for a library/function in R that can compare two phrases and give me a score, or otherwise classify them as correctly as possible.

The "phrases" are obfuscated/messy.  I am not concerned about which one is "correct" (as in spell checking); I am only concerned with grouping them so that I know which ones are the closest match.

Example:

I have ROW1 and ROW2 like so:

ROW1                    ROW2
hamburger helper        bigmc heartkcatta
chicken nuggets         chicke, nuggets, jss
bigmac heartattack      some sombody somehwere
somebody somehwere      repleh regrubmah

I am looking for something that can tell me that the best match for hamburger helper is repleh regrubmah, and the same for each other row.

So my goal is to write a program that, for each phrase in ROW1, runs this function against every phrase in ROW2 and gives me the phrase that scored best.

I have read over many of the NLP packages listed at http://cran.r-project.org/web/views/NaturalLanguageProcessing.html

I thought lsa might be a good fit, but I am not sure.  I have limited time, so I am hoping someone can point me in the right direction.

I have been searching for "text classifiers"; perhaps this problem goes by another name.

Brian
#
On Sat, Nov 17, 2012 at 11:00 PM, Brian Feeny <bfeeny at mac.com> wrote:
This is outside my expertise, but if memory serves, you might benefit
from googling the Levenshtein (spelling?) distance which allows this
sort of fuzzy matching of strings.

MW
#
On Nov 17, 2012, at 3:20 PM, R. Michael Weylandt wrote:

            
The 'agrep' function implements approximate string matching based on the Levenshtein distance.
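
For illustration, here is a minimal sketch of how 'agrep' might be applied to the example phrases from the original post. The variable name `row2` and the choice of `max.distance = 2` are assumptions made for this example, not something stated in the thread.

```r
# Candidate phrases, copied from the ROW2 column of the original post
row2 <- c("bigmc heartkcatta", "chicke, nuggets, jss",
          "some sombody somehwere", "repleh regrubmah")

# Return every candidate containing an approximate match to the pattern,
# allowing at most 2 edits (an assumed threshold; tune it for real data)
hits <- agrep("chicken nuggets", row2, max.distance = 2, value = TRUE)

# A reversed phrase is too far away in edit distance for agrep to find it
reversed_hits <- agrep("hamburger helper", row2, max.distance = 2, value = TRUE)
```

Note that agrep only reports which candidates fall within the threshold; it does not rank them, so it suits filtering more than picking a single best match.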
#
Thank you Michael and David.  I am now looking into agrep and adist, and they look very useful for what I want to do.  My initial results are promising!
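
A rough sketch of the kind of thing adist makes possible, using the example phrases from the first message. The helper `reverse_chars` and the idea of also scoring each candidate read backwards are assumptions added here to cope with the reversed phrase; they are not part of adist itself.

```r
row1 <- c("hamburger helper", "chicken nuggets",
          "bigmac heartattack", "somebody somehwere")
row2 <- c("bigmc heartkcatta", "chicke, nuggets, jss",
          "some sombody somehwere", "repleh regrubmah")

# Hypothetical helper: reverse the characters of each string
reverse_chars <- function(x) {
  vapply(strsplit(x, NULL),
         function(ch) paste(rev(ch), collapse = ""),
         character(1))
}

# Edit-distance matrices: rows are row1 phrases, columns are row2 phrases
d_forward  <- adist(row1, row2)
d_reversed <- adist(row1, reverse_chars(row2))

# For each pair keep the smaller distance, so a reversed phrase can still match
d <- pmin(d_forward, d_reversed)

# Index of the closest row2 phrase for every row1 phrase
best <- apply(d, 1, which.min)

matches <- data.frame(phrase     = row1,
                      best_match = row2[best],
                      distance   = d[cbind(seq_along(row1), best)])
print(matches)
```

On these four example rows the sketch pairs each phrase with its intended partner, but plain edit distance on its own would not reliably find the reversed phrase, which is why the reversed comparison is included.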

Brian
On Nov 17, 2012, at 6:20 PM, R. Michael Weylandt wrote: