Message-ID: <6399AE47-C286-40AE-8419-61CD1A87CF10@comcast.net>
Date: 2012-11-17T23:38:02Z
From: David Winsemius
Subject: library/function to compare two phrases?
In-Reply-To: <CAAmySGOTOT9nTGtuDQtc+zEW-q6QF0aenX8occUGYxESY0gVWw@mail.gmail.com>
On Nov 17, 2012, at 3:20 PM, R. Michael Weylandt wrote:
> On Sat, Nov 17, 2012 at 11:00 PM, Brian Feeny <bfeeny at mac.com> wrote:
>> I am looking for a library/function in R that can compare two phrases and give me a similarity score, or otherwise classify them as accurately as possible.
>>
>> The "phrases" are obfuscated/messy. I am not concerned about which is "correct" (for example, spell checking); I am only concerned with grouping them
>> so that I know which ones are the closest match.
>>
>> Example:
>>
>> I have ROW1 and ROW2 like so:
>>
>> ROW1               | ROW2
>> hamburger helper   | bigmc heartkcatta
>> chicken nuggets    | chicke, nuggets, jss
>> bigmac heartattack | some sombody somehwere
>> somebody somehwere | repleh regrubmah
>>
>> I am looking for something that can tell me that the best match for hamburger helper is repleh regrubmah, and the same for each other row.
>>
>> So my goal is to write a program that, for each phrase in ROW1, runs this function against ROW2 and returns the phrase that scores best.
>>
>> I have read over many of the NLP packages at http://cran.r-project.org/web/views/NaturalLanguageProcessing.html
>>
>> I thought lsa might be a good fit, but I am not sure. I have limited time, so I am hoping someone can point me in the direction of what I am looking for.
>>
>> I have been searching for "text classifiers", perhaps this problem is referred to as something else.
>>
>
> This is outside my expertise, but if memory serves, you might benefit
> from googling the Levenshtein (spelling?) distance, which allows this
> sort of fuzzy matching of strings.
The 'agrep' function does approximate matching based on the (generalized) Levenshtein edit distance.
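agrep() searches for approximate matches of a pattern within each string, so for scoring whole phrases against one another the distance matrix returned by utils::adist() (same generalized Levenshtein distance) may be more convenient. A minimal sketch using the four phrase pairs from the example: str_reverse() and best_match() are hypothetical helpers, and also comparing each candidate in reversed form is my own assumption, added because some of the example phrases are written backwards, which plain edit distance does not recognize.

```r
## Sketch: for each phrase in row1, pick the closest phrase in row2 using the
## generalized Levenshtein distance from utils::adist().
## Assumption (not part of adist itself): some phrases are written backwards,
## so each candidate is also scored in reversed form and the smaller
## distance is kept.

row1 <- c("hamburger helper", "chicken nuggets",
          "bigmac heartattack", "somebody somehwere")
row2 <- c("bigmc heartkcatta", "chicke, nuggets, jss",
          "some sombody somehwere", "repleh regrubmah")

# Hypothetical helper: reverse the characters of each string
str_reverse <- function(s) {
  vapply(strsplit(s, NULL),
         function(ch) paste(rev(ch), collapse = ""),
         character(1))
}

# Hypothetical helper: return, for each element of x, the closest candidate
best_match <- function(x, candidates, ignore.case = TRUE) {
  # Distance matrices: rows = x, columns = candidates
  d_fwd <- utils::adist(x, candidates, ignore.case = ignore.case)
  d_rev <- utils::adist(x, str_reverse(candidates), ignore.case = ignore.case)
  d <- pmin(d_fwd, d_rev)   # element-wise minimum; keeps the matrix dims
  # which.min returns the first column on ties
  candidates[apply(d, 1, which.min)]
}

matches <- best_match(row1, row2)
print(data.frame(row1, best = matches))
```

Dropping d_rev gives plain edit-distance matching. With many phrases, adist() builds the full length(x) by length(candidates) matrix, so memory grows with the product of the two list sizes.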
--
David Winsemius, MD
Alameda, CA, USA