Similarity matching with probabilities
On 27 Jun 2008, at 14:30, francogrex wrote:
Hello, it's just a strange coincidence that someone posted a question about matching very recently. I know there are several matching functions in the base package (such as match, pmatch, charmatch, gsub, etc.), but I can't seem to use them to get what I need. Suppose I have the following strings: "tets" "estt" "rtes7" "gstes" "tes5t". Is there an R procedure to determine how related each string is to the reference string "test", for example to say that "tets" is similar to "test" with a probability of 0.9 or something of that sort?
Have a look at ?agrep. One could loop over different values of max.distance to see how closely each string matches the reference. Another way is to calculate the edit distance (Levenshtein or Damerau-Levenshtein). A starting point could be: http://wiki.r-project.org/rwiki/doku.php?id=tips:data-strings:levenshtein
--Hans
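The edit-distance idea can be sketched as follows. This is only a minimal illustration, assuming a current R installation where adist() is available in the utils package (it is newer than this thread). The scaling to [0, 1] and the name `similarity` are illustrative choices, not something from the original posts, and the result is a similarity score rather than a probability in any statistical sense.

```r
# Reference string and the candidates from the question.
ref <- "test"
candidates <- c("tets", "estt", "rtes7", "gstes", "tes5t")

# adist() returns a matrix of generalized Levenshtein distances
# (insertions, deletions and substitutions, each with cost 1 by default).
# With a single reference string the matrix has one row.
d <- as.vector(adist(ref, candidates))
names(d) <- candidates

# Assumption: scale by the length of the longer string so that
# 1 means identical and 0 means nothing in common.
similarity <- 1 - d / pmax(nchar(ref), nchar(candidates))

print(d)
print(round(similarity, 2))
```

Note that plain Levenshtein distance counts the swapped letters in "tets" as two edits; a Damerau-Levenshtein variant would count a transposition as one, so it would rate "tets" as closer to "test". Also, agrep looks for approximate matches of the pattern within each string, so looping over max.distance gives a coarser, threshold-based answer than the whole-string distance above.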