string functions

4 messages · Laetitia Schmid, Liviu Andronic, Greg Hirson +1 more

#
Hi!
Does anybody know a string function that calculates how many
characters two strings share? E.g., ("Hello World", "Hello Peter") would
give 7.
Thanks.
Laetitia
#
On 1/9/10, Laetitia Schmid <laetitia.schmid at gmx.ch> wrote:
Perhaps package 'stringr' has something related?
Liviu
#
Laetitia,

One approach:

lettermatch <- function(stringA, stringB) {
     # count the distinct characters of stringA that also occur in stringB
     sum(unique(unlist(strsplit(stringA, ""))) %in%
         unique(unlist(strsplit(stringB, ""))))
}

lettermatch("Hello World","Hello Peter")
yields 6, as the l is counted only once.

This treats uppercase and lowercase as different letters and counts how
many of the unique characters in stringA (the space included) show up in
stringB.
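
To illustrate the two properties just described (case sensitivity, and each distinct character counting once), here is a small sketch. The function is repeated so the block runs on its own; the short test strings are made up for illustration.

```r
# Same function as above, repeated so this block is self-contained.
lettermatch <- function(stringA, stringB) {
  sum(unique(unlist(strsplit(stringA, ""))) %in%
        unique(unlist(strsplit(stringB, ""))))
}

lettermatch("abc", "ABC")                    # 0: "a" and "A" are different characters
lettermatch(tolower("abc"), tolower("ABC"))  # 3: lowercasing first makes them match
lettermatch("aaa", "a")                      # 1: repeated characters count once
```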

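In another approach, letters are set to lowercase first, and each shared
character is counted as often as it occurs in both strings (the smaller
of its two counts). I think this gives you what you want: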

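lettermatch2 <- function(stringA, stringB) {
     # per-character counts of both strings, joined on the character
     tb <- merge(as.data.frame(table(strsplit(tolower(stringA), ""))),
                 as.data.frame(table(strsplit(tolower(stringB), ""))),
                 by="Var1")
     # each shared character contributes the smaller of its two counts
     sum(apply(tb[-1], 1, min))
}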

lettermatch2("Hello World","Hello Peter")
yields 7.
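
To see where the 7 comes from, the per-character counts can be laid out explicitly. This is only a sketch in base R that sidesteps merge(); the names ca, cb, shared and breakdown are illustrative and not part of the function above.

```r
a <- "Hello World"
b <- "Hello Peter"

# Lowercase both strings and split them into single characters.
ca <- strsplit(tolower(a), "")[[1]]
cb <- strsplit(tolower(b), "")[[1]]

# Characters that occur in both strings (the space is one of them).
shared <- intersect(ca, cb)

# How often each shared character occurs in a and in b.
breakdown <- data.frame(
  char = shared,
  in_a = vapply(shared, function(ch) sum(ca == ch), integer(1), USE.NAMES = FALSE),
  in_b = vapply(shared, function(ch) sum(cb == ch), integer(1), USE.NAMES = FALSE)
)

# A character is shared as often as it occurs in both strings.
breakdown$shared <- pmin(breakdown$in_a, breakdown$in_b)

print(breakdown)
sum(breakdown$shared)   # 7
```

The rows are h, e, l, o, the space and r; l contributes 2 and every other row 1, so the space is part of the 7.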

Greg
On 1/9/10 1:51 PM, Laetitia Schmid wrote:

  
    
#
Maybe I don't understand the question. I can think of four ways to
count, none of which give me 7:

a <- "Hello World"
b <- "Hello Peter"

#counting duplicates and the space:
sa <- strsplit(a, split="")[[1]]
sb <- strsplit(b, split="")[[1]]
sum(sb %in% sa)

#counting the space but not the duplicates:
sa <- unique(strsplit(a, split="")[[1]])
sb <- unique(strsplit(b, split="")[[1]])
sum(sb %in% sa)

#counting the duplicates but not the space:
#(logical indexing is used because sa[-which(sa == " ")] would return
# an empty vector for a string that contains no space)
sa <- strsplit(a, split="")[[1]]
sa <- sa[sa != " "]
sb <- strsplit(b, split="")[[1]]
sb <- sb[sb != " "]
sum(sb %in% sa)

#not counting duplicates or the space:
sa <- unique(sa)
sb <- unique(sb)
sum(sb %in% sa)

What am I missing?
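
The four variants above differ only in two switches, so they can be folded into one helper for a side-by-side comparison. This is a sketch; the function count_in and its arguments drop_space and dedupe are hypothetical names introduced here.

```r
# Count the characters of b that also occur in a (a membership test via %in%).
count_in <- function(a, b, drop_space = FALSE, dedupe = FALSE) {
  sa <- strsplit(a, split = "")[[1]]
  sb <- strsplit(b, split = "")[[1]]
  if (drop_space) {
    sa <- sa[sa != " "]
    sb <- sb[sb != " "]
  }
  if (dedupe) {
    sa <- unique(sa)
    sb <- unique(sb)
  }
  sum(sb %in% sa)
}

a <- "Hello World"
b <- "Hello Peter"

count_in(a, b)                                    # 9
count_in(a, b, dedupe = TRUE)                     # 6
count_in(a, b, drop_space = TRUE)                 # 8
count_in(a, b, drop_space = TRUE, dedupe = TRUE)  # 5
```

All four are membership tests: a repeated character in b is counted either every time it occurs or exactly once, never "as often as it occurs in both strings". Counting the smaller of the two per-character frequencies, with the space included, is what gives 7 for this pair.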
On Sat, Jan 9, 2010 at 4:51 PM, Laetitia Schmid <laetitia.schmid at gmx.ch> wrote: