string functions

Hi!
Does anybody know a string function that would calculate how many
characters two strings share? E.g. ("Hello World","Hello Peter") would
be 7.
Thanks.
Laetitia

Perhaps package "stringr" has something related?
Liviu
Laetitia,

One approach:

lettermatch <- function(stringA, stringB) {
    sum(unique(unlist(strsplit(stringA, ""))) %in%
        unique(unlist(strsplit(stringB, ""))))
}

lettermatch("Hello World","Hello Peter")
yields 6, as the l is only singly counted.

This treats uppercase and lowercase as different letters and counts how 
many of the unique letters in stringA show up in stringB.
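The same distinct-character count can be sketched with base R's intersect(), which already drops duplicates. The name lettermatch_intersect is hypothetical, chosen only for illustration; like lettermatch it is case-sensitive and counts the space as a character, and it assumes each argument is a single string.

```r
# Hypothetical helper: number of distinct characters present in both strings.
# Case-sensitive; the space is treated like any other character.
lettermatch_intersect <- function(stringA, stringB) {
  charsA <- strsplit(stringA, "")[[1]]
  charsB <- strsplit(stringB, "")[[1]]
  # intersect() returns the distinct elements common to both vectors
  length(intersect(charsA, charsB))
}

lettermatch_intersect("Hello World", "Hello Peter")  # 6
```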

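In another approach, both strings are converted to lowercase first, and each
shared character is counted as many times as it occurs in both strings (the
smaller of its two counts). This, I think, gives you what you want: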

lettermatch2 <- function(stringA, stringB) {
    tb <- merge(as.data.frame(table(strsplit(tolower(stringA), ""))),
                as.data.frame(table(strsplit(tolower(stringB), ""))),
                by = "Var1")
    sum(apply(tb[-1], 1, min))
}

lettermatch2("Hello World","Hello Peter")
yields 7.
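A sketch of the same count without merge(), assuming it is acceptable to work directly with named table objects: for every character found in both lowercased strings, take the smaller of its two counts and add them up. With the example strings the space, e, h, o and r are each shared once and l twice, which gives 7. The name lettermatch_pmin is hypothetical.

```r
# Hypothetical helper: size of the multiset intersection of the characters
# of two strings, ignoring case (the space counts as a character).
lettermatch_pmin <- function(stringA, stringB) {
  countsA <- table(strsplit(tolower(stringA), "")[[1]])
  countsB <- table(strsplit(tolower(stringB), "")[[1]])
  common <- intersect(names(countsA), names(countsB))
  # a shared character can be matched at most min(countA, countB) times;
  # both subsets follow the order of `common`, so they line up element-wise
  sum(pmin(as.vector(countsA[common]), as.vector(countsB[common])))
}

lettermatch_pmin("Hello World", "Hello Peter")  # 7
```

Using pmin() on the two aligned count vectors replaces the merge() plus apply(..., 1, min) step; the result is the same for these inputs.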

Greg

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Greg Hirson
ghirson at ucdavis.edu

Graduate Student
Agricultural and Environmental Chemistry

1106 Robert Mondavi Institute North
One Shields Avenue
Davis, CA 95616
Maybe I don't understand the question. I can think of four ways to
count, none of which give me 7:

a <- "Hello World"
b <- "Hello Peter"

#counting duplicates and the space:
sa <- strsplit(a, split="")[[1]]
sb <- strsplit(b, split="")[[1]]
sum(sb %in% sa)  # 9

#counting the space but not the duplicates:
sa <- unique(strsplit(a, split="")[[1]])
sb <- unique(strsplit(b, split="")[[1]])
sum(sb %in% sa)  # 6

#counting the duplicates but not the space:
sa <- strsplit(a, split="")[[1]]
sa <- sa[sa != " "]
sb <- strsplit(b, split="")[[1]]
sb <- sb[sb != " "]
sum(sb %in% sa)  # 8

#not counting duplicates or the space:
sa <- unique(sa)
sb <- unique(sb)
sum(sb %in% sa)  # 5

What am I missing?
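The four variants differ only in two choices: whether repeated characters are dropped and whether the space is dropped. A sketch that exposes those choices as arguments; count_shared, unique_chars and drop_space are hypothetical names. Note that sum(sb %in% sa) counts characters of b that occur anywhere in a, so it is not symmetric in general.

```r
# Hypothetical helper reproducing the four counts above.
# unique_chars: drop repeated characters before comparing
# drop_space:   remove spaces before comparing
count_shared <- function(a, b, unique_chars = FALSE, drop_space = FALSE) {
  sa <- strsplit(a, split = "")[[1]]
  sb <- strsplit(b, split = "")[[1]]
  if (drop_space) {
    # logical indexing is safe even when no space is present,
    # unlike x[-which(x == " ")], which would return character(0)
    sa <- sa[sa != " "]
    sb <- sb[sb != " "]
  }
  if (unique_chars) {
    sa <- unique(sa)
    sb <- unique(sb)
  }
  # number of characters of b (after filtering) that occur somewhere in a
  sum(sb %in% sa)
}

a <- "Hello World"
b <- "Hello Peter"
count_shared(a, b)                                          # 9
count_shared(a, b, unique_chars = TRUE)                     # 6
count_shared(a, b, drop_space = TRUE)                       # 8
count_shared(a, b, unique_chars = TRUE, drop_space = TRUE)  # 5
```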

Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org