Which function to use: grep, replace, substr etc.?
On Oct 16, 2011, at 1:32 PM, Jeff Newmiller wrote:
Note that "male" comes before "female" in your data frame.
syrvn <mentor_ at gmx.net> wrote:
Hi,
thanks for the tip! I do it as follows now, but I still have a problem I do not understand:
abbrvs <- data.frame(c("peter", "name", "male", "female"),
                     c("P", "N", "m", "f"))
colnames(abbrvs) <- c("pattern", "replacement")
str <- "My name is peter and I am male"
for(m in 1:nrow(abbrvs)) {
  str <- sub(abbrvs$pattern[m], abbrvs$replacement[m], str,
             fixed=TRUE)
  print(str)
}
This works perfectly fine as I get: "My N is P and I am m"
However, when I replace male by female then I get the following: "My N is P and I am fem"
but I want to have "My N is P and I am f".
Even with the parameter fixed=TRUE I get the same result. Why is that?
Because "male" is in "female"? This reminds me of a comment on a posting I made this morning on SO.

http://stackoverflow.com/questions/7782113/counting-keyword-occurrences-in-r

The problem was slightly different, but the greppish principle was that in order to match only complete words, you need to specify "^", "$" or " " at each end of the word:

    dataset <- c("corn", "cornmeal", "corn on the cob", "meal")
    grep("^corn$|^corn | corn$", dataset)
    [1] 1 3

In such cases you may want to look at the gsubfn package. It offers higher level matching functions and I think strapply might be more efficient and expressive here. I can imagine a construction in a loop such as yours, but you would probably want to build a pattern outside the sub() call.

After struggling to fix your loop (and your data.frame, which definitely should not be using factor variables), I am even more convinced you should be learning the "gsubfn" facilities. (Take out the debugging print statements.)

    > abbrvs <- data.frame(c("peter", "name", "male", "female"),
    +                      c(" P ", " N ", " m ", " f "), stringsAsFactors=FALSE)
    >
    > colnames(abbrvs) <- c("pattern", "replacement")
    > for(m in 1:nrow(abbrvs)) { patt <- paste("^",abbrvs$pattern[m], "$| ",
    +                                          abbrvs$pattern[m], " | ",
    +                                          abbrvs$pattern[m], "$", sep="")
    +     print(c( patt, abbrvs$replacement[m]))
    +     str <- sub(patt, abbrvs$replacement[m], str)
    +     print(str)
    + }
    [1] "^peter$| peter | peter$" " P "
    [1] "My name is P and I am female"
    [1] "^name$| name | name$" " N "
    [1] "My N is P and I am female"
    [1] "^male$| male | male$" " m "
    [1] "My N is P and I am female"
    [1] "^female$| female | female$" " f "
    [1] "My N is P and I am f "
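
A shorter way to get the same whole-word matching, without spelling out "^", "$" and " " by hand, is a word-boundary assertion. The sketch below is not from the thread and is not the gsubfn approach suggested above. It assumes base R's gsub() with perl=TRUE, that every pattern is a plain word with no regex metacharacters, and a hypothetical helper name, abbreviate_words:

```r
# Hypothetical helper (not from the thread): replace whole words only.
# "\\b" matches the empty string at the edge of a word, so "male" cannot
# match inside "female". Assumes the patterns contain no regex metacharacters.
abbreviate_words <- function(text, abbrvs) {
  for (m in seq_len(nrow(abbrvs))) {
    patt <- paste0("\\b", abbrvs$pattern[m], "\\b")
    text <- gsub(patt, abbrvs$replacement[m], text, perl = TRUE)
  }
  text
}

# Character columns, not factors, as recommended above.
abbrvs <- data.frame(pattern     = c("peter", "name", "male", "female"),
                     replacement = c("P", "N", "m", "f"),
                     stringsAsFactors = FALSE)

res_female <- abbreviate_words("My name is peter and I am female", abbrvs)
res_male   <- abbreviate_words("My name is peter and I am male", abbrvs)
print(res_female)  # "My N is P and I am f"
print(res_male)    # "My N is P and I am m"
```

Because "\\b" consumes no characters, the replacements no longer need the padding spaces (" P ", " f "), and it no longer matters whether "male" or "female" comes first in the data frame.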
David Winsemius, MD Heritage Laboratories West Hartford, CT