R Package for Text Manipulation

On Sat, Aug 9, 2014 at 8:15 AM, Omar André Gonzáles Díaz
<oma.gonzales at gmail.com> wrote:
1. The gsubfn function in the gsubfn package can do that.  These
commands match each word (each run of non-space characters, "\\S+"),
pass it as x to the function written in formula notation in the second
argument, and replace the word with the function's result:

library(gsubfn) # home page at http://gsubfn.googlecode.com
s <- "The quick brown fox" # test data

# replace the word quick with QUICK

gsubfn("\\S+", ~ if (x == "quick") "QUICK" else x, s)
## [1] "The QUICK brown fox"

# replace words containing "o" with "?"

gsubfn("\\S+", ~ if (grepl("o", x)) "?" else x, s)
## [1] "The quick ? ?"

2. It can also be done without packages:

# replace quick with QUICK

gsub("\\bquick\\b", "QUICK", s)
## [1] "The QUICK brown fox"
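The \\b anchors in the gsub() call above are what restrict the match to the whole word "quick". A small sketch on a made-up test string (the variable txt is only for illustration) shows the difference:

```r
txt <- "quick, quicker, quickest"  # hypothetical test string

# Without anchors, every occurrence of the letters "quick" is replaced,
# including inside longer words
gsub("quick", "QUICK", txt)
## [1] "QUICK, QUICKer, QUICKest"

# \\b matches the empty string at a word edge, so only the whole word matches
gsub("\\bquick\\b", "QUICK", txt)
## [1] "QUICK, quicker, quickest"
```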

# or the following, which first splits s into a vector of words, operates
# on that vector and then pastes it back into a single string at the end

words <- strsplit(s, "\\s+")[[1]]
paste(replace(words, words == "quick", "QUICK"), collapse = " ")
## [1] "The QUICK brown fox"
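Because strsplit() returns a list with one element per input string, the [[1]] above handles a single string only. A possible extension to a whole character vector, sketched here with a made-up input x, applies the same replace/paste step to each list element:

```r
x <- c("The quick brown fox", "a quick dog")  # hypothetical input vector

# strsplit() gives one vector of words per input string; vapply() rebuilds
# each string and guarantees a character vector of the same length as x
out <- vapply(strsplit(x, "\\s+"),
              function(w) paste(replace(w, w == "quick", "QUICK"), collapse = " "),
              character(1))
out
## [1] "The QUICK brown fox" "a QUICK dog"
```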

# replace words containing "o" with "?".  Use `words` from above.

paste(replace(words, grepl("o", words), "?"), collapse = " ")
## [1] "The quick ? ?"
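One side effect of the split-and-paste approach is that paste() rejoins the words with single spaces, so tabs or repeated spaces in the input are lost. If the original spacing matters, a base-R sketch using gregexpr() together with the replacement form of regmatches() rewrites only the matched words in place, much like gsubfn does. The input s2 is made up for illustration:

```r
s2 <- "The  quick\tbrown fox"  # hypothetical input with a double space and a tab

# Split-and-paste: all whitespace comes back as single spaces
w2 <- strsplit(s2, "\\s+")[[1]]
paste(replace(w2, grepl("o", w2), "?"), collapse = " ")
## [1] "The quick ? ?"

# gregexpr() + regmatches<-: only the matched words change, spacing is kept
m <- gregexpr("\\S+", s2)
regmatches(s2, m) <- lapply(regmatches(s2, m),
                            function(w) replace(w, grepl("o", w), "?"))
s2
## [1] "The  quick\t? ?"
```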