
R Package for Text Manipulation

3 messages · Omar André Gonzáles Díaz, Gabor Grothendieck, David Winsemius

#
On Sat, Aug 9, 2014 at 8:15 AM, Omar André Gonzáles Díaz
<oma.gonzales at gmail.com> wrote:
1. The gsubfn function in the gsubfn package can do that.  These
commands extract the words and then apply the function represented in
formula notation in the second argument to them:

library(gsubfn) # home page at http://gsubfn.googlecode.com
s <- "The quick brown fox" # test data

# replace the word quick with QUICK

gsubfn("\\S+", ~ if (x == "quick") "QUICK" else x, s)
## [1] "The QUICK brown fox"

# replace words containing o with ?

gsubfn("\\S+", ~ if (grepl("o", x)) "?" else x, s)
## [1] "The quick ? ?"

2. It can also be done without packages:

# replace quick with QUICK

gsub("\\bquick\\b", "QUICK", s)
## [1] "The QUICK brown fox"

# or the following, which first splits s into a vector of words, operates
# on that vector, and pastes it back into a single string at the end

words <- strsplit(s, "\\s+")[[1]]
paste(replace(words, words == "quick", "QUICK"), collapse = " ")
## [1] "The QUICK brown fox"

# replace words containing o with ?.  Use `words` from above.

paste(replace(words, grepl("o", words), "?"), collapse = " ")
## [1] "The quick ? ?"
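
For completeness, the "apply a function to each match" idea from point 1 can also be approximated in base R with gregexpr and regmatches<-. This is only a minimal sketch: apply_to_matches is a hypothetical helper name made up for illustration (it is not part of base R or of gsubfn), and it assumes the function returns exactly one string per match.

```r
s <- "The quick brown fox" # same test data as above

# Hypothetical helper (not from any package): apply f to every match of
# `pattern` in x and substitute the results back into the string(s).
apply_to_matches <- function(pattern, f, x) {
  m <- gregexpr(pattern, x, perl = TRUE)      # locate all matches
  regmatches(x, m) <- lapply(
    regmatches(x, m),                         # list of matched words
    function(w) vapply(w, f, character(1), USE.NAMES = FALSE)
  )
  x
}

# replace the word quick with QUICK
apply_to_matches("\\S+", function(w) if (w == "quick") "QUICK" else w, s)
## [1] "The QUICK brown fox"

# replace words containing o with ?
apply_to_matches("\\S+", function(w) if (grepl("o", w)) "?" else w, s)
## [1] "The quick ? ?"
```

Unlike the strsplit approach, this leaves the original whitespace between words untouched, since only the matched portions are rewritten.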
#
On Aug 9, 2014, at 5:15 AM, Omar André Gonzáles Díaz wrote:

            
That request is on the vague side. The Posting Guide advises you to include code that begins an analysis and then to request assistance with specific difficulties. (You are also asked to post in plain text, since HTML tends to scramble messages.) The base package offers the `grep`, `sub`, and `gsub` functions, which bring the power of regular expressions to the R user. They are much more flexible than anything that Excel offers. Please look at:

?grep
?regex
And do :