R Package for Text Manipulation
On Sat, Aug 9, 2014 at 8:15 AM, Omar André Gonzáles Díaz
<oma.gonzales at gmail.com> wrote:
Hi all,
I want to know where I can find a package that simulates the functions
"Search and Replace" and "Find Words that contain - replace them with...",
which we can use in Excel.
I've looked in other places and they suggest "reshape2" by Hadley Wickham.
However, I've investigated it and it's not exactly what I'm looking for (its main
functions are "cast" and "melt", as you surely know).
Could you help me, please? I want to download data from Google Analytics and
clean it. What is the best approach?
1. The gsubfn function in the gsubfn package can do that. These commands
extract the words and then apply the function represented in formula
notation in the second argument to them:

library(gsubfn) # home page at http://gsubfn.googlecode.com
s <- "The quick brown fox" # test data

# replace the word quick with QUICK
gsubfn("\\S+", ~ if (x == "quick") "QUICK" else x, s)
## [1] "The QUICK brown fox"

# replace words containing o with ?
gsubfn("\\S+", ~ if (grepl("o", x)) "?" else x, s)
## [1] "The quick ? ?"

2. It can also be done without packages:

# replace quick with QUICK
gsub("\\bquick\\b", "QUICK", s)
## [1] "The QUICK brown fox"

# or the following which first split s into a vector of words and
# operate on that pasting it back into a single string at the end
words <- strsplit(s, "\\s+")[[1]]
paste(replace(words, words == "quick", "QUICK"), collapse = " ")
## [1] "The QUICK brown fox"

# replace words containing o with ?. Use `words` from above.
paste(replace(words, grepl("o", words), "?"), collapse = " ")
## [1] "The quick ? ?"
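Since the goal is to clean downloaded data rather than a single string, the same base R functions can be applied to a column of a data frame. The following is only a minimal sketch: the data frame `ga` and its columns `source` and `visits` are made-up stand-ins for a Google Analytics export, not anything from the original question.

```r
# Hypothetical data standing in for a Google Analytics export
ga <- data.frame(
  source = c("google / organic", "Google / cpc", "facebook.com / referral"),
  visits = c(120, 45, 30),
  stringsAsFactors = FALSE
)

# "Search and Replace": replace the matched text inside each cell,
# ignoring case, and leave the rest of the cell untouched
ga$source <- gsub("google", "Google", ga$source, ignore.case = TRUE)

# "Find cells that contain ... and replace them": overwrite the whole
# cell wherever the literal text "facebook" occurs
ga$source[grepl("facebook", ga$source, fixed = TRUE)] <- "Facebook"

# ga$source now holds: "Google / organic", "Google / cpc", "Facebook"
```

gsub changes only the matched part of each value, while indexing with grepl replaces the entire value, which is roughly the difference between the two Excel operations mentioned in the question.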
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com