strapply and characters adjacent to the matched pattern
On Tue, Jul 24, 2012 at 5:06 PM, mdvaan <mathijsdevaan at gmail.com> wrote:
Hi,
In the example below, one of the search patterns, "SE", matches inside the
word "second". I would like to ignore every match that is immediately
followed by an alphabetic character (one in [:alpha:]). How can I do this
without removing the "ignore.case = T" argument of the strapply function?
Thank you very much!
# load library
require(gsubfn)
# read in data
data <- c("Santa Fe Gold Corp|Starpharma Holdings|SE")
# define the object to be searched
text <- c("the first is Santa Fe Gold Corp", "the second is Starpharma Holdings")
# match
strapply(text, data, ignore.case = T)
The preferred outcome would be:
[[1]]
[1] "Santa Fe Gold Corp"
[[2]]
[1] "Starpharma Holdings"
instead of:
[[1]]
[1] "Santa Fe Gold Corp"
[[2]]
[1] "se" "Starpharma Holdings"
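Try this:
strapply(c("abc", "ab", "ab def"), "(ab|d)($|[^[:alpha:]])")
[[1]]
NULL
[[2]]
[1] "ab"
[[3]]
[1] "ab"
A match is kept only if it is followed by the end of the string or by a
non-alphabetic character. Because the pattern contains parenthesized groups,
strapply returns just the first group, so the trailing character is not
part of the result.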
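For comparison, here is a base-R sketch that uses a different technique, a PCRE negative lookahead, instead of strapply from gsubfn. It assumes `perl = TRUE` is acceptable, because `(?!...)` is PCRE syntax. The lookahead checks the next character without consuming it, so no capture group is needed to strip it off. The names `pattern` and `matches` are illustrative only.

```r
# Base-R alternative (not the gsubfn approach above): keep only matches
# that are NOT immediately followed by an alphabetic character.
data <- "Santa Fe Gold Corp|Starpharma Holdings|SE"
text <- c("the first is Santa Fe Gold Corp", "the second is Starpharma Holdings")

# Wrap the alternation in a non-capturing group so the lookahead applies to
# every alternative, then reject any match followed by a letter.
pattern <- paste0("(?:", data, ")(?![[:alpha:]])")

# ignore.case = TRUE keeps the case-insensitive matching;
# perl = TRUE is required for the (?!...) lookahead.
matches <- regmatches(text, gregexpr(pattern, text, ignore.case = TRUE, perl = TRUE))
print(matches)
```

With the poster's data this returns "Santa Fe Gold Corp" for the first string and "Starpharma Holdings" for the second, and it drops the "se" inside "second".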
Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com