On Fri, Jul 13, 2012 at 1:41 PM, mdvaan <mathijsdevaan@> wrote:
Here's some data (which should give you the error messages):
library(gsubfn) # provides strapplyc
# read in data
data <- read.csv("https://dl.dropbox.com/u/13631687/data.csv", header = TRUE, sep = ",")
# first paste all data
data1 <- paste(data[,1], collapse = "|")
# second paste subsets of the data
data2a <- paste(data[1:750,1], collapse = "|")
data2b <- paste(data[751:1500,1], collapse = "|")
# define the object to be searched
text <- c("the first is Santa Fe Gold Corp", "the second is Starpharma Holdings")
# match
strapplyc(text, data1)
strapplyc(text, data2a)
strapplyc(text, data2b)
Thanks in advance!
Although strapplyc seems to handle larger regular expressions than grep in R, it seems neither can handle a pattern with as many alternatives as in your example, so process the names in chunks:
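library(gsubfn) # provides strapply
k <- 3000 # chunk size: number of names combined into each regular expression
# f builds one alternation pattern from rows from..to of data[, 1]
# and returns the matches of that pattern in text
f <- function(from, text) {
  to <- min(from + k - 1, nrow(data))
  r <- paste(data[seq(from, to), 1], collapse = "|")
  # drop parentheses, ., *, ?, + and braces so they are not treated as regex syntax
  r <- gsub("[().*?+{}]", "", r)
  strapply(text, r)
}
ix <- seq(1, nrow(data), k) # first row of each chunk
# for each string in text, try every chunk and pool the matches
out <- lapply(text, function(text) unlist(lapply(ix, f, text)))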
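For illustration, here is the same chunking idea as a self-contained sketch in base R that can be run without the Dropbox file. The `companies` vector, the tiny chunk size and the helper names `chunk_patterns` and `match_in_chunks` are hypothetical. It also swaps in a different way of handling special characters: instead of deleting metacharacters, it wraps each name in PCRE's `\Q...\E` quoting (via `perl = TRUE`) so names are matched literally. This assumes no name contains the sequence `\E`.

```r
# Hypothetical stand-in for data[, 1]; with the real CSV use as.character(data[, 1]).
companies <- c("Santa Fe Gold Corp", "Starpharma Holdings",
               "Acme (USA) Inc.", "Widget Co.")
text <- c("the first is Santa Fe Gold Corp", "the second is Starpharma Holdings")

k <- 2 # chunk size (names per pattern); tiny here only for illustration

# Build one alternation pattern per chunk of k names.
# \Q...\E makes PCRE treat each name as literal text, so "(" or "." are safe.
chunk_patterns <- function(x, k) {
  groups <- split(x, ceiling(seq_along(x) / k))
  vapply(groups, function(g) paste0("\\Q", g, "\\E", collapse = "|"),
         character(1))
}

# For each string, apply every chunk's pattern and pool all matches.
match_in_chunks <- function(text, patterns) {
  lapply(text, function(s) {
    unlist(lapply(patterns, function(p)
      regmatches(s, gregexpr(p, s, perl = TRUE))[[1]]),
      use.names = FALSE)
  })
}

out <- match_in_chunks(text, chunk_patterns(companies, k))
print(out)
```

Each element of `out` holds the names found in the corresponding string of `text`; with real data, `k` would be something much larger, such as the 3000 used in the reply.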
--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com