
Maximum number of patterns and speed in grep

On Fri, Jul 13, 2012 at 1:41 PM, mdvaan <mathijsdevaan at gmail.com> wrote:
Although strapplyc appears to handle larger regular expressions than
grep in R, neither seems able to handle as many patterns as in your
example, so process them in chunks:

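library(gsubfn) # provides strapply and strapplyc

k <- 3000 # chunk size

# data (patterns in column 1) and text (strings to search) are assumed
# to be the objects from your example.
# Match one chunk of up to k patterns, starting at row `from`, against text.
f <- function(from, text) {
	to <- min(from + k - 1, nrow(data))
	# join the chunk's patterns into a single alternation
	r <- paste(data[seq(from, to), 1], collapse = "|")
	# strip regex metacharacters from the patterns (the | separators stay)
	r <- gsub("[().*?+{}]", "", r)
	strapplyc(text, r)
}
ix <- seq(1, nrow(data), k) # first row of each chunk
out <- lapply(text, function(text) unlist(lapply(ix, f, text)))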
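
To see the chunking idea in isolation, here is a minimal sketch on toy data. It is not the code above: it swaps in base R's gregexpr and regmatches for strapplyc so that it runs without gsubfn, and the data and text values and the name match_chunk are made up for illustration.

```r
# Toy stand-ins for the poster's objects (hypothetical values).
data <- data.frame(pattern = c("apple", "banana", "cherry", "date", "elder"),
                   stringsAsFactors = FALSE)
text <- c("apple and date pie", "no fruit here", "cherry banana split")

k <- 2 # chunk size, tiny here so that several chunks are exercised

# Return every match in the string x of the patterns held in rows
# from..(from + k - 1) of data, capped at the last row.
match_chunk <- function(from, x) {
  to <- min(from + k - 1, nrow(data))
  r <- paste(data[seq(from, to), 1], collapse = "|")
  unlist(regmatches(x, gregexpr(r, x)))
}

ix <- seq(1, nrow(data), k) # first row of each chunk: 1, 3, 5
out <- lapply(text, function(x) unlist(lapply(ix, match_chunk, x)))
```

out has one element per string in text. Matches are collected chunk by chunk, so within each element they follow the order of the patterns in data rather than their position in the string.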