Maximum number of patterns and speed in grep
Hi,

Given that you can't provide a full example, please at least provide str() on your data, more complete information on the problem, and ideally a small toy example that demonstrates precisely what you are doing.

For instance, you tell us that you "get an error message" but you never tell us what it is. Don't you think we might need to know what the error is to be able to diagnose and fix it?

Also, note that your "working" example simply overwrites array$chunk1[j] four times.

Sarah
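To make the request concrete, here is a rough sketch of what a small toy example might look like. The contents of x and patterns below are invented stand-ins, not the poster's data, and the result vector is called counts purely for illustration. Since grepl() is vectorized over x, the sketch also avoids the explicit loop; for each element it gives the same 0/1 value as length(grep(..., x[j], value = T)).

```r
# Hypothetical toy data standing in for the real x and patterns
x <- c("apple pie", "banana split", "cherry tart", "plain toast")
patterns <- c("apple", "cherry", "kiwi")

# Show the structure of the inputs, as requested above
str(x)
str(patterns)

# Build one alternation from all patterns and test every element of x at once
combined <- paste(patterns, collapse = "|")
counts <- as.integer(grepl(combined, x))
print(counts)
```

An example of this size, together with the exact text of the error message, is usually enough for someone on the list to reproduce the problem.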
On Fri, Jul 6, 2012 at 10:45 AM, mdvaan <mathijsdevaan at gmail.com> wrote:
Hi,
I am using R's grep function to find patterns in vectors of strings. The
number of patterns I would like to match is 7,700 (of different sizes). I
noticed that I get an error message when I do the following:
data <- array()
for (j in 1:length(x))
{
  array[j] <- length(grep(paste(patterns[1:7700], collapse = "|"), x[j], value = T))
}
When I break this up into 4 chunks of patterns it works:
data <- array()
for (j in 1:length(x))
{
  array$chunk1[j] <- length(grep(paste(patterns[1:2500], collapse = "|"), x[j], value = T))
  array$chunk1[j] <- length(grep(paste(patterns[2501:5000], collapse = "|"), x[j], value = T))
  array$chunk1[j] <- length(grep(paste(patterns[5001:7500], collapse = "|"), x[j], value = T))
  array$chunk1[j] <- length(grep(paste(patterns[7501:7700], collapse = "|"), x[j], value = T))
}
My questions: what's the maximum size of the patterns argument in grep? Is
there a way to do this faster? It is very slow.
Thanks.
Math
Sorry for not providing a reproducible example. It's a size issue which
makes it difficult to provide an example.
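For what it's worth, the chunking idea in the question could be sketched as below so that results from all chunks are combined rather than overwritten. This is only a sketch under assumptions: match_any and chunk_size are hypothetical names, the toy x and patterns are invented, and whether chunking avoids the unreported error on the real data is untested here.

```r
# Hypothetical helper: TRUE for each element of x that matches at least one
# pattern, testing the patterns in chunks of at most chunk_size alternatives.
match_any <- function(x, patterns, chunk_size = 2500) {
  hit <- logical(length(x))
  if (length(patterns) == 0) return(hit)
  starts <- seq(1, length(patterns), by = chunk_size)
  for (s in starts) {
    idx <- s:min(s + chunk_size - 1, length(patterns))
    chunk <- paste(patterns[idx], collapse = "|")
    # Only re-test elements that have not matched an earlier chunk
    todo <- !hit
    hit[todo] <- grepl(chunk, x[todo])
  }
  hit
}

# Invented toy data; chunk_size = 2 forces two chunks
x_toy <- c("apple pie", "banana split", "cherry tart", "plain toast")
patterns_toy <- c("apple", "cherry", "kiwi")
counts_toy <- as.integer(match_any(x_toy, patterns_toy, chunk_size = 2))
print(counts_toy)
```

Calling grepl() once per chunk on the whole vector, instead of once per element of x, removes the inner loop entirely. If the patterns are literal strings rather than regular expressions, looping over them with grepl(p, x, fixed = TRUE) is another option that may be worth timing.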
Sarah Goslee http://www.functionaldiversity.org