Message-ID: <1342452422978-4636657.post@n4.nabble.com>
Date: 2012-07-16T15:27:02Z
From: mdvaan
Subject: Maximum number of patterns and speed in grep
In-Reply-To: <CAP01uR=nrcKUFO_fUg6zPBQWCyjfwy_HjF-w8nqabmjbMnVJYA@mail.gmail.com>

Thanks! That worked like a charm.

Math


Gabor Grothendieck wrote
> 
> On Fri, Jul 13, 2012 at 1:41 PM, mdvaan <mathijsdevaan@> wrote:
>> Here's some data (which should give you the error messages):
>>
>>     # read in data
>>     data <- read.csv("https://dl.dropbox.com/u/13631687/data.csv",
>>                      header = TRUE, sep = ",")
>>
>>     # first paste all data
>>     data1 <- paste(data[,1], collapse = "|")
>>
>>     # second paste subsets of the data
>>     data2a <- paste(data[1:750,1], collapse = "|")
>>     data2b <- paste(data[751:1500,1], collapse = "|")
>>
>>     # define the object to be searched
>>     text <- c("the first is Santa Fe Gold Corp",
>>               "the second is Starpharma Holdings")
>>
>>     # match (strapplyc comes from the gsubfn package)
>>     library(gsubfn)
>>     strapplyc(text, data1)
>>     strapplyc(text, data2a)
>>     strapplyc(text, data2b)
>>
>> Thanks in advance!
>>
> 
> Although strapplyc seems able to handle larger regular expressions
> than grep in R, neither appears to handle as many alternatives as in
> your example, so process the patterns in chunks:
> 
> k <- 3000 # chunk size
> 
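> # match one text against the chunk of patterns starting at row `from`
> f <- function(from, text) {
>   to <- min(from + k - 1, nrow(data))
>   r <- paste(data[seq(from, to), 1], collapse = "|")
>   r <- gsub("[().*?+{}]", "", r) # strip regex metacharacters from the pattern
>   strapply(text, r)
> }
> ix <- seq(1, nrow(data), k) # first row of each chunk
> # for each text, collect the matches from every chunk
> out <- lapply(text, function(text) unlist(lapply(ix, f, text)))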
> 
> 
> -- 
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
> 
> 
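
For anyone finding this in the archive without access to the CSV: below is a small self-contained sketch of the same chunking idea. It is not the code from the reply above. It assumes base R only (gregexpr/regmatches instead of strapply), it escapes regex metacharacters rather than deleting them, and the vector names_vec, the helpers escape_regex and match_in_chunks, and the tiny chunk size are all made up for illustration.

```r
# Hypothetical stand-in for the first column of data.csv
names_vec <- c("Santa Fe Gold Corp", "Starpharma Holdings",
               "Acme (US) Inc.", "Foo+Bar Ltd")
text <- c("the first is Santa Fe Gold Corp",
          "the second is Starpharma Holdings",
          "the third is Acme (US) Inc. today")

# Backslash-escape regex metacharacters so each name is matched literally
escape_regex <- function(x) {
  gsub("([.|()\\^{}+$*?]|\\[|\\])", "\\\\\\1", x)
}

# For each text, try the names k at a time and pool the matches
match_in_chunks <- function(text, names_vec, k = 2) {
  starts <- seq(1, length(names_vec), by = k)  # first index of each chunk
  lapply(text, function(one_text) {
    hits <- lapply(starts, function(from) {
      to <- min(from + k - 1, length(names_vec))
      pattern <- paste(escape_regex(names_vec[from:to]), collapse = "|")
      regmatches(one_text, gregexpr(pattern, one_text))[[1]]
    })
    unlist(hits)
  })
}

out <- match_in_chunks(text, names_vec)
```

With real data you would pass data[, 1] as names_vec and pick k small enough that the combined pattern still compiles; what size works presumably depends on the regex engine and the length of the names.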


--
View this message in context: http://r.789695.n4.nabble.com/Maximum-number-of-patterns-and-speed-in-grep-tp4635613p4636657.html
Sent from the R help mailing list archive at Nabble.com.