parsing - input buffer overflow
On Fri, 13 Jun 2008, Daniel Malter wrote:
Hi, I am trying to parse a large amount of text using gregexpr(). Unfortunately, I get an "input buffer overflow" message when I attempt that with too large an amount of text. The error message occurs before the parsing. The problem is that I cannot assign the text to a variable (an object) if the text is too large.
R does have limits on the command line length (1024 bytes up to R-devel, 4096 bytes there). What happens if you exceed that depends on the interface you are using (and you have not told us). Beyond that, the parser has a limit of MAXELTSIZE (8192 bytes) on strings. I don't see any need for 'improvement' though: why are you entering very long strings as part of the R program? They are data, and e.g. readLines() and scan() have no limits on string length beyond those imposed by R's internals (2^31-1 bytes).
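A minimal sketch of reading the text as data instead of typing it into the program. The file here is a hypothetical stand-in created with tempfile() so the example is self-contained; in practice the path would point at wherever the large text is stored.

```r
# Hypothetical input file standing in for the large text.
path <- tempfile(fileext = ".txt")
writeLines(c("Saint Lucia, Saint Kitts and Nevis,",
             "Saint Helena, Clipperton Island, Tristan da Cunha"),
           path)

# Read the file as data: one element per line, so the command-line
# and parser limits on string literals do not come into play.
lines <- readLines(path)

# Collapse to a single string only if the later processing needs one.
x <- paste(lines, collapse = " ")
```

Because the text never passes through the parser as a string literal, the limits described above should not apply to it.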
This problem has been mentioned before, which I found using the RSiteSearch. However, the post is from 2006, and I thought it might have improved by now. Is there any way to increase the limit or to get around this problem?

x="Saint Lucia, Saint Kitts and Nevis, Saint Helena, Clipperton Island, Tristan da Cunha"
I presume that is not an example? It looks like a character vector which has been collapsed by paste(x, collapse=", ") and would be better split into its components with strsplit() than searched with gregexpr().
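As a rough illustration of the strsplit() suggestion, assuming the separator really is a comma followed by a space, the string can be split back into its components and the components tested one by one. The names parts and n_saint are introduced here for illustration only. Note that this counts components containing the pattern, which is not the same as counting every occurrence if a component could contain it twice.

```r
x <- "Saint Lucia, Saint Kitts and Nevis, Saint Helena, Clipperton Island, Tristan da Cunha"

# Split on the literal separator ", " (fixed = TRUE avoids regex interpretation).
parts <- strsplit(x, ", ", fixed = TRUE)[[1]]

# Count the components that contain "Saint", ignoring case.
n_saint <- sum(grepl("saint", parts, ignore.case = TRUE))
```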
#What I want to achieve is to parse the text for the number of occurrences of a certain character string within the text.
#This is done using:
n=100 #choose n large enough
length(which(is.na(gregexpr("Saint",x,ignore.case=TRUE)[[1]][1:n])==FALSE))
But again, if the text is large, I cannot assign it to x. I'd be grateful
for any suggestions.
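One possible way to count the occurrences without guessing n is sketched below. The helper count_matches() is a hypothetical name, not part of R. It relies on gregexpr() returning -1 when there is no match, a case in which the indexing approach above would report one occurrence instead of zero.

```r
# Hypothetical helper: number of matches of `pattern` in a single string.
count_matches <- function(pattern, text, ...) {
  m <- gregexpr(pattern, text, ...)[[1]]
  # Match positions are >= 1; a lone -1 means "no match".
  sum(m > 0)
}

x <- "Saint Lucia, Saint Kitts and Nevis, Saint Helena, Clipperton Island, Tristan da Cunha"

n_saint <- count_matches("Saint", x, ignore.case = TRUE)
n_none  <- count_matches("Zanzibar", x)
```

For a character vector with one element per line, as returned by readLines(), the same helper could be applied to each element and the results summed.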
Cheers,
Daniel
-------------------------
cuncta stricte discussurus
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595