When I tried your gregexpr call on Windows XP there was a grinding sound,
probably memory being swapped, and it seemed to go on forever until I
finally had to kill R.
I am using "R version 2.2.1, 2005-12-20". What did seem to work was this:
gregexpr("X", gsub("\\b\\w|\\w\\b", "X", text))
where "X" should be replaced with some character not in the text.
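A minimal sketch of that workaround, run on the two example strings from the question. The names `placeholder`, `marked` and `pos` are only illustrative, and `perl = TRUE` and `fixed = TRUE` are assumptions added here so that `\b` is handled by PCRE and the placeholder is matched literally; they are not part of the call above.

```r
# Example strings from the original question.
text <- c("This is a first example sentence.",
          "And this is a second example sentence.")

# Placeholder character (illustrative name); it must not occur in the text.
placeholder <- "X"
stopifnot(!any(grepl(placeholder, text, fixed = TRUE)))

# Replace the first and the last character of every word with the placeholder.
# perl = TRUE is an assumption added for this sketch.
marked <- gsub("\\b\\w|\\w\\b", placeholder, text, perl = TRUE)

# Positions of the placeholder, i.e. of the first and last character
# of each word, one integer vector per input string.
pos <- gregexpr(placeholder, marked, fixed = TRUE)
```

Note that this gives the positions of the characters next to each boundary, not zero-length matches: the last character of a word sits one position before its trailing boundary, and a one-letter word such as "a" contributes a single position.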
On 1/31/06, Stefan Th. Gries <stgries_lists at arcor.de> wrote:
Hi
I have a question about matching word boundaries. I bet it has a very simple answer, but I have found it neither by trial and error nor by searching the help archives for the terms in the subject line. The problem is this: I have a vector of two character strings.
text<-c("This is a first example sentence.", "And this is a second example sentence.")
If I now look for word boundaries with regexpr, this is what I get:
regexpr("\\b", text, perl=TRUE)
[1] 1 1
attr(,"match.length")
[1] 0 0
So far, so good. But with gregexpr I get:
gregexpr("\\b", text, perl=TRUE)
Error: cannot allocate vector of size 524288 Kb
In addition: Warning messages:
1: Reached total allocation of 1015Mb: see help(memory.size)
2: Reached total allocation of 1015Mb: see help(memory.size)
Why don't I get the locations and extensions of all word boundaries? I am using R 2.2.1 on a machine running Windows XP:
R.version
         _
platform i386-pc-mingw32
arch     i386
os       mingw32
system   i386, mingw32
status
major    2
minor    2.1
year     2005
month    12
day      20
svn rev  36812
language R
Thanks a lot,
STG
--
Stefan Th. Gries
----------------------------------------
University of California, Santa Barbara
http://people.freenet.de/Stefan_Th_Gries