gregexpr - match overlap mishandled (PR#13391)
Greg Snow wrote:
Controlling the pointer is going to be very different from perl since the R functions are vectorized rather than focusing on a single string. Here is one approach that will give all the matches and lengths (for the original problem at least):
mystr <- paste(rep("1122", 10), collapse="")
n <- nchar(mystr)
mystr2 <- substr(rep(mystr,n), 1:n, n)
tmp <- regexpr("^11221122", mystr2)
(tmp + 1:n - 1)[tmp>0]
[1] 1 5 9 13 17 21 25 29 33
attr(tmp,"match.length")[tmp>0]
[1] 8 8 8 8 8 8 8 8 8
while not exactly what i meant, this is an implementation of one of the approaches mentioned below, with care taken not to report duplicate matches:
sequentially perform single matches on successive substrings of the input string (which can give you the same match more than once, though).
one issue with your solution is that it allocates all n substrings at once, which requires O(n^2) space (with n the length of the original string), but it may be faster than a for loop matching one substring at a time.
vQ
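For comparison, here is a minimal sketch of the for-loop variant mentioned above, which holds only one substring in memory at a time. The helper name overlapping_matches is hypothetical (it does not come from the thread), and the sketch assumes the pattern is anchored with ^, as in the vectorized example, so that each start position is reported at most once. Total running time is still quadratic in the worst case because each call to substr copies a suffix; only the peak extra space drops to O(n).

```r
# Hypothetical helper (not from the thread): collect all matches, including
# overlapping ones, by testing one suffix of `s` at a time.
# Assumes `pattern` is anchored with "^" so no match is reported twice.
overlapping_matches <- function(pattern, s) {
  n <- nchar(s)
  starts <- integer(0)
  lens <- integer(0)
  for (i in seq_len(n)) {
    # Only one suffix exists at any moment, so extra space stays O(n).
    m <- regexpr(pattern, substr(s, i, n))
    if (m > 0) {
      # Convert the position within the suffix to a position within `s`.
      starts <- c(starts, i + as.integer(m) - 1L)
      lens <- c(lens, attr(m, "match.length"))
    }
  }
  list(start = starts, length = lens)
}

mystr <- paste(rep("1122", 10), collapse = "")
res <- overlapping_matches("^11221122", mystr)
res$start   # 1 5 9 13 17 21 25 29 33
res$length  # 8 8 8 8 8 8 8 8 8
```

The result agrees with the vectorized version quoted above; which of the two is faster in practice would need to be measured for the string lengths of interest.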