gregexpr - match overlap mishandled (PR#13391)

Greg Snow wrote:
another option would be to move the anchor backwards after each match,
but i'm not sure if the problem really needs it and if it could be done
from within r.

greg (and another person who answered this post earlier):
while your frustration is understandable, i think reid (and possibly
other users as well) would benefit from a brief explanation instead of
your emotional reactions.  you ought to be more patient and less
arrogant with newbies who will often think there is a bug in r when
there isn't.

reid:
when matching is performed, a pointer is moved through the
string.  in global matching, after a match is found the pointer sits just
past the matched substring, and further matching proceeds from there.
for example, suppose you match "aaa" (the string) with "aa" (the
pattern) globally.  after the first successful match, the position
pointer is *past the second a* in the string, and no further match can
be found from there, since only one "a" remains.  in this context,
'global' does not mean that all possible matches are found, but rather
that matching is performed iteratively.
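
to see this from within r, here is a minimal sketch using only base
gregexpr (the variable names are just for illustration).  the positions
it reports are 1-based starts of the non-overlapping matches:

```r
# global matching resumes after the end of each match,
# so an occurrence that overlaps the previous match is skipped
starts_aaa  <- as.integer(gregexpr("aa", "aaa")[[1]])   # one match, at position 1
starts_aaaa <- as.integer(gregexpr("aa", "aaaa")[[1]])  # matches at 1 and 3; the one at 2 is skipped

print(starts_aaa)
print(starts_aaaa)
```

in "aaaa" the occurrence starting at position 2 is never reported, because
the pointer is already past it when the search resumes.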

the above is probably a solution to your problem, though the matches
have length 4, not 8.  in perl, you could manually move back the anchor
after each match, e.g.:

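$string = "1122" x 10;
$pattern = "11221122";
$n = length($pattern)/2;   # step back half the pattern length (4)
@matches = ();
while ($string =~ /$pattern/g) {
    push @matches, [$-[0], $&];   # start offset and matched text
    pos($string) -= $n;           # move the anchor back
}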

now @matches has 9 elements, each a ref to an array with the starting
position and the content (of length 8) of the respective match:
@matches = ([0, "11221122"], [4, "11221122"], ...)

not sure if you can do this within r.  not sure if you'll ever need it. 
for more complex cases when you need overlapping matches and you need
their content, greg's solution might not do, but in general that's the
solution.
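
for what it's worth, one possible way to get overlapping matches together
with their content in r is sketched below.  it assumes a zero-width
lookahead with perl = TRUE, so the pointer never moves past the text it
has just inspected.  the names s, pattern, starts and contents are made
up for the example, and this is not necessarily what greg had in mind:

```r
s <- paste(rep("1122", 10), collapse = "")
pattern <- "11221122"

# the lookahead matches the empty string wherever the pattern begins,
# so every start position is reported, overlapping or not
m <- gregexpr(paste0("(?=", pattern, ")"), s, perl = TRUE)[[1]]
starts <- as.integer(m)

# the matches themselves have length 0, so cut the content out by hand
contents <- substring(s, starts, starts + nchar(pattern) - 1)

print(starts)
print(contents)
```

this only works directly for a pattern of fixed length; for a
variable-length pattern the content would have to be recovered some other
way, e.g. from a capture group inside the lookahead.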

vQ