Minimal match to regexp?

Thanks for pointing out my mistake.  I oversimplified the real problem.

Here is a version that comes closer to it. Suppose I have a 
string like this:

x <- "\n```html\nblah blah \n```\n\n```r\nblah blah\n```\n"

If I cat() it, I see that it is really markdown source:

   ```html
   blah blah
   ```

   ```r
   blah blah
   ```

I want to find the part that includes the html block, but not the r 
block.  So I want to match "```html", followed by a minimal number of 
characters, then "```".  Then this pattern works:

   pattern <- "\n```html\n.*?\n```\n"

and we get the right answer:

   cat(regmatches(x, regexpr(pattern, x)))

   ```html
   blah blah
   ```

Okay, but this flavour of markdown says there can be more backticks, not 
just 3.  So the block might look like

   ````html
   blah blah
   ````

I need to have the same number of backticks in the opening and closing 
marker.  So I make the pattern more complicated, and it doesn't work:

   pattern2 <- "\n([`]{3,})html\n.*?\n\\1\n"

This matches all of x:

   > pattern2 <- "\n([`]{3,})html\n.*?\n\\1\n"
   > cat(regmatches(x, regexpr(pattern2, x)))

   ```html
   blah blah
   ```

   ```r
   blah blah
   ```
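
For comparison, here is a minimal sketch that runs the same pattern through the PCRE engine by passing perl = TRUE to regexpr(). It is only an illustration of a possible workaround, not a diagnosis of what the default engine is doing. The names fence, fence4, x_multi and pattern2_s are invented for the example, and the string is assembled with strrep() purely so the fences do not have to be typed literally. In PCRE "." does not match a newline unless (?s) is set, so the second half adds that flag for a block whose body spans several lines.

```r
# Same content as x above, built from pieces (fence is a hypothetical helper name).
fence <- strrep("`", 3)
x <- paste0("\n", fence, "html\nblah blah \n", fence, "\n\n",
            fence, "r\nblah blah\n", fence, "\n")

pattern2 <- "\n([`]{3,})html\n.*?\n\\1\n"

# Assumption: perl = TRUE makes regexpr() use PCRE instead of the default engine.
m_pcre <- regmatches(x, regexpr(pattern2, x, perl = TRUE))
cat(m_pcre)   # expected to show only the html block

# A four-backtick block with a multi-line body (x_multi is made up for this test).
fence4 <- strrep("`", 4)
x_multi <- paste0("\n", fence4, "html\nline one\nline two\n", fence4, "\n\n",
                  fence, "r\nblah blah\n", fence, "\n")

# (?s) lets "." match newlines, so the lazy .*? can cross lines in PCRE.
pattern2_s <- paste0("(?s)", pattern2)
m_multi <- regmatches(x_multi, regexpr(pattern2_s, x_multi, perl = TRUE))
cat(m_multi)  # expected to show only the four-backtick html block
```

If this behaves as I expect, the backreference \\1 forces the closing marker to have the same number of backticks as the opening one, and the lazy quantifier stops at the first such marker.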


Is that a bug, or am I making a silly mistake again?

Duncan Murdoch

On 25/01/2023 7:34 p.m., Andrew Simmons wrote: