
FR: valid_regex() to test string validity as a regular expression

On 10/10/23 01:57, Michael Chirico via R-devel wrote:
Hi Michael,

I don't think you need compilation functions for that. If a regular 
expression is found invalid by a specific third-party library R uses, 
the library should return an error to R, R should return an error to 
you, and you should probably propagate that to your users. Grepping an 
empty string might work in many cases as a test, but it is probably more 
portable to simply be prepared to propagate such errors from the actual 
use on real inputs. In theory, a library could optimize a particular 
case so that the checking differs from what happens in actual use, but 
that would apply equally to a separate compilation or checking function.
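
As a rough sketch of what propagating the error from actual use could look like: the wrapper below calls grepl() on the real input and, if that fails, re-raises the error with some context for the user. The name find_matches() and the message wording are made up for illustration; only grepl(), tryCatch(), stop() and conditionMessage() are base R.

```r
# Hypothetical helper (not part of base R): run the pattern on the real
# input and pass any error on to the caller with added context.
find_matches <- function(pattern, x) {
  tryCatch(
    grepl(pattern, x),
    error = function(e) {
      stop("could not use pattern '", pattern, "': ",
           conditionMessage(e), call. = FALSE)
    }
  )
}

# A valid pattern behaves exactly like grepl().
find_matches("a+", c("aa", "b"))

# An invalid pattern surfaces as an ordinary R error that callers can handle.
# suppressWarnings() hides the extra warning the regex engine may emit.
msg <- tryCatch(
  suppressWarnings(find_matches("(", "x")),
  error = function(e) conditionMessage(e)
)
```

No separate validity check is involved here: the pattern is only ever tested by the same call that uses it.
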
Regarding encodings, R strings should simply be valid in their encoding. 
This is not just for regular expressions but for anything else as well. 
You shouldn't assume that R can handle invalid strings in any reasonable 
way. You definitely shouldn't add invalid strings to tests: behavior 
with invalid strings is unspecified. To test whether a string is valid, 
there is validEnc() (or validUTF8()). But, again, it is probably safest 
to propagate errors from the regular expression functions in R (in case 
the checks differ, particularly for non-UTF-8). Also, duplicating the 
encoding checks can add non-trivial overhead.
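
For reference, a minimal sketch of the two validity checks mentioned above. The byte sequence is built with rawToChar() purely to demonstrate the check (it is "caf" followed by a lone 0xE9 byte, which is not valid UTF-8); the variable names are illustrative.

```r
ok  <- "abc"
# "caf" followed by the single byte 0xE9: not a valid UTF-8 sequence.
bad <- rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xe9)))

# validUTF8() checks the bytes as UTF-8 regardless of the declared encoding.
utf8_ok <- validUTF8(c(ok, bad))   # TRUE FALSE

# validEnc() checks each string against its declared encoding, so the
# result for 'bad' depends on the locale and on Encoding(bad).
enc_ok <- validEnc(c(ok, bad))
```
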

If there were a strong need for an automated way to classify errors 
coming specifically from the regex libraries, perhaps R could attach 
some classes to them when the library reports an error.
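
Until something like that exists, a caller can approximate it on their own side. The sketch below is only an illustration: the class name "regex_error" and the function grepl_classed() are invented here and are not defined by R. Note that it tags every error raised by the grepl() call, not only those that originate in the regex library.

```r
# Hypothetical wrapper: re-signal any error from grepl() with an extra class.
# "regex_error" is a made-up class name; base R does not define it.
grepl_classed <- function(pattern, x, ...) {
  tryCatch(
    grepl(pattern, x, ...),
    error = function(e) {
      class(e) <- c("regex_error", class(e))
      stop(e)
    }
  )
}

# Callers can then handle this class separately from other errors.
res <- tryCatch(
  suppressWarnings(grepl_classed("(", "abc")),
  regex_error = function(e) "caught regex_error"
)
```
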

Tomas