For most other regex engines, [X-Y-Z] is a valid character set, consisting of character range X to Y, literal dash -, and character Z. For example, you can verify at https://regex101.com/ that this is valid for all regex flavors it supports.
Currently Boost rejects this under the default syntax. It's not clear from the documentation whether this is valid for Boost. This can be easily fixed by e.g. changing to [X-YZ-] but I'm still interested in knowing if rejecting this is intentional.
terminate called after throwing an instance of 'boost::wrapexcept<boost::regex_error>'
what(): Invalid range end in character class The error occurred while parsing the regular expression: '[0-9->>>HERE>>>#]+'.
Program terminated with signal: SIGSEGV
Proposed fixes
Determine whether we intend to accept or reject such syntax.
If we intend to accept it, change the parsing code accordingly.
In either case, clarify the behavior in the documentation.
This does not answer whether Boost should differ from other flavours, and I can no longer find documentation of Boost's behaviour on this. But I can say what the current, and I believe designed, behaviour is:
○ A literal "-" must be escaped as "\-" unless it is the first or last character in the class, e.g. "[-DEF]" or "[DEF-]", as you state yourself.
I would advise that a good coding standard is to almost never use an ambiguous sequence, even when the interpreter or compiler allows it, if there is a more explicit alternative. For example, we always use the "[\-DEF]" style, so that an edit that extends the class, e.g. "[123\-DEF]", cannot unintentionally create a range. Similarly, your example "[X-Y-Z]" could get extended to include "A", i.e. "[X-YA-Z]", creating a whole-alphabet class by accident. Explicitness also makes it easier for another coder to read and understand your code.
{The above explanation should make more sense now that I have escaped the previously invisible backslashes "\" so that they are displayed, LOL}
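To make the accidental-range concern concrete, here is a minimal sketch. It uses `std::regex` (ECMAScript grammar) as a stand-in, since that engine accepts the unescaped form; the helper name `in_class` is made up for illustration, and the behaviour of the escaped form under Boost is taken from the comment above rather than verified here.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if the single character `c` is matched by the
// character class `char_class` under std::regex's default ECMAScript grammar.
bool in_class(const std::string& char_class, char c) {
    const std::regex re(char_class);
    return std::regex_match(std::string(1, c), re);
}

// Intended use (std::regex, not Boost):
//   in_class("[X-Y-Z]", '-')      -> the dash is a literal member
//   in_class("[X-YA-Z]", 'B')     -> inserting "A" turned "-Z" into the range A-Z
//   in_class(R"([X-Y\-AZ])", 'B') -> with an escaped dash the same edit adds only "A"
```

With the escaped dash, extending the class cannot silently turn the literal `-` into a range operator, which is the point made above.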
I agree that no one should write regexes like that and I'm totally fine with Boost rejecting such regexes. The question is more about whether this is a deliberate design and whether this needs to be documented.
Summary
For most other regex engines, [X-Y-Z] is a valid character set, consisting of character range X to Y, literal dash -, and character Z. For example, you can verify at https://regex101.com/ that this is valid for all regex flavors it supports.
Currently Boost rejects this under the default syntax. It's not clear from the documentation whether this is valid for Boost. This can be easily fixed by e.g. changing to [X-YZ-] but I'm still interested in knowing if rejecting this is intentional.
I encountered this when migrating from std::regex. This is explicitly valid for std::regex: https://en.cppreference.com/w/cpp/regex/ecmascript
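When migrating, it may help to confirm that the rewritten class matches exactly the same characters as the original. The following is a rough sketch using `std::regex` only (it does not exercise Boost); the helper names `compiles` and `members` are made up for illustration and are not part of any library.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `pattern` compiles under std::regex's
// default (ECMAScript) grammar, false if construction throws regex_error.
bool compiles(const std::string& pattern) {
    try {
        std::regex re(pattern);
        return true;
    } catch (const std::regex_error&) {
        return false;
    }
}

// Hypothetical helper: collects, in ascending order, every ASCII character
// (0x01..0x7F) that the character class `char_class` matches on its own.
std::string members(const std::string& char_class) {
    const std::regex re(char_class);
    std::string out;
    for (int c = 1; c < 128; ++c) {
        const std::string s(1, static_cast<char>(c));
        if (std::regex_match(s, re)) {
            out += s;
        }
    }
    return out;
}
```

Comparing `members("[0-9-#]")` with `members("[0-9#-]")` under `std::regex` gives a quick check that moving the dash to the end does not change the set of matched characters, so the rewritten pattern should be a safe replacement wherever the trailing-dash form is accepted.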
Minimal reproducible example
Code
Expected behavior
Returns 1
Actual behavior