Replace current regex engine with PCRE2 #4033
XVilka
left a comment
- Please send it from a dist- branch from the rizinorg account, to check that it builds and runs fine on all supported platforms and configurations.
- Once it's done, I think it is worth removing the old implementation as well.
|
It would be nice to also allow PCRE (v1) for older distros |
Currently, for every distro which has no pcre2, it is compiled as a subproject. Shouldn't this be enough? |
Yes, I think it should be enough. We have the same strategy for the Capstone dependency as well. |
|
PCRE2 defines a huge number of options and flags. Is it fine if we do not add an |
|
you should only use |
558652e to
4168046
|
Tests/build still fail, but I'd like to know your opinion on the API first, because there are now some changes in the actual code. |
XVilka
left a comment
The new API looks fine. Have you tried comparing the performance with and without these changes?
@@ -0,0 +1,5 @@
[wrap-git]
url = https://github.com/PCRE2Project/pcre2.git
revision = 52c08847921a324c804cabf2814549f50bce1265
Shouldn't we use a release by default?
Looks like they are preparing to make a new release soon, which would be great if done before our release: https://github.com/PCRE2Project/pcre2/commits/master/
	pcre2_code_free(regex);
}

RZ_OWN RzRegexMatchData *rz_regex_match_data_new(const RzRegex *regex, RzRegexGeneralContext *context) {
Doxygen for these functions is more important than Doxygen for the _free() function ;-)
XVilka
left a comment
Please also fix test\db\archos\windows-x64\dbg_dts - remove \n or just don't include it in the capture.
macOS (Darwin) regexes should be fixed too, in particular their handling of whitespace and newlines: https://ci.rizin.re/repos/27/pipeline/4083/7#L616
The OpenBSD error message is puzzling:
ERROR: Regex compilation failed at 0: no more memory
The same happens on NetBSD.
This comment was marked as resolved.
|
Yep, aware of it. I'm just waiting to bundle it with other fixes, so the CI is not triggered again and again. |
|
The BSD problems seem to be a bug in PCRE2 (https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=276252). Hopefully only affecting the JIT compiler? I'm going to make a minimal example and open an issue later. If it is indeed a BSD-only problem, I would just exclude BSD from JIT/ |
|
@Rot127 yes, though FreeBSD works just fine. The problem occurs only on OpenBSD and NetBSD, so please disable JIT only for those, but not for FreeBSD. |
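A minimal sketch of how the per-platform JIT exclusion requested above could be expressed in C. The helper name regex_jit_allowed() is hypothetical and not taken from this PR; __OpenBSD__ and __NetBSD__ are the macros that compilers on those systems predefine. The assumption is that a caller would check this helper before asking PCRE2 to JIT-compile a pattern (for example via pcre2_jit_compile()) and otherwise fall back to the interpreter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not part of the PR: reports whether the regex JIT
 * compiler should be used on the platform this file is compiled for. */
static bool regex_jit_allowed(void) {
#if defined(__OpenBSD__) || defined(__NetBSD__)
	/* JIT compilation was reported to fail on these two systems,
	 * so the interpreter is used instead. */
	return false;
#else
	/* FreeBSD and every other platform keep the JIT enabled. */
	return true;
#endif
}
```

Deciding at compile time keeps the check free at runtime, at the cost of not noticing a system where JIT memory allocation happens to work or fail for other reasons.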
|
Once this PR is green after the rebase, please send one from the |
Absolutely. The commit history is a mess |
5ff6680 to
5a5d5af
|
Should be fixed now. |
PCRE2 has way better performance than the OpenBSD library (something around 20 times faster).
The following flags are enabled for every pattern:
- PCRE2_UTF
- PCRE2_MATCH_INVALID_UTF
- PCRE2_NO_UTF_CHECK
All the others are optional.
Changes made:
- Adds PCRE2 as subproject.
- Changes the API away from POSIX to PCRE2.
- Edits many regex patterns because:
  - ' ' is skipped in patterns, if the EXTENDED flag is set for matching. '\s' must be set now.
  - '.' doesn't match newlines by default.
- Changes the API so matches and their groups are bundled into PVectors.
- Moves the regex component to rz_util.
|
@Rot127 please send a new PR from inside the |
Your checklist for this pull request
Detailed description
Replace the current regex engine with PCRE2.
Test plan
All green.
Closing issues
closes #3730
Partially addresses #4055
Todo
Add a rz_vector_pop_ptr() function which doesn't use memcpy() and use it in match_all_flat(). Or add a rz_vector_concat() function with the same functionality.
__asm rz_regex_get_match_name
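One possible reading of the vector Todo item, sketched on a stand-in type. PtrVec and the ptr_vec_* functions are hypothetical and are not Rizin's RzVector or RzPVector API; the sketch only shows the two ideas the item mentions: a pop that hands back the stored pointer directly instead of copying the element into a caller buffer, and a concat that moves all pointers from one vector to another.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in pointer vector for illustration; NOT Rizin's RzPVector. */
typedef struct {
	void **items;
	size_t len;
	size_t cap;
} PtrVec;

/* Appends one pointer. Returns 0 on success, -1 on allocation failure. */
static int ptr_vec_push(PtrVec *v, void *p) {
	if (v->len == v->cap) {
		size_t cap = v->cap ? v->cap * 2 : 4;
		void **items = realloc(v->items, cap * sizeof(void *));
		if (!items) {
			return -1;
		}
		v->items = items;
		v->cap = cap;
	}
	v->items[v->len++] = p;
	return 0;
}

/* Removes the last pointer and returns it directly (NULL if empty),
 * so no element is copied into an output buffer. */
static void *ptr_vec_pop_ptr(PtrVec *v) {
	return v->len ? v->items[--v->len] : NULL;
}

/* Moves every pointer of `src` to the end of `dst`, keeping their order.
 * On success `src` is left empty; on failure `dst` is restored and
 * `src` is left untouched. */
static int ptr_vec_concat(PtrVec *dst, PtrVec *src) {
	assert(dst != src);
	size_t old_len = dst->len;
	for (size_t i = 0; i < src->len; i++) {
		if (ptr_vec_push(dst, src->items[i]) < 0) {
			dst->len = old_len;
			return -1;
		}
	}
	src->len = 0;
	return 0;
}
```

Flattening a list of per-match group vectors into one vector would then be a loop of ptr_vec_concat() calls, with ownership of each pointer moving to the destination.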