-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Regex stable_hash fix #78
base: main
Are you sure you want to change the base?
Regex stable_hash fix #78
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #78 +/- ##
===========================================
- Coverage 93.61% 70.27% -23.35%
===========================================
Files 7 7
Lines 673 666 -7
===========================================
- Hits 630 468 -162
- Misses 43 198 +155 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contribution!
🤔 I'm a little worried these tests could still pass even if the internal ptr data gets hashed (e.g. if it just so happens the implementation or regex has changed recently, such that in this situation the ptr has the same value).
Have you run the tests for HashVerion <= 3 to verify that they're broken? I think it would be good to include some @test_broken
tests for those versions.
I've added tests for versions <= 3. I've also added a test that the Ptr is indeed different. So subsequent tests can work under the assumption that the pointers are different. I couldn't think of any better way of doing it. Also added tests for Regex options for completeness. |
This looks good! It looks like there are some issues with a recent Pluto verison, unrelated to your PR. I want to make sure that gets fixed and verify that all is well with older versions of Julia. (For StableHashTraits.jl > 2.0 and greater I think it makes to support only >= 1.10, but I don't want to break old hash versions if I don't have to). I can look into that and make the updates in the next day or so. |
There are also some issues with my Regex tests for Julia <= 1.9. I'll look into those. |
The failures are actually due to |
Add unit tests for Regex (HashVersion{4} only).
More explicit tests to make it easier to understand how the Regex internals affect the hash.
This is done to get stable Regex hashes across Julia versions. Update unit test reference hashes.
f298f19
to
61b5b6e
Compare
I've updated this PR to include a transformer for Currently tests on 1.6 do not pass. julia> Regex("regex", Base.PCRE.UTF | Base.PCRE.MATCH_INVALID_UTF | Base.PCRE.ALT_BSUX | Base.PCRE.UCP, Base.DEFAULT_MATCH_OPTS)
ERROR: ArgumentError: invalid regex compile options: 67764226 I think it's OK that the hash is different on 1.6. The regex compile flags are different after all, so in theory we could get different matches when using the regex. Do you agree? If you do agree, I can disable the tests on Julia 1.6. I don't know how to mark them as broken, since it's using the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's OK that the hash is different on 1.6. The regex compile flags are different after all, so in theory we could get different matches when using the regex. Do you agree?
I think making this work for 1.6 shouldn't be too bad. We just want something like this, right?
hash_compile_options = re.compile_options & (PRCRE.CASELESS | PCRE.MULTILINE | PCRE.DOTALL | PRCE.EXTENDED)
hash_match_options = re.match_options & ~PCRE.NO_UTF_CHECK
There are some risks with this approach: it can still be broken by future changes to the Regex implementation. But we already run that risk if e.g. a field name of the Regex
struct changes.
##### Regex | ||
##### | ||
|
||
transform_type(::Type{Regex}) = "Base.Regex" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this transform is necessary. The default transform_type
would hash "Regex"
. In general we are trying to avoid the use of module prefixes in the type hash, because the parent module is often considered an implementation detail and can change in non-breaking releases. Might as well stay consistent here.
Description
Regex
is a mutable struct (for technical reasons), with aPtr
to a compiled regex, which for hash versions < 4 causedRegex
objects to hash differently each time.It seems to be solved for hash version 4, and this PR adds unit tests to guard against regressions.
See also discussion in #66.
Benchmarks
Not applicable.