enhance: add debug logs, Chinese stop words, and fnmatch optimization #45927
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: murataslan1
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
@murataslan1 Thanks for your contribution. Please submit with DCO, see the contributing guide https://github.com/milvus-io/milvus/blob/master/CONTRIBUTING.md#developer-certificate-of-origin-dco. |
|
[ci-v2-notice]
To rerun ci-v2 checks, comment with:
If you have any questions or requests, please contact @zhikunyao. |
|
@murataslan1 cpu-e2e job failed, comment |
|
@murataslan1 go-sdk check failed, comment |
…client
Add support for HNSW_SQ, HNSW_PQ, and HNSW_PRQ index types in the Go client to match the Python SDK functionality.
- Add HNSWSQ, HNSWPQ, and HNSWPRQ index type constants
- Implement NewHNSWSQIndex with sq_type and optional refine support
- Implement NewHNSWPQIndex with pqM, nbits and optional refine support
- Implement NewHNSWPRQIndex with pqM, nbits and optional refine support
- Add NewHNSWQuantAnnParam for search with refine_k parameter
Fixes milvus-io#44635
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Signed-off-by: Murat Aslan <[email protected]>
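As a rough illustration of what the new constructors carry, the sketch below builds plain parameter dictionaries for an HNSW_SQ index and a refined search. The helper names are hypothetical, and the "M", "efConstruction", and "ef" keys and the "SQ8" value are assumptions based on common HNSW conventions rather than anything stated in this PR. Only sq_type, refine, and refine_k come from the commit message.

```python
from typing import Any, Dict, Optional


def hnsw_sq_index_params(m: int, ef_construction: int, sq_type: str,
                         refine: Optional[bool] = None) -> Dict[str, Any]:
    """Hypothetical helper: build an HNSW_SQ index description as a dict."""
    # "M" and "efConstruction" are assumed key names, not taken from the PR.
    params: Dict[str, Any] = {
        "M": m,
        "efConstruction": ef_construction,
        "sq_type": sq_type,
    }
    # refine is optional, so it is only emitted when the caller sets it.
    if refine is not None:
        params["refine"] = refine
    return {"index_type": "HNSW_SQ", "params": params}


def hnsw_quant_search_params(ef: int,
                             refine_k: Optional[float] = None) -> Dict[str, Any]:
    """Hypothetical helper: build search parameters with an optional refine_k."""
    params: Dict[str, Any] = {"ef": ef}  # "ef" is an assumed key name
    if refine_k is not None:
        params["refine_k"] = refine_k
    return params


index_params = hnsw_sq_index_params(16, 200, "SQ8", refine=True)
search_params = hnsw_quant_search_params(64, refine_k=2.0)
```

Leaving the optional fields out of the dictionary when they are unset keeps the default behaviour on the server side, which is one plausible reading of "optional refine support".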
Add debug logs at key checkpoints in the collection load process:
- LoadCollectionJob.Execute() entry with metadata
- Replica spawning start/completion
- Collection description from broker
- Target update operations
- Collection observer task registration
Closes milvus-io#45864
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Signed-off-by: Murat Aslan <[email protected]>
Add '_chinese_' keyword support for stop words filter with a comprehensive Chinese stop words list including:
- Pronouns (我, 你, 他, etc.)
- Auxiliary words (的, 地, 得, 了, etc.)
- Prepositions and conjunctions (和, 与, 但是, etc.)
- Adverbs (不, 很, 非常, etc.)
- Common verbs and measure words
- Time and location words
Based on common Chinese NLP stop words lists (Baidu, HIT).
Closes milvus-io#45576
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Signed-off-by: Murat Aslan <[email protected]>
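The keyword expansion described in this commit can be sketched as follows. This is a minimal illustration, not the PR's implementation: the function and constant names are hypothetical, and the word set contains only the examples quoted in the commit message, not the full list.

```python
from typing import Dict, FrozenSet, Iterable, List, Set

# Only the example words quoted in the commit message; the real list is longer.
CHINESE_STOP_WORDS: FrozenSet[str] = frozenset(
    {"我", "你", "他", "的", "地", "得", "了", "和", "与", "但是", "不", "很", "非常"}
)

# Hypothetical registry mapping a built-in keyword to its word list.
BUILTIN_STOP_WORDS: Dict[str, FrozenSet[str]] = {"_chinese_": CHINESE_STOP_WORDS}


def expand_stop_words(configured: Iterable[str]) -> Set[str]:
    """Replace built-in keywords with their word lists; keep other entries."""
    expanded: Set[str] = set()
    for entry in configured:
        expanded.update(BUILTIN_STOP_WORDS.get(entry, {entry}))
    return expanded


def remove_stop_words(tokens: Iterable[str], configured: Iterable[str]) -> List[str]:
    """Drop every token that appears in the expanded stop word set."""
    stop = expand_stop_words(configured)
    return [token for token in tokens if token not in stop]


tokens = ["我", "很", "喜欢", "向量", "数据库"]
filtered = remove_stop_words(tokens, ["_chinese_"])
```

Treating the keyword as one entry in the configured list lets a user combine the built-in list with custom words, for example `["_chinese_", "数据库"]`.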
Optimize pattern matching performance by using fnmatch for simple patterns that only contain % and _ wildcards without escape sequences. For complex patterns (with escape sequences or fnmatch special chars), fall back to boost::regex.
Changes:
- Add is_simple_pattern() to detect simple wildcard patterns
- Add translate_pattern_match_to_fnmatch() to convert SQL wildcards to fnmatch
- Update RegexMatcher to use fnmatch when applicable
- fnmatch is typically faster than regex for simple wildcard matching
Closes milvus-io#44415
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Signed-off-by: Murat Aslan <[email protected]>
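The dispatch described in this commit can be sketched in Python, using the standard `fnmatch` and `re` modules in place of POSIX fnmatch and boost::regex. The set of characters treated as "fnmatch special" and the helper names other than is_simple_pattern are assumptions for illustration. The PR's C++ code may draw the line differently.

```python
import fnmatch
import re

# Assumption: a LIKE pattern is "simple" when it has no backslash escape and
# none of the characters that fnmatch itself treats as special.
_NOT_SIMPLE = set("*?[]\\")


def is_simple_pattern(pattern: str) -> bool:
    return not any(ch in _NOT_SIMPLE for ch in pattern)


def translate_like_to_fnmatch(pattern: str) -> str:
    # SQL % (any run of characters) -> *, SQL _ (one character) -> ?
    return pattern.replace("%", "*").replace("_", "?")


def translate_like_to_regex(pattern: str) -> str:
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # An escaped character matches itself literally.
            out.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "%":
            out.append(".*")
        elif ch == "_":
            out.append(".")
        else:
            out.append(re.escape(ch))
        i += 1
    return "".join(out)


def like_match(pattern: str, text: str) -> bool:
    if is_simple_pattern(pattern):
        # Fast path: plain wildcard matching, case-sensitive.
        return fnmatch.fnmatchcase(text, translate_like_to_fnmatch(pattern))
    # Fallback: full regex for escapes and fnmatch-special characters.
    regex = translate_like_to_regex(pattern)
    return re.fullmatch(regex, text, flags=re.DOTALL) is not None
```

Routing any pattern that contains `*`, `?`, `[`, or `]` to the regex path matters for correctness: passed to fnmatch unchanged, those characters would act as wildcards, whereas this sketch treats them as ordinary literals in a LIKE pattern.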
Signed-off-by: Murat Aslan <[email protected]>
c3d9656 to
f409e80
Codecov Report
❌ Patch coverage is Please upload reports for the commit dec7e3f to get more accurate results.
❌ Your project check has failed because the head coverage (73.46%) is below the target coverage (77.00%). You can increase the head coverage or adjust the target coverage.
Additional details and impacted files
@@ Coverage Diff @@
## master #45927 +/- ##
===========================================
- Coverage 82.73% 73.46% -9.28%
===========================================
Files 524 1360 +836
Lines 82326 212386 +130060
===========================================
+ Hits 68111 156021 +87910
- Misses 14215 48940 +34725
- Partials 0 7425 +7425
|
- Fix RegexMatcher to properly handle LIKE patterns vs regex patterns
- Add FromLikePattern factory method that uses fnmatch for simple patterns
- Update all callers to use FromLikePattern for LIKE pattern matching
- Add PatternMatchQuery method to BitmapIndex for fnmatch optimization
- Add comprehensive tests for the new fnmatch functionality
Signed-off-by: Murat Aslan <[email protected]>
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com> Signed-off-by: Murat Aslan <[email protected]>
5205322 to
792b715
Signed-off-by: Murat Aslan <[email protected]>
|
/run-cpu-e2e |
|
rerun go-sdk |
|
/ci-rerun-code-check |
|
/ci-rerun-build |
|
/ci-rerun-code-check |
|
/ci-rerun-build |
|
/run-cpu-e2e |
|
rerun go-sdk |
Summary
This PR addresses three enhancement issues:
1. Add debug logs for collection load path (#45864)
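Add debug logs at key checkpoints in the collection load process:
- LoadCollectionJob.Execute() entry with metadata
- Replica spawning start/completion
- Collection description from broker
- Target update operations
- Collection observer task registration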
2. Add default Chinese stop words (#45576)
Add _chinese_ keyword support for the stop words filter with a comprehensive Chinese stop words list including:
- Pronouns (我, 你, 他, etc.)
- Auxiliary words (的, 地, 得, 了, etc.)
- Prepositions and conjunctions (和, 与, 但是, etc.)
- Adverbs (不, 很, 非常, etc.)
- Common verbs and measure words
- Time and location words
3. Use fnmatch for simple LIKE patterns (#44415)
Optimize pattern matching performance by using fnmatch for simple patterns that only contain % and _ wildcards without escape sequences. For complex patterns, fall back to boost::regex.
fnmatch is typically faster than regex for simple wildcard matching.
Closes
Type of change