@murataslan1

Summary

This PR addresses three enhancement issues:

1. Add debug logs for collection load path (#45864)

Add debug logs at key checkpoints in the collection load process:

  • LoadCollectionJob.Execute() entry with metadata
  • Replica spawning start/completion
  • Collection description from broker
  • Target update operations
  • Collection observer task registration

2. Add default Chinese stop words (#45576)

Add _chinese_ keyword support to the stop words filter, backed by a comprehensive Chinese stop words list that includes:

  • Pronouns (我, 你, 他, etc.)
  • Auxiliary words (的, 地, 得, 了, etc.)
  • Prepositions and conjunctions (和, 与, 但是, etc.)
  • Adverbs (不, 很, 非常, etc.)
  • Common verbs and measure words
  • Time and location words
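For readers unfamiliar with stop word filtering, here is a minimal sketch of the idea in C++. It is not the code from this PR: the function names `chinese_stop_words` and `remove_stop_words` are hypothetical, and the set holds only the sample words quoted above rather than the full list.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Illustrative subset only (assumption): the words quoted in the PR
// description, not the full list shipped by the change.
const std::unordered_set<std::string>& chinese_stop_words() {
    static const std::unordered_set<std::string> words = {
        "我", "你", "他", "的", "地", "得", "了",
        "和", "与", "但是", "不", "很", "非常"};
    return words;
}

// Hypothetical helper: drop every token that appears in the stop word set,
// preserving the order of the remaining tokens.
std::vector<std::string> remove_stop_words(
    const std::vector<std::string>& tokens) {
    std::vector<std::string> kept;
    for (const auto& token : tokens) {
        if (chinese_stop_words().count(token) == 0) {
            kept.push_back(token);
        }
    }
    return kept;
}
```

The filter works on whole tokens, so it assumes the text has already been segmented by a tokenizer. A multi-character stop word such as 但是 is removed only when it arrives as a single token.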

3. Use fnmatch for simple LIKE patterns (#44415)

Optimize pattern matching performance by using fnmatch for simple patterns that only contain % and _ wildcards without escape sequences. For complex patterns, fall back to boost::regex.

fnmatch is typically faster than regex for simple wildcard matching.
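The simple-pattern path can be sketched roughly as follows, using POSIX fnmatch(3). This is an illustration under stated assumptions, not the PR's implementation: the names `is_simple_like_pattern`, `like_to_fnmatch`, and `like_match_simple` are hypothetical, and the exact set of characters that forces the regex fallback is a guess.

```cpp
#include <fnmatch.h>  // POSIX fnmatch(3)

#include <cassert>
#include <string>

// Hypothetical check: treat a LIKE pattern as "simple" when it contains no
// backslash escapes and none of the characters this sketch conservatively
// regards as fnmatch-special (*, ?, [, ]). Anything else would take the
// regex fallback.
bool is_simple_like_pattern(const std::string& pattern) {
    for (char c : pattern) {
        if (c == '\\' || c == '*' || c == '?' || c == '[' || c == ']') {
            return false;
        }
    }
    return true;
}

// Translate SQL LIKE wildcards to fnmatch wildcards: % -> *, _ -> ?.
std::string like_to_fnmatch(const std::string& pattern) {
    std::string glob;
    glob.reserve(pattern.size());
    for (char c : pattern) {
        if (c == '%') {
            glob.push_back('*');
        } else if (c == '_') {
            glob.push_back('?');
        } else {
            glob.push_back(c);
        }
    }
    return glob;
}

// Match a value against a simple LIKE pattern. Flags are 0, so '*' also
// matches '/' and a leading '.' gets no special treatment.
bool like_match_simple(const std::string& pattern, const std::string& value) {
    assert(is_simple_like_pattern(pattern));  // caller must check first
    const std::string glob = like_to_fnmatch(pattern);
    return fnmatch(glob.c_str(), value.c_str(), 0) == 0;
}
```

A caller would test `is_simple_like_pattern(p)` first and use the regex path whenever it returns false. One caveat: depending on the locale, `?` may match a single byte rather than a full multi-byte character, so `_` against UTF-8 text needs care.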

Closes

Type of change

  • Enhancement

@sre-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: murataslan1
To complete the pull request process, please assign jiaoew1991 after the PR has been reviewed.
You can assign the PR to them by writing /assign @jiaoew1991 in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sre-ci-robot added the size/L label (Denotes a PR that changes 100-499 lines.) on Nov 28, 2025
@mergify
Contributor

mergify bot commented Nov 28, 2025

@murataslan1 Thanks for your contribution. Please submit with DCO, see the contributing guide https://github.com/milvus-io/milvus/blob/master/CONTRIBUTING.md#developer-certificate-of-origin-dco.

@mergify bot added the needs-dco (DCO is missing in this pull request.) and kind/enhancement (Issues or changes related to enhancement) labels on Nov 28, 2025
@sre-ci-robot
Contributor

[ci-v2-notice]
Notice: We are gradually rolling out the new ci-v2 system.

  • Legacy CI jobs remain unaffected, you can just ignore ci-v2 if you don't want to run it.
  • Additional "ci-v2/*" checkers will run for this PR to ensure the new ci-v2 system is working as expected.
  • For tests that exist in both v1 and v2, passing in either system is considered PASS.

To rerun ci-v2 checks, comment with:

  • /ci-rerun-code-check // for ci-v2/code-check
  • /ci-rerun-build // for ci-v2/build
  • /ci-rerun-ut-integration // for ci-v2/ut-integration
  • /ci-rerun-ut-go // for ci-v2/ut-go
  • /ci-rerun-ut-cpp // for ci-v2/ut-cpp
  • /ci-rerun-ut // for all ci-v2/ut-integration, ci-v2/ut-go, ci-v2/ut-cpp
  • /ci-rerun-e2e-arm // for ci-v2/e2e-arm [master branch only]
  • /ci-rerun-e2e-default // for ci-v2/e2e-default [master branch only]

If you have any questions or requests, please contact @zhikunyao.

@mergify
Contributor

mergify bot commented Nov 28, 2025

@murataslan1 cpu-e2e job failed, comment /run-cpu-e2e can trigger the job again.

@mergify
Contributor

mergify bot commented Nov 28, 2025

@murataslan1 go-sdk check failed, comment rerun go-sdk can trigger the job again.

murataslan1 and others added 5 commits November 29, 2025 17:07
…client

Add support for HNSW_SQ, HNSW_PQ, and HNSW_PRQ index types in the Go client
to match the Python SDK functionality.

- Add HNSWSQ, HNSWPQ, and HNSWPRQ index type constants
- Implement NewHNSWSQIndex with sq_type and optional refine support
- Implement NewHNSWPQIndex with pqM, nbits and optional refine support
- Implement NewHNSWPRQIndex with pqM, nbits and optional refine support
- Add NewHNSWQuantAnnParam for search with refine_k parameter

Fixes milvus-io#44635

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Signed-off-by: Murat Aslan <[email protected]>

Add debug logs at key checkpoints in the collection load process:
- LoadCollectionJob.Execute() entry with metadata
- Replica spawning start/completion
- Collection description from broker
- Target update operations
- Collection observer task registration

Closes milvus-io#45864

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Signed-off-by: Murat Aslan <[email protected]>

Add '_chinese_' keyword support for stop words filter with a comprehensive
Chinese stop words list including:
- Pronouns (我, 你, 他, etc.)
- Auxiliary words (的, 地, 得, 了, etc.)
- Prepositions and conjunctions (和, 与, 但是, etc.)
- Adverbs (不, 很, 非常, etc.)
- Common verbs and measure words
- Time and location words

Based on common Chinese NLP stop words lists (Baidu, HIT).

Closes milvus-io#45576

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Signed-off-by: Murat Aslan <[email protected]>

Optimize pattern matching performance by using fnmatch for simple patterns
that only contain % and _ wildcards without escape sequences. For complex
patterns (with escape sequences or fnmatch special chars), fall back to
boost::regex.

Changes:
- Add is_simple_pattern() to detect simple wildcard patterns
- Add translate_pattern_match_to_fnmatch() to convert SQL wildcards to fnmatch
- Update RegexMatcher to use fnmatch when applicable
- fnmatch is typically faster than regex for simple wildcard matching

Closes milvus-io#44415

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Signed-off-by: Murat Aslan <[email protected]>
@murataslan1 force-pushed the feat/debug-logs-and-enhancements branch from c3d9656 to f409e80 on November 29, 2025 14:08
@mergify bot added the dco-passed label (DCO check passed.) and removed the needs-dco label (DCO is missing in this pull request.) on Nov 29, 2025
@mergify
Contributor

mergify bot commented Nov 29, 2025

@murataslan1 cpu-e2e job failed, comment /run-cpu-e2e can trigger the job again.

@mergify
Contributor

mergify bot commented Nov 29, 2025

@murataslan1 go-sdk check failed, comment rerun go-sdk can trigger the job again.

@codecov

codecov bot commented Nov 29, 2025

Codecov Report

❌ Patch coverage is 97.72727% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.46%. Comparing base (b69cd23) to head (f409e80).
⚠️ Report is 9 commits behind head on master.

⚠️ Current head f409e80 differs from pull request most recent head dec7e3f

Please upload reports for the commit dec7e3f to get more accurate results.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| internal/querycoordv2/job/job_load.go | 85.71% | 3 Missing ⚠️ |

❌ Your project check has failed because the head coverage (73.46%) is below the target coverage (77.00%). You can increase the head coverage or adjust the target coverage.

❗ There is a different number of reports uploaded between BASE (b69cd23) and HEAD (f409e80).

HEAD has 1 upload less than BASE

| Flag | BASE (b69cd23) | HEAD (f409e80) |
|---|---|---|
| | 3 | 2 |

Additional details and impacted files

@@             Coverage Diff             @@
##           master   #45927       +/-   ##
===========================================
- Coverage   82.73%   73.46%    -9.28%     
===========================================
  Files         524     1360      +836     
  Lines       82326   212386   +130060     
===========================================
+ Hits        68111   156021    +87910     
- Misses      14215    48940    +34725     
- Partials        0     7425     +7425     

| Components | Coverage Δ |
|---|---|
| Client | 78.50% <100.00%> (∅) |
| Core | ∅ <ø> (∅) |
| Go | 74.11% <74.50%> (∅) |

| Files with missing lines | Coverage Δ |
|---|---|
| client/index/hnsw.go | 100.00% <100.00%> (ø) |
| ...rnal/querycoordv2/observers/collection_observer.go | 83.77% <100.00%> (ø) |
| internal/querycoordv2/job/job_load.go | 82.26% <85.71%> (ø) |

... and 1881 files with indirect coverage changes

- Fix RegexMatcher to properly handle LIKE patterns vs regex patterns
- Add FromLikePattern factory method that uses fnmatch for simple patterns
- Update all callers to use FromLikePattern for LIKE pattern matching
- Add PatternMatchQuery method to BitmapIndex for fnmatch optimization
- Add comprehensive tests for the new fnmatch functionality

Signed-off-by: Murat Aslan <[email protected]>
@sre-ci-robot added the size/XL label (Denotes a PR that changes 500-999 lines.) and removed the size/L label (Denotes a PR that changes 100-499 lines.) on Nov 29, 2025

@sre-ci-robot added the size/XXL label (Denotes a PR that changes 1000+ lines.) and removed the size/XL label (Denotes a PR that changes 500-999 lines.) on Nov 29, 2025
@mergify
Contributor

mergify bot commented Nov 29, 2025

@murataslan1 Thanks for your contribution. Please submit with DCO, see the contributing guide https://github.com/milvus-io/milvus/blob/master/CONTRIBUTING.md#developer-certificate-of-origin-dco.

@mergify bot added the needs-dco label (DCO is missing in this pull request.) on Nov 29, 2025
@mergify bot removed the dco-passed label (DCO check passed.)
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Signed-off-by: Murat Aslan <[email protected]>
@murataslan1 force-pushed the feat/debug-logs-and-enhancements branch from 5205322 to 792b715 on November 29, 2025 20:50
@mergify bot added the dco-passed label (DCO check passed.) and removed the needs-dco label (DCO is missing in this pull request.) on Nov 29, 2025

@murataslan1
Author

/run-cpu-e2e

@murataslan1
Author

rerun go-sdk

@murataslan1
Author

/ci-rerun-code-check

@murataslan1
Author

/ci-rerun-build

@mergify
Contributor

mergify bot commented Nov 30, 2025

@murataslan1 cpu-e2e job failed, comment /run-cpu-e2e can trigger the job again.

@mergify
Contributor

mergify bot commented Nov 30, 2025

@murataslan1 go-sdk check failed, comment rerun go-sdk can trigger the job again.

@murataslan1
Author

/ci-rerun-code-check

@murataslan1
Author

/ci-rerun-build

@murataslan1
Author

/run-cpu-e2e

@murataslan1
Author

rerun go-sdk

Labels

dco-passed (DCO check passed.), kind/enhancement (Issues or changes related to enhancement), size/XXL (Denotes a PR that changes 1000+ lines.)

2 participants