feat(search): Add HNSW encoding index & insertion/deletion algorithm #2368

Merged: 18 commits into apache:unstable, Jul 13, 2024


@Beihao-Zhou Beihao-Zhou commented Jun 18, 2024

Implements the proposal in #2316.

Encoding

HNSW vector field metadata encoding:

ns | FIELD_META | index name | field name -> field flag | vector_type | dimension | distance_metric | initial_cap | m | ef_construction | ef_runtime | epsilon | num_levels

HNSW node index encoding:

ns | FIELD | index name | field name | level | NODE | key -> num_neighbours | vector dimension | [vector...]

HNSW edge index encoding:

ns | FIELD | index name | field name | level | EDGE | key1 | key2 -> (nil)

Reference for other index encoding: #2329
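
As a rough illustration of how the node and edge layouts above could be turned into byte keys, here is a minimal sketch. All helper names (`PutSized`, `PutLevel`, `LevelPrefix`, `NodeKey`, `EdgePrefix`, `EdgeKey`), the one-byte length prefix, the tag values, and the big-endian level are assumptions made for this example only; they are not the encoding helpers used in this PR.

```cpp
#include <cstdint>
#include <string>
#include <string_view>

// Hypothetical tag values; the real FIELD / NODE / EDGE tags are defined by the project.
constexpr char kFieldTag = 'F';
enum class HnswKeyKind : uint8_t { kNode = 1, kEdge = 2 };

// Append a string preceded by a one-byte length so adjacent fields cannot run together.
// Sketch only: assumes every component is shorter than 256 bytes.
inline void PutSized(std::string *dst, std::string_view s) {
  dst->push_back(static_cast<char>(s.size()));
  dst->append(s);
}

// Append a 16-bit level in big-endian order so that keys sort by level.
inline void PutLevel(std::string *dst, uint16_t level) {
  dst->push_back(static_cast<char>(level >> 8));
  dst->push_back(static_cast<char>(level & 0xFF));
}

// ns | FIELD | index name | field name | level | NODE or EDGE
inline std::string LevelPrefix(std::string_view ns, std::string_view index, std::string_view field,
                               uint16_t level, HnswKeyKind kind) {
  std::string key;
  PutSized(&key, ns);
  key.push_back(kFieldTag);
  PutSized(&key, index);
  PutSized(&key, field);
  PutLevel(&key, level);
  key.push_back(static_cast<char>(kind));
  return key;
}

// ... | NODE | key
inline std::string NodeKey(std::string_view ns, std::string_view index, std::string_view field,
                           uint16_t level, std::string_view key) {
  std::string out = LevelPrefix(ns, index, field, level, HnswKeyKind::kNode);
  PutSized(&out, key);
  return out;
}

// ... | EDGE | key1 : common prefix of every edge that starts at key1 on this level
inline std::string EdgePrefix(std::string_view ns, std::string_view index, std::string_view field,
                              uint16_t level, std::string_view key1) {
  std::string out = LevelPrefix(ns, index, field, level, HnswKeyKind::kEdge);
  PutSized(&out, key1);
  return out;
}

// ... | EDGE | key1 | key2 (the stored value is empty)
inline std::string EdgeKey(std::string_view ns, std::string_view index, std::string_view field,
                           uint16_t level, std::string_view key1, std::string_view key2) {
  std::string out = EdgePrefix(ns, index, field, level, key1);
  PutSized(&out, key2);
  return out;
}
```

With a layout of this shape, the neighbours of `key1` on a given level could be listed by a prefix scan over `EdgePrefix(...)`, which is one reason for storing edges as keys with an empty value.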

Future steps

  • Add the plan operator and the corresponding executor
  • Add expression node (i.e. SQL/RediSearch parsers) for vector search
  • Modify some passes (eg. index_selection) to convert the expression node to plan operator
  • Improve HnswIndex construction (Avoid HnswIndex heavy construction because of mt19937 #2398)
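
For context on the last item: HNSW draws a random level for each inserted node, which is where a `std::mt19937` engine typically comes in. The sketch below shows the standard level formula, floor(-ln(u) * mL) with mL = 1 / ln(m), together with one possible way to avoid constructing an engine per index object. `RandomLevel` and `SharedRng` are hypothetical names, and the thread-local engine is an assumption for illustration, not necessarily the approach taken in #2398.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>
#include <random>

// Draw an HNSW level: floor(-ln(u) * mL), with mL = 1 / ln(m) and u uniform in (0, 1).
// Assumes m >= 2.
template <typename Rng>
uint16_t RandomLevel(Rng &rng, uint16_t m) {
  std::uniform_real_distribution<double> dist(0.0, 1.0);
  double u = dist(rng);
  // The distribution may return exactly 0; clamp so that log(u) stays finite.
  if (u <= 0.0) u = std::numeric_limits<double>::min();
  double level_mult = 1.0 / std::log(static_cast<double>(m));
  return static_cast<uint16_t>(-std::log(u) * level_mult);
}

// One lazily constructed engine per thread, so that building an index object
// does not pay for seeding a fresh mt19937 each time (illustrative only).
inline std::mt19937 &SharedRng() {
  thread_local std::mt19937 rng{std::random_device{}()};
  return rng;
}
```

A caller would then write something like `RandomLevel(SharedRng(), m)` at insertion time; passing the engine by reference also makes it easy to substitute a fixed-seed engine in unit tests.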

@Beihao-Zhou Beihao-Zhou changed the title [Draft] Add HNSW encoding index & search/insertion algorithm feat: [Draft] Add HNSW encoding index & search/insertion algorithm Jun 25, 2024
@Beihao-Zhou Beihao-Zhou changed the title feat: [Draft] Add HNSW encoding index & search/insertion algorithm feat(search) Add HNSW encoding index & search/insertion algorithm Jul 1, 2024
@Beihao-Zhou Beihao-Zhou changed the title feat(search) Add HNSW encoding index & search/insertion algorithm feat(search): Add HNSW encoding index & search/insertion algorithm Jul 1, 2024
@Beihao-Zhou Beihao-Zhou marked this pull request as ready for review July 7, 2024 20:35
@Beihao-Zhou
Member Author

Hi @PragmaTwice , this PR is ready for review! :)

Member

@PragmaTwice PragmaTwice left a comment

The rest looks good to me. Thank you!

Also there are still some clang-tidy issues that need to be fixed.

Review thread on src/search/hnsw_indexer.cc (outdated, resolved)
@Beihao-Zhou Beihao-Zhou changed the title feat(search): Add HNSW encoding index & search/insertion algorithm feat(search): Add HNSW encoding index & insertion/deletion algorithm Jul 9, 2024
Member

@PragmaTwice PragmaTwice left a comment

The code looks fine to me.

But there are some issues in CI:

  • one unit test case failed in macOS arm64,
  • some memory issues (likely use-after-free) reported by ASan/TSan.

Could you try to investigate them?

Contributor

@Yangsx-1 Yangsx-1 left a comment

I wonder if there is any heuristic logic to avoid isolated cluster?

Review thread on src/search/search_encoding.h (outdated, resolved)
Review thread on src/search/hnsw_indexer.h (outdated, resolved)
Review thread on src/search/search_encoding.h (resolved)
Review thread on src/search/search_encoding.h (outdated, resolved)
@Beihao-Zhou
Member Author

> I wonder if there is any heuristic logic to avoid isolated cluster?

@Yangsx-1 Good question, but not yet. The plan for this PR is just to implement the HNSW construction first, so I didn't think that far ahead. I also saw that KQIR cannot be enabled in cluster mode yet [1], so I didn't take this into the scope of this PR.

[1] KQIR: a query engine for Apache Kvrocks that supports both SQL and RediSearch queries

@Beihao-Zhou
Member Author

Beihao-Zhou commented Jul 12, 2024

> The code looks fine to me.
>
> But there are some issues in CI:
>
>   • one unit test case failed in macOS arm64,
>   • some memory issues (likely use-after-free) reported by ASan/TSan.
>
> Could you try to investigate them?

@git-hulk @PragmaTwice
The issue was that ComputeSimilarity calculates the distance between two VectorItems based on the HnswVectorMetadata. In the unit test, I initialized a VectorItem whose vector size was less than metadata->dim, so looping up to metadata->dim read past the end of the vector. That out-of-bounds read (rather than a leak) is the memory error the sanitizers reported.

I changed the code to go through a single VectorItem::Create, which does this validation early. Let me know if this still looks good to you <3
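
A minimal sketch of the idea, for readers following along: the type and function names `HnswVectorMetadata`, `VectorItem`, `VectorItem::Create`, and `ComputeSimilarity` come from the comment above, but the fields, signatures, the use of `std::optional`, and the choice of L2 distance are assumptions for illustration and may differ from the code in commit 224141f.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the PR's metadata type; only the dimension is modelled here.
struct HnswVectorMetadata {
  uint16_t dim = 0;
};

struct VectorItem {
  std::string key;
  std::vector<double> vector;
  const HnswVectorMetadata *metadata = nullptr;

  // Factory that rejects a vector whose length disagrees with metadata->dim,
  // so later loops bounded by dim can never run past the end of the vector.
  static std::optional<VectorItem> Create(std::string key, std::vector<double> vec,
                                          const HnswVectorMetadata *metadata) {
    if (metadata == nullptr || vec.size() != metadata->dim) return std::nullopt;
    VectorItem item;
    item.key = std::move(key);
    item.vector = std::move(vec);
    item.metadata = metadata;
    return item;
  }
};

// L2 distance, assuming both items were built by Create against the same metadata.
inline double ComputeSimilarity(const VectorItem &a, const VectorItem &b) {
  double sum = 0.0;
  for (size_t i = 0; i < a.metadata->dim; ++i) {
    double diff = a.vector[i] - b.vector[i];
    sum += diff * diff;
  }
  return std::sqrt(sum);
}
```

The point of the design is that the size check happens once, at construction, instead of inside every distance computation.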

CR: 224141f
Successful workflow: https://github.com/Beihao-Zhou/kvrocks/actions/runs/9914854795


sonarcloud bot commented Jul 13, 2024

Quality Gate failed

Failed conditions
2 Security Hotspots
D Reliability Rating on New Code (required ≥ A)

See analysis details on SonarCloud

@PragmaTwice PragmaTwice merged commit 12269d7 into apache:unstable Jul 13, 2024
27 of 28 checks passed
@PragmaTwice
Member

PragmaTwice commented Jul 13, 2024

Awesome. Thank you for your contribution!

@PragmaTwice
Member

Hi @Beihao-Zhou , could you also open a tracking issue to track all issues and PRs for vector search in Kvrocks?

@Beihao-Zhou
Member Author

> Hi @Beihao-Zhou , could you also open a tracking issue to track all issues and PRs for vector search in Kvrocks?

Sure, will do that later today <3

@Beihao-Zhou Beihao-Zhou deleted the hnsw-indexer branch July 15, 2024 19:08
5 participants