enable support for HNSW in dataprep and retriever for better performance #1779

leslieluyu · 2025-06-09T10:36:53Z

Description

To improve performance with large datasets, enable HNSW support in dataprep and retriever.

In dataprep:
1. Add the vector_schema parameter to from_texts_return_keys
2. Add a VECTOR_SCHEMA environment variable to make switching the algorithm easy
In retriever:
1. Add ENABLE_SCHEMA in config.py
2. Add logic to use index_schema=INDEX_SCHEMA
3. Add redis_schema_hnsw.yml to enable HNSW
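
For readers who want a picture of how an environment variable like VECTOR_SCHEMA could select the index algorithm, here is a minimal sketch. It is not the code from this PR: the JSON value format, the FLAT default, and the helper name load_vector_schema are all assumptions made for illustration.

```python
import json
import os

# Assumption: FLAT remains the default when VECTOR_SCHEMA is unset.
DEFAULT_VECTOR_SCHEMA = {"algorithm": "FLAT"}
SUPPORTED_ALGORITHMS = {"FLAT", "HNSW"}


def load_vector_schema(env=None):
    """Build a vector schema dict from VECTOR_SCHEMA (hypothetical JSON format)."""
    env = os.environ if env is None else env
    raw = env.get("VECTOR_SCHEMA", "").strip()
    if not raw:
        # Nothing configured: fall back to a copy of the default.
        return dict(DEFAULT_VECTOR_SCHEMA)

    schema = json.loads(raw)
    if not isinstance(schema, dict):
        raise ValueError("VECTOR_SCHEMA must be a JSON object")

    # Normalize the algorithm name and reject anything unknown.
    algorithm = str(schema.get("algorithm", "FLAT")).upper()
    if algorithm not in SUPPORTED_ALGORITHMS:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    schema["algorithm"] = algorithm
    return schema


if __name__ == "__main__":
    print(load_vector_schema({"VECTOR_SCHEMA": '{"algorithm": "hnsw"}'}))
```

The resulting dict is the kind of value that could be handed to from_texts_return_keys through its vector_schema parameter, as item 1 of the dataprep list describes. Any HNSW tuning keys in the JSON object are passed through unchanged.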

Issues

List the issue or RFC link this PR is working on. If there is no such link, please mark it as n/a.

Type of change

List the type of change like below. Please delete options that are not relevant.

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds new functionality)
Breaking change (fix or feature that would break existing design and interface)
Others (enhancement, documentation, validation, etc.)

Dependencies

List the newly introduced 3rd party dependency if exists.

Tests

Describe the tests that you ran to verify your changes.

See the performance comparison between this PR (HNSW) and the v1.3 out-of-the-box configuration (FLAT) after ingesting pubmed_100files.txt (≈3.7M chunks).

Signed-off-by: leslieluyu <[email protected]>

for more information, see https://pre-commit.ci

leslieluyu added 2 commits June 9, 2025 10:01

enable support vector-schema in dataprep redis

bb25053

Signed-off-by: leslieluyu <[email protected]>

enable support HNSW in retriever

6ef3566

Signed-off-by: leslieluyu <[email protected]>

leslieluyu requested review from XinyuYe-Intel, letonghan, lkk12014402 and lvliang-intel as code owners June 9, 2025 10:36

[pre-commit.ci] auto fixes from pre-commit.com hooks

04d78e4

for more information, see https://pre-commit.ci

lvliang-intel approved these changes Jun 9, 2025

joshuayao added this to OPEA Jun 10, 2025

joshuayao added this to the v1.4 milestone Jun 10, 2025

joshuayao added the feature New feature or request label Jun 10, 2025

leslieluyu closed this Jun 12, 2025

leslieluyu force-pushed the main branch from 04d78e4 to 3240c96 Compare June 12, 2025 09:17

github-project-automation bot moved this to Done in OPEA Jun 12, 2025

resolve the confict

f74bef0

Signed-off-by: leslieluyu <[email protected]>

leslieluyu reopened this Jun 12, 2025

pre-commit-ci bot and others added 2 commits June 12, 2025 09:52

[pre-commit.ci] auto fixes from pre-commit.com hooks

d44e6cf

for more information, see https://pre-commit.ci

Merge branch 'main' into main

4a1b2d6

xiguiw approved these changes Jun 13, 2025

xiguiw merged commit 1866ad7 into opea-project:main Jun 13, 2025
26 checks passed

leslieluyu mentioned this pull request Jun 17, 2025

[Feature]Helm should support HNSW algorithm in dataprep & retreiver of redis-vector-db opea-project/GenAIInfra#1128

Closed
