-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Adding support for HYBRID search. #3813
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds support for hybrid search functionality to the Redis search client, enabling combined text and vector similarity searches. The implementation introduces new query types, result parsers, and comprehensive test coverage.
Key changes:
- New hybrid query classes for combining text search and vector similarity operations
- Hybrid search command execution and result parsing
- Post-processing configuration for filtering, sorting, and aggregating results
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/test_search.py | Comprehensive test suite covering hybrid search functionality including various query types, filters, and post-processing options |
| redis/connection.py | Minor whitespace cleanup (removed blank line) |
| redis/commands/search/hybrid_query.py | New file implementing hybrid query classes and post-processing configuration |
| redis/commands/search/commands.py | Added hybrid_search method and result parsing logic |
| redis/commands/search/init.py | Registered hybrid search command parser |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
1e8345a to
3b3b4f5
Compare
3b3b4f5 to
551315b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 6 out of 7 changed files in this pull request and generated 8 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
0ea7eec to
fc7c324
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 5 out of 6 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
ce28c55 to
ae531b0
Compare
| def scorer(self, scorer: str) -> "HybridSearchQuery": | ||
| """ | ||
| Scoring algorithm for text search query. | ||
| Allowed values are "TFIDF" or "BM25" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Allowed values are type?: 'BM25' | 'TFIDF' | 'DISMAX' | 'DOCSCORE' according to the design. Also maybe it makes sense to keep value as Enum so we don't need to add validation around it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested the other options - they also work + found some other that are not listed in the spec and are supported. I would prefer to leave it as string and even not be too detailed in the docstring - I am not sure what are all of the actually supported values...
| """Return the query string of this query object.""" | ||
| return self._query_string | ||
|
|
||
| def scorer(self, scorer: str) -> "HybridSearchQuery": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apart of scorer algorithm, scorer also supports named parameters
SCORER algorithm params...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not for this version - currently only the scorer name can be provided.
| Add search method parameters to the query. | ||
| Args: | ||
| method: Vector search method name. Supported values are "KNN" or "RANGE". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Enum value could be used
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
| Args: | ||
| method: Vector search method name. Supported values are "KNN" or "RANGE". | ||
| kwargs: Search method parameters. Use the param names for keys and the | ||
| values for the values. Example: {"K": 10, "EF_RUNTIME": 100}. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
K is a required argument, whereas EF_RUNTIME is optional. Makes sense to add some validation around it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer not to add validation of the inputs, and this way allow more flexibility for future server api changes, but I have added more details in the docstrings
| conditions: Filter conditions. | ||
| """ | ||
| args = [conditions] | ||
| Filter.__init__(self, "FILTER", *args) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Filters are way more complex then this.
[FILTER] "<filter-expression>"
[POLICY [ADHOC/BATCHES/ACORN]]
[BATCHES BATCH_SIZE <batch-size-value>]
|
|
||
| def combine( | ||
| self, | ||
| method: Literal["RRF", "LINEAR"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Design also specifies third one: FUNCTION. PRD also mentioned it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not supported in this version.
| Args: | ||
| method: The combine method to use - RRF or LINEAR. | ||
| kwargs: Additional combine parameters. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same problem here, isn't really clear which parameters could be used with which method
|
|
||
| ret = ["GROUPBY", str(len(fields)), *fields] | ||
| for reducer in reducers: | ||
| ret += ["REDUCE", reducer.NAME, str(len(reducer.args))] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it needs to add REDUCE for each reducer? From PRD it seems that it accepts count argument and single REDUCE keyword. Also, it doesn't says about aliases. Correct me if I'm wrong
[GROUPBY count field... REDUCE function count...]
| the alias for the projection, and the value is the projection | ||
| expression itself, for example `apply(square_root="sqrt(@foo)")`. | ||
| """ | ||
| for alias, expr in kwexpr.items(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PRD also doesn't specifies an option to have multiple APPLY keywords
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But it works... And there is no other way to apply more than one field transformation except to have the apply twice or more. There is a test covering that case.
|
|
||
| return self | ||
|
|
||
| def sort_by(self, *sortby: "SortbyField") -> Self: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like SortByField is missing WITHCOUNT option
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't find such an option in the spec.
ae531b0 to
a0985b4
Compare
Pull Request check-list
Please make sure to review and check all of these items:
NOTE: these things are not required to open a PR and can be done
afterwards / while the PR is open.
Description of change
Please provide a description of the change here.