-
Notifications
You must be signed in to change notification settings - Fork 62
Implement interface for unified hybrid search in Redis 8.4.0+ #447
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may have forgotten to export HybridQuery from redisvl.query.hybrid in the redisvl.query __init__.py
from redisvl.query.hybrid import HybridQuery # this works
from redisvl.query import HybridQuery # this will fail
Basically because there are now two different HybridQuery classes
- redisvl.query.aggregate.HybridQuery (old wrapper around AggregateHybridQuery)
- redisvl.query.hybrid.HybridQuery (new one)
This might cause some confusion and we might have to either choose a different name or find a better solution and deprecate the older one. Might change the name of the older class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great overall!
Just need to verify how we need to handle the deprecated HybridQuery class
Don't forget to update the api/sphinx documentation and add a guide in ai-resources
…tween `HybridQuery` and `AggregateHybridQuery`
| except Exception as e: | ||
| raise RedisSearchError(f"Unexpected error while searching: {str(e)}") from e | ||
|
|
||
| def hybrid_search(self, query: Any, **kwargs) -> List[Dict[str, Any]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Working with this a bit more now I'm not sure that I love having a separate method for hybrid queries. I would kind of like to just pass a HybridQuery to the standard .query method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed. we should not need to break the searchindex interface
justin-cechmanek
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lots of great work here! Let's work together to see how we can move this closer to our current query patterns.
| except Exception as e: | ||
| raise RedisSearchError(f"Unexpected error while searching: {str(e)}") from e | ||
|
|
||
| def hybrid_search(self, query: Any, **kwargs) -> List[Dict[str, Any]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have the query() method that accepts all our different query objects types. Let's use that as the way to run all our queries. We can do a query type check in query() like we do now and change this to a private method to call from query()
| except Exception as e: | ||
| raise RedisSearchError(f"Unexpected error while searching: {str(e)}") from e | ||
|
|
||
| async def hybrid_search(self, query: Any, **kwargs) -> List[Dict[str, Any]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same change here. Let's instead expand our async query() method to something like
async def query(self, query: Union[BaseQuery, AggregationQuery]) -> List[Dict[str, Any]]:
if isinstance(query, AggregationQuery):
return self._aggregate(query)
elif isinstance(HybridQuery):
return self._hybrid_search(query)
else:
return self._query(query)
| ) | ||
| except ImportError: | ||
| raise ImportError( | ||
| "Hybrid queries require Redis Open Source >= 8.4.0 and redis-py>=7.1.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| "Hybrid queries require Redis Open Source >= 8.4.0 and redis-py>=7.1.0" | |
| "Hybrid queries require Redis >= 8.4.0 and redis-py>=7.1.0" |
This works on enterprise Redis too :)
| text_filter_expression: Optional[Union[str, FilterExpression]] = None, | ||
| yield_text_score_as: Optional[str] = None, | ||
| vector_search_method: Optional[Literal["KNN", "RANGE"]] = None, | ||
| knn_k: Optional[int] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In our other query objects, including RangeQuery and VectorRangeQuery we have the num_results parameter. We can add consistency by using the same parameter here.
| range_radius: Optional[int] = None, | ||
| range_epsilon: Optional[float] = None, | ||
| yield_vsim_score_as: Optional[str] = None, | ||
| vector_filter_expression: Optional[Union[str, FilterExpression]] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
having both a text_filter_expression and vector_filter_expression feels redundant. Can we change this to only use one filter_expression parameter like our other Query classes?
| linear_alpha: Optional[float] = None, | ||
| linear_beta: Optional[float] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
having an alpha and beta parameter looks redundant, and from a user's point of view it's not clear how they would interact. Feels like a foot gun
| linear_alpha: Optional[float] = None, | ||
| linear_beta: Optional[float] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to my previous comment, can we reduce this to one parameter only?
| text_filter_expression=filter_expression, | ||
| vector_filter_expression=filter_expression, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mentioned earlier, but it's not clear to a user how or why to pass two filter expressions.
| def test_hybrid_query_with_both_text_and_vector_filters(): | ||
| """Test HybridQuery with both text_filter_expression and vector_filter_expression.""" | ||
| text_filter = Tag("category") == "movies" | ||
| vector_filter = Tag("genre") == "comedy" | ||
|
|
||
| hybrid_query = HybridQuery( | ||
| text=sample_text, | ||
| text_field_name="description", | ||
| vector=sample_vector, | ||
| vector_field_name="embedding", | ||
| text_filter_expression=text_filter, | ||
| vector_filter_expression=vector_filter, | ||
| ) | ||
|
|
||
| # Verify both filters are in the query | ||
| args = get_query_pieces(hybrid_query) | ||
| assert args == [ | ||
| "SEARCH", | ||
| "(~@description:(toon | squad | play | basketball | gang | aliens) AND @category:{movies})", | ||
| "SCORER", | ||
| "BM25STD", | ||
| "VSIM", | ||
| "@embedding", | ||
| bytes_vector, | ||
| "FILTER", | ||
| "@genre:{comedy}", | ||
| ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comments about having two filter_expressions
This PR adds an interface for using the new
FT.HYBRIDsearch operation introduced in Redis Open Source 8.4.0.Changes
redisvl.query.hybridThis new module defines query constructors for working with the new hybrid search operation. Attempting to import this module without having
redis-py>=7.1.0will raise anImportError. It defines aHybridQueryobject which can be used to define a hybrid search operation in a format largely similar to whatAggregateHybridSearchprovides.Under the hood,
redis-pyexpects three separate configuration objects to run hybrid search - one for the text+vector searches, one for the score combination, and one for post-processing of the results (e.g. aggregations, loading, limiting, etc.). TheHybridSearchclass manages the definition of all three.SearchIndex method
For
redis-py>=7.1.0, theSearchIndexandAsyncSearchIndexclasses define ahybrid_searchmethod that takes aHybridQueryinstance as an argument. Forredis-py<7.1.0the method raises aNotImplementedError. The method returns a list of retrieved dictionaries.Additional Changes
AggregateHybridQueryredis-py 7.0.0introduced a change to the type ofQuery._fields, changing it from a tuple to a list - a test had to be updated to differentiate the expectation based on the redis-py version.