feat: Add support for query rewrite in vector_store.search #4171
Conversation
Force-pushed from 83cece1 to 5349c33
```python
llm_models = [m for m in models_response.data if m.model_type == ModelType.llm]

# Filter out models that are known to be embedding models (misclassified as LLM)
embedding_model_patterns = ["minilm", "embed", "embedding", "nomic-embed"]
```
removing this and `provider_priority` below
@franciscojavierarceo fyi, the example has apple as the first query, the log shows kiwi twice
mattf left a comment:
what about having the vector store config specify the rewrite model, and rejecting the request if none is configured?
this would make the behavior somewhat stable.
the config would be per vector store. the rewrite prompt could be a config option as well. maybe you go as far as to include completion params like temperature.
```yaml
...
query_rewriter:
  model: ollama/llama6-magic
  prompt: "do your thing on {query} and be magical"
...
```
```python
llm_models = [m for m in models_response.data if m.model_type == ModelType.llm]

# Filter out models that are known to be embedding models (misclassified as LLM)
embedding_model_patterns = ["minilm", "embed", "embedding", "nomic-embed"]
```
instead of hardcoding models and providers, can't you optionally just take in a "query_rewrite_model" when creating the vector store? Also, can we use the "metadata" attribute to pass in parameters that are not supported by openai?
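For illustration, a minimal sketch of what that could look like from the client side. The base URL is an assumption, and the `query_rewrite_model` metadata key and its server-side interpretation are hypothetical, not an agreed-upon interface:

```python
from openai import OpenAI

# Assumed local endpoint; adjust for your deployment.
client = OpenAI(base_url="http://localhost:8321/v1", api_key="none")

# "metadata" is a standard OpenAI vector-store field (string key/value pairs);
# the query_rewrite_model key here is hypothetical.
vector_store = client.vector_stores.create(
    name="docs",
    metadata={"query_rewrite_model": "ollama/llama3.2:3b"},
)
```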
Yeah, that's what I'm adding, similar to what @mattf suggested too. I got that working last night but ended up going to bed before pushing it.
Yeah, that's actually what I ended up adding. Sorry, I requested reviews a bit prematurely; I'll push that update soon.
Force-pushed from 859f4c2 to 1c93410
@franciscojavierarceo please update the description with the new proposed config and user interaction
This pull request has merge conflicts that must be resolved before it can be merged. @franciscojavierarceo please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
Force-pushed from 685eb05 to d935650
`default_query_expansion_model` and `query_expansion_prompt` in `VectorStoresConfig`, and update providers to use `VectorStoresConfig`
Force-pushed from e88cb61 to 869888d
@mattf mind taking a look?
mattf left a comment:
looking good!
```python
    or not self.vector_stores_config.rewrite_query_params.model
):
    raise ValueError(
        "Query rewriting requested but not configured. Please configure rewrite_query_params.model in vector_stores config."
```
there should be two messages:

```python
logging.warn("User is trying to use vector_store query rewriting, but it isn't configured. Please ...")
ValueError("Query rewriting is not available...")
```
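A minimal sketch of that two-message pattern (logger setup and exact wording are assumptions):

```python
import logging

logger = logging.getLogger(__name__)

def reject_unconfigured_rewrite() -> None:
    # Admin-facing detail goes to the server log...
    logger.warning(
        "User is trying to use vector_store query rewriting, but it isn't configured. "
        "Please set rewrite_query_params.model in the vector_stores config."
    )
    # ...while the caller gets a generic error with no config details.
    raise ValueError("Query rewriting is not available.")
```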
```python
# Use custom prompt from config if provided, otherwise use built-in default
# Users only need to configure the model - prompt is automatic with optional override
custom_prompt = self.vector_stores_config.rewrite_query_params.prompt
if custom_prompt:
```
there's a default set for prompt. how will this be false?
will remove
```python
    temperature=self.vector_stores_config.rewrite_query_params.temperature or 0.3,
)

response = await self.inference_api.openai_chat_completion(request)  # type: ignore
```
why type ignore?
to avoid adding the inference_api to the init() of all of the adapters.
since you wanted to avoid modifying the adapters, I thought this was a reasonable compromise. LMK if you'd prefer I add them.
oic, you're setting inference_api for the tests. where is it getting set for a non-test run?
it gets set during provider instantiation using get_provider_impl(). IMO the ideal thing to do is just add it to the init, but I can clean that up in a follow-up PR or a separate one if you'd like.
```python
response = await self.inference_api.openai_chat_completion(request)  # type: ignore
content = response.choices[0].message.content
if content is None:
    raise ValueError("LLM response content is None - cannot rewrite query")
```
this should be an error in the log for the admin and a generic 500 to the user about query_rewrite failing
```python
if content is None:
    raise ValueError("LLM response content is None - cannot rewrite query")
rewritten_query: str = content.strip()
logger.debug(f"Query rewritten: '{query}' → '{rewritten_query}'")
```
we shouldn't log user input
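For instance, the debug line quoted above could record lengths instead of content (purely illustrative; `logger`, `query`, and `rewritten_query` come from the snippet being reviewed):

```python
# Log that a rewrite happened without echoing the user's query text.
logger.debug(
    "Query rewritten (original length=%d, rewritten length=%d)",
    len(query),
    len(rewritten_query),
)
```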
```python
    temperature=self.vector_stores_config.rewrite_query_params.temperature or 0.3,
)

response = await self.inference_api.openai_chat_completion(request)  # type: ignore
```
openai_chat_completion can throw exceptions. they may include config details of the service that should not be exposed to users, e.g. the model being used, api credentials. a safer approach is to catch and log the detailed exception then send the user a 500 about query_rewrite failing.
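A sketch of that catch-and-sanitize pattern, assuming a FastAPI-style HTTPException for the 500 (function and parameter names are illustrative):

```python
import logging

from fastapi import HTTPException  # assuming the server surfaces FastAPI errors

logger = logging.getLogger(__name__)

async def rewrite_query_safely(inference_api, request) -> str:
    try:
        response = await inference_api.openai_chat_completion(request)
        content = response.choices[0].message.content
        if content is None:
            raise ValueError("LLM response content is None")
        return content.strip()
    except Exception as exc:
        # Full detail (model name, provider error, credentials) stays server-side.
        logger.error("query_rewrite failed: %s", exc)
        # The user sees only a generic 500 with no config details.
        raise HTTPException(status_code=500, detail="query_rewrite failed") from exc
```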
@mattf updated again, thanks for the feedback!
Force-pushed from 276b482 to 297ea21
This pull request has merge conflicts that must be resolved before it can be merged. @franciscojavierarceo please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
mattf left a comment:
@franciscojavierarceo i pushed two commits, one moves the rewrite prompt validation to stack startup and the other moves the rewrite functionality into the router (making it available to all providers)
ptal. please revert the commits if you don't like this path in any way.
```diff
 max_num_results=max_num_results,
 ranking_options=ranking_options,
-rewrite_query=rewrite_query,
+rewrite_query=False,  # Already handled at router level
```
i agree with handling this at the router level and probably there's no way to avoid this, but i at least want to state for the record that someone outside of us looking at this code in isolation may be confused...but maybe it'll just be an LLM that pulls the router into the context. 🤷 🥲
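For readers without the router in view, a hedged sketch of the flow being described (method and helper names are assumptions based on the diff above):

```python
async def openai_search_vector_store(self, vector_store_id, query, rewrite_query=False, **kwargs):
    # The router performs the rewrite once, then tells the provider not to repeat it.
    if rewrite_query:
        query = await self._rewrite_query(query)  # hypothetical helper wrapping the LLM call
    provider = self.routing_table.get_provider_impl(vector_store_id)
    return await provider.openai_search_vector_store(
        vector_store_id=vector_store_id,
        query=query,
        rewrite_query=False,  # already handled at router level
        **kwargs,
    )
```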
@mattf I had one small comment but I'm good with this. LMK if there's anything else you'd like to see.
Force-pushed from d288c88 to b52f2db
What does this PR do?
Actualize query rewrite in the search API: add `default_query_expansion_model` and `query_expansion_prompt` in `VectorStoresConfig`. Makes the `rewrite_query` parameter functional in vector store search:

- `rewrite_query=false` (default): use the original query
- `rewrite_query=true`: expand the query via LLM, or fail gracefully if no LLM is available

Adds 4 parameters to `VectorStoresConfig`:

- `default_query_expansion_model`: LLM model for query expansion (optional)
- `query_expansion_prompt`: custom prompt template (optional, uses built-in default)
- `query_expansion_max_tokens`: configurable token limit (default: 100)
- `query_expansion_temperature`: configurable temperature (default: 0.3)

Enabled `run.yaml`:
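A minimal sketch of what an enabled config could look like, based on the parameter list above (the section layout and model name are assumptions):

```yaml
vector_stores:
  default_query_expansion_model: ollama/llama3.2:3b
```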
Fully customized `run.yaml`:
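And a sketch with every knob set, using the defaults listed above (layout, model name, and prompt text are assumptions):

```yaml
vector_stores:
  default_query_expansion_model: ollama/llama3.2:3b
  query_expansion_prompt: "Expand this search query with related terms: {query}"
  query_expansion_max_tokens: 100
  query_expansion_temperature: 0.3
```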
Test Plan
Added test and recording
Example script as well:
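A minimal sketch of such a script against the OpenAI-compatible endpoint (base URL, store name, and file setup are assumptions; `rewrite_query` is the parameter this PR wires up):

```python
from openai import OpenAI

# Assumed local Llama Stack endpoint; adjust for your deployment.
client = OpenAI(base_url="http://localhost:8321/v1", api_key="none")

vs = client.vector_stores.create(name="fruit-facts")

# rewrite_query=True triggers the LLM-based query expansion added by this PR.
results = client.vector_stores.search(
    vector_store_id=vs.id,
    query="kiwi",
    rewrite_query=True,
)
for hit in results.data:
    print(hit.filename, hit.score)
```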
And see the screenshot of the server logs showing it worked.

Notice the log:

`Query rewritten: 'kiwi' → 'kiwi, a small brown or green fruit native to New Zealand, or a person having a fuzzy brown outer skin similar in appearance.'`

So `kiwi` was expanded.