
Conversation

franciscojavierarceo (Collaborator) commented Nov 17, 2025

What does this PR do?

Implements query rewriting in the search API and adds default_query_expansion_model and query_expansion_prompt to VectorStoresConfig.

Makes the rewrite_query parameter functional in vector store search.

  • rewrite_query=false (default): uses the original query
  • rewrite_query=true: expands the query via an LLM, or fails gracefully if no LLM is available

Adds four parameters to VectorStoresConfig:

  • default_query_expansion_model: LLM model for query expansion (optional)
  • query_expansion_prompt: Custom prompt template (optional, uses built-in default)
  • query_expansion_max_tokens: Configurable token limit (default: 100)
  • query_expansion_temperature: Configurable temperature (default: 0.3)

run.yaml with query rewriting enabled:

  vector_stores:
    rewrite_query_params:
      model:
        provider_id: "ollama"
        model_id: "llama3.2:3b-instruct-fp16"
      # prompt defaults to built-in
      # max_tokens defaults to 100
      # temperature defaults to 0.3

Fully customized run.yaml:

  vector_stores:
    default_provider_id: faiss
    default_embedding_model:
      provider_id: sentence-transformers
      model_id: nomic-ai/nomic-embed-text-v1.5
    rewrite_query_params:
      model:
        provider_id: ollama
        model_id: llama3.2:3b-instruct-fp16
      prompt: "Rewrite this search query to improve retrieval results by expanding it with relevant synonyms and related terms: {query}"
      max_tokens: 100
      temperature: 0.3

Test Plan

Added a test and recording.

Example script as well:

import asyncio
from io import BytesIO

from llama_stack_client import LlamaStackClient

def gen_file(client, text: str = ""):
    # Upload an in-memory text file for attaching to the vector store.
    file_buffer = BytesIO(text.encode("utf-8"))
    file_buffer.name = "my_file.txt"

    uploaded_file = client.files.create(
        file=file_buffer,
        purpose="assistants",
    )
    return uploaded_file

async def test_query_rewriting():
    client = LlamaStackClient(base_url="http://0.0.0.0:8321/")
    uploaded_file = gen_file(client, "banana banana apple")
    uploaded_file2 = gen_file(client, "orange orange kiwi")

    # Create a vector store and attach both files.
    vs = client.vector_stores.create()
    client.vector_stores.files.create(vector_store_id=vs.id, file_id=uploaded_file.id)
    client.vector_stores.files.create(vector_store_id=vs.id, file_id=uploaded_file2.id)

    # Search once with the raw query and once with LLM query rewriting.
    response1 = client.vector_stores.search(
        vector_store_id=vs.id,
        query="apple",
        max_num_results=3,
        rewrite_query=False,
    )
    response2 = client.vector_stores.search(
        vector_store_id=vs.id,
        query="kiwi",
        max_num_results=3,
        rewrite_query=True,
    )

    print(f"\n🔵 Response 1 (rewrite_query=False):\n\033[94m{response1}\033[0m")
    print(f"\n🟢 Response 2 (rewrite_query=True):\n\033[92m{response2}\033[0m")

    # Clean up the uploaded files and the vector store.
    for f in [uploaded_file.id, uploaded_file2.id]:
        client.files.delete(file_id=f)
    client.vector_stores.delete(vector_store_id=vs.id)

if __name__ == "__main__":
    asyncio.run(test_query_rewriting())

See the screenshot of the server logs showing it worked:
[Screenshot: server logs, 2025-11-19 at 1:16 PM]

Notice the log:

 Query rewritten:
         'kiwi' → 'kiwi, a small brown or green fruit native to New Zealand, or a person having a fuzzy brown outer skin similar in appearance.'

So kiwi was expanded.

meta-cla bot added the CLA Signed label Nov 17, 2025
franciscojavierarceo force-pushed the filesearch-rewrite-query branch 5 times, most recently from 83cece1 to 5349c33 on November 18, 2025 01:46
franciscojavierarceo marked this pull request as ready for review November 18, 2025 01:51
llm_models = [m for m in models_response.data if m.model_type == ModelType.llm]

# Filter out models that are known to be embedding models (misclassified as LLM)
embedding_model_patterns = ["minilm", "embed", "embedding", "nomic-embed"]
franciscojavierarceo (Collaborator, Author) commented:

removing this and provider_priority below

mattf (Collaborator) commented Nov 18, 2025

@franciscojavierarceo fyi, the example has apple as the first query, the log shows kiwi twice

mattf (Collaborator) left a comment:

what about having the vector store config specify the rewrite model and the request is rejected if none is configured?

this would make the behavior somewhat stable.

the config would be per vector store. the rewrite prompt could be a config option as well. maybe you go as far as to include completion params like temperature.

   ...
   query_rewriter:
      model: ollama/llama6-magic
      prompt: "do your thing on {query} and be magical"
   ...

llm_models = [m for m in models_response.data if m.model_type == ModelType.llm]

# Filter out models that are known to be embedding models (misclassified as LLM)
embedding_model_patterns = ["minilm", "embed", "embedding", "nomic-embed"]
A Member commented:

instead of hardcoding models and providers, can't you optionally just take in a "query_rewrite_model" when creating the vector store? Also, can we use the "metadata" attribute to pass in parameters that are not supported by openai?
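
A minimal sketch of that suggestion, assuming the OpenAI-compatible metadata field were used to carry a hypothetical query_rewrite_model key (illustrative only; not the interface this PR ultimately shipped):

# Hypothetical: pass the rewrite model through the OpenAI-compatible
# `metadata` field at vector store creation. The key name is illustrative.
vs = client.vector_stores.create(
    metadata={"query_rewrite_model": "ollama/llama3.2:3b-instruct-fp16"}
)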

franciscojavierarceo (Collaborator, Author) replied Nov 18, 2025:

Yeah that's what I'm adding, similar to what @mattf suggested too. I got that working last night but ended up going to bed before pushing it.

franciscojavierarceo (Collaborator, Author) commented:

> what about having the vector store config specify the rewrite model and the request is rejected if none is configured?

Yeah, that's actually what I ended up adding. Sorry, I requested reviews a bit prematurely; I'll push that update soon.

mattf (Collaborator) commented Nov 19, 2025

@franciscojavierarceo please update the description with the new proposed config and user interaction

mergify bot commented Nov 19, 2025

This pull request has merge conflicts that must be resolved before it can be merged. @franciscojavierarceo please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify bot added the needs-rebase label Nov 19, 2025
mergify bot removed the needs-rebase label Nov 19, 2025
franciscojavierarceo changed the title from "feat: Actualize query rewrite in search API" to "feat: Actualize query rewrite in search API, add default_query_expansion_model and query_expansion_prompt in VectorStoresConfig, and update providers to use VectorStoresConfig" Nov 19, 2025
franciscojavierarceo force-pushed the filesearch-rewrite-query branch 2 times, most recently from e88cb61 to 869888d on December 3, 2025 21:32
franciscojavierarceo (Collaborator, Author) commented:

@mattf mind taking a look?

mattf (Collaborator) left a comment:

looking good!

    or not self.vector_stores_config.rewrite_query_params.model
):
    raise ValueError(
        "Query rewriting requested but not configured. Please configure rewrite_query_params.model in vector_stores config."
mattf (Collaborator) commented:

there should be two messages -

  • logging.warn("User is trying to use vector_store query rewriting, but it isn't configured. Please ...")
  • ValueError("Query rewriting is not available...")
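
A minimal sketch of that split, reusing the config check from the snippet above (logger name assumed):

# Detailed warning stays in the admin log; the user gets a generic error.
if (
    not self.vector_stores_config.rewrite_query_params
    or not self.vector_stores_config.rewrite_query_params.model
):
    logger.warning(
        "User is trying to use vector_store query rewriting, but it isn't "
        "configured. Please configure rewrite_query_params.model in vector_stores config."
    )
    raise ValueError("Query rewriting is not available.")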

# Use custom prompt from config if provided, otherwise use built-in default
# Users only need to configure the model - prompt is automatic with optional override
custom_prompt = self.vector_stores_config.rewrite_query_params.prompt
if custom_prompt:
mattf (Collaborator) commented:

there's a default set for prompt. how will this be false?

franciscojavierarceo (Collaborator, Author) replied:

will remove

    temperature=self.vector_stores_config.rewrite_query_params.temperature or 0.3,
)

response = await self.inference_api.openai_chat_completion(request)  # type: ignore
mattf (Collaborator) commented:

why type ignore?

franciscojavierarceo (Collaborator, Author) replied:

to avoid adding the inference_api to the init() of all of the adapters.

since you wanted to avoid modifying the adapters, I thought this was a reasonable compromise. LMK if you'd prefer I add them.

mattf (Collaborator) replied:

oic, you're setting inference_api for the tests. where is it getting set for a non-test run?

franciscojavierarceo (Collaborator, Author) replied:

it gets set during provider instantiation via get_provider_impl(). IMO the ideal thing to do is just add it to the init, but I can clean that up in a follow-up PR or a separate one if you'd like.

response = await self.inference_api.openai_chat_completion(request)  # type: ignore
content = response.choices[0].message.content
if content is None:
    raise ValueError("LLM response content is None - cannot rewrite query")
mattf (Collaborator) commented:

this should be an error in the log for the admin and a generic 500 to the user about query_rewrite failing

if content is None:
    raise ValueError("LLM response content is None - cannot rewrite query")
rewritten_query: str = content.strip()
logger.debug(f"Query rewritten: '{query}' → '{rewritten_query}'")
mattf (Collaborator) commented:

we shouldn't log user input
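
One hedged alternative that keeps the debug signal without logging the query text (logging lengths only is an assumption, not what the PR settled on):

# Log only metadata about the rewrite, never the user-supplied query itself.
logger.debug(
    "Query rewritten (original length=%d, rewritten length=%d)",
    len(query),
    len(rewritten_query),
)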

    temperature=self.vector_stores_config.rewrite_query_params.temperature or 0.3,
)

response = await self.inference_api.openai_chat_completion(request)  # type: ignore
mattf (Collaborator) commented:

openai_chat_completion can throw exceptions. they may include config details of the service that should not be exposed to users, e.g. the model being used, api credentials. a safer approach is to catch and log the detailed exception then send the user a 500 about query_rewrite failing.
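
A minimal sketch of that pattern, assuming the surrounding method and logger; the exact exception surfaced to the user is illustrative:

try:
    response = await self.inference_api.openai_chat_completion(request)  # type: ignore
except Exception as e:
    # Full details (model, provider errors, credentials) stay in the server log.
    logger.error(f"Query rewriting failed: {e}")
    # Generic message for the user; suppress the chained exception so no
    # config details leak into the response.
    raise ValueError("Query rewriting failed.") from None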

franciscojavierarceo (Collaborator, Author) commented:

@mattf updated again, thanks for the feedback!

franciscojavierarceo force-pushed the filesearch-rewrite-query branch 2 times, most recently from 276b482 to 297ea21 on December 6, 2025 02:46
mergify bot commented Dec 6, 2025

This pull request has merge conflicts that must be resolved before it can be merged. @franciscojavierarceo please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mattf (Collaborator) left a comment:

@franciscojavierarceo i pushed two commits, one moves the rewrite prompt validation to stack startup and the other moves the rewrite functionality into the router (making it available to all providers)

ptal. please revert the commits if you don't like this path in any way.

    max_num_results=max_num_results,
    ranking_options=ranking_options,
-   rewrite_query=rewrite_query,
+   rewrite_query=False,  # Already handled at router level
franciscojavierarceo (Collaborator, Author) commented:

i agree with handling this at the router level, and probably there's no way to avoid it, but i at least want to state for the record that someone outside of us looking at this code in isolation may be confused... but maybe it'll just be an LLM that pulls the router into the context. 🤷 🥲

franciscojavierarceo (Collaborator, Author) commented:

> @franciscojavierarceo i pushed two commits, one moves the rewrite prompt validation to stack startup and the other moves the rewrite functionality into the router (making it available to all providers)

@mattf I had one small comment but I'm good with this. LMK if there's anything else you'd like to see.

franciscojavierarceo and others added 14 commits December 10, 2025 09:31

adding query expansion model to vector store config

Signed-off-by: Francisco Javier Arceo <[email protected]>
franciscojavierarceo merged commit 95b2948 into llamastack:main Dec 10, 2025
38 checks passed