
Conversation

franciscojavierarceo (Collaborator) commented Nov 17, 2025

What does this PR do?

Implements query rewriting in the search API and adds default_query_expansion_model and query_expansion_prompt to VectorStoresConfig.

Makes the rewrite_query parameter functional in vector store search.

  • rewrite_query=false (default): uses the original query
  • rewrite_query=true: expands the query via an LLM, or fails gracefully if no LLM is available

Adds four parameters to VectorStoresConfig:

  • default_query_expansion_model: LLM model for query expansion (optional)
  • query_expansion_prompt: Custom prompt template (optional, uses built-in default)
  • query_expansion_max_tokens: Configurable token limit (default: 100)
  • query_expansion_temperature: Configurable temperature (default: 0.3)

run.yaml with query rewriting enabled:

  vector_stores:
    rewrite_query_params:
      model:
        provider_id: "ollama"
        model_id: "llama3.2:3b-instruct-fp16"
      # prompt defaults to built-in
      # max_tokens defaults to 100
      # temperature defaults to 0.3

Fully customized run.yaml:

  vector_stores:
    default_provider_id: faiss
    default_embedding_model:
      provider_id: sentence-transformers
      model_id: nomic-ai/nomic-embed-text-v1.5
    rewrite_query_params:
      model:
        provider_id: ollama
        model_id: llama3.2:3b-instruct-fp16
      prompt: "Rewrite this search query to improve retrieval results by expanding it with relevant synonyms and related terms: {query}"
      max_tokens: 100
      temperature: 0.3

Test Plan

Added a test and recording.

Example script as well:

import asyncio
from io import BytesIO

from llama_stack_client import LlamaStackClient

def gen_file(client, text: str = ""):
    # Upload an in-memory text file for attaching to the vector store.
    file_buffer = BytesIO(text.encode("utf-8"))
    file_buffer.name = "my_file.txt"

    uploaded_file = client.files.create(
        file=file_buffer,
        purpose="assistants",
    )
    return uploaded_file

async def test_query_rewriting():
    client = LlamaStackClient(base_url="http://0.0.0.0:8321/")
    uploaded_file = gen_file(client, "banana banana apple")
    uploaded_file2 = gen_file(client, "orange orange kiwi")

    # Create a vector store and attach both files.
    vs = client.vector_stores.create()
    client.vector_stores.files.create(vector_store_id=vs.id, file_id=uploaded_file.id)
    client.vector_stores.files.create(vector_store_id=vs.id, file_id=uploaded_file2.id)

    # Search once with the raw query and once with LLM query rewriting.
    response1 = client.vector_stores.search(
        vector_store_id=vs.id,
        query="apple",
        max_num_results=3,
        rewrite_query=False,
    )
    response2 = client.vector_stores.search(
        vector_store_id=vs.id,
        query="kiwi",
        max_num_results=3,
        rewrite_query=True,
    )

    print(f"\n🔵 Response 1 (rewrite_query=False):\n\033[94m{response1}\033[0m")
    print(f"\n🟢 Response 2 (rewrite_query=True):\n\033[92m{response2}\033[0m")

    # Clean up the uploaded files and the vector store.
    for f in [uploaded_file.id, uploaded_file2.id]:
        client.files.delete(file_id=f)
    client.vector_stores.delete(vector_store_id=vs.id)

if __name__ == "__main__":
    asyncio.run(test_query_rewriting())

See the screenshot of the server logs showing it worked:
[Screenshot: server logs, 2025-11-19 at 1:16 PM]

Notice the log:

 Query rewritten:
         'kiwi' → 'kiwi, a small brown or green fruit native to New Zealand, or a person having a fuzzy brown outer skin similar in appearance.'

So kiwi was expanded.

meta-cla bot added the CLA Signed label Nov 17, 2025
franciscojavierarceo force-pushed the filesearch-rewrite-query branch 5 times, most recently from 83cece1 to 5349c33 on November 18, 2025 01:46
franciscojavierarceo marked this pull request as ready for review November 18, 2025 01:51
llm_models = [m for m in models_response.data if m.model_type == ModelType.llm]

# Filter out models that are known to be embedding models (misclassified as LLM)
embedding_model_patterns = ["minilm", "embed", "embedding", "nomic-embed"]
franciscojavierarceo (Collaborator, Author) commented:

removing this and provider_priority below

mattf (Collaborator) commented Nov 18, 2025

@franciscojavierarceo fyi, the example has apple as the first query, the log shows kiwi twice

mattf (Collaborator) left a comment:

what about having the vector store config specify the rewrite model and the request is rejected if none is configured?

this would make the behavior somewhat stable.

the config would be per vector store. the rewrite prompt could be a config option as well. maybe you go as far as to include completion params like temperature.

   ...
   query_rewriter:
      model: ollama/llama6-magic
      prompt: "do your thing on {query} and be magical"
   ...

llm_models = [m for m in models_response.data if m.model_type == ModelType.llm]

# Filter out models that are known to be embedding models (misclassified as LLM)
embedding_model_patterns = ["minilm", "embed", "embedding", "nomic-embed"]
A Member commented:

instead of hardcoding models and providers, can't you optionally just take in a "query_rewrite_model" when creating the vector store? Also, can we use the "metadata" attribute to pass in parameters that are not supported by openai?
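
A minimal sketch of that suggestion, assuming the OpenAI-compatible metadata field were used to carry a hypothetical query_rewrite_model key (illustrative only; not the interface this PR ultimately shipped):

# Hypothetical: pass the rewrite model through the OpenAI-compatible
# `metadata` field at vector store creation. The key name is illustrative.
vs = client.vector_stores.create(
    metadata={"query_rewrite_model": "ollama/llama3.2:3b-instruct-fp16"}
)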

franciscojavierarceo (Collaborator, Author) replied Nov 18, 2025:

Yeah that's what I'm adding, similar to what @mattf suggested too. I got that working last night but ended up going to bed before pushing it.

franciscojavierarceo (Collaborator, Author) commented:

> what about having the vector store config specify the rewrite model and the request is rejected if none is configured?

Yeah, that's actually what I ended up adding. Sorry, I requested reviews a bit prematurely; I'll push that update soon.

mattf (Collaborator) commented Nov 19, 2025

@franciscojavierarceo please update the description with the new proposed config and user interaction

mergify bot commented Nov 19, 2025

This pull request has merge conflicts that must be resolved before it can be merged. @franciscojavierarceo please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify bot added the needs-rebase label Nov 19, 2025
mergify bot removed the needs-rebase label Nov 19, 2025
franciscojavierarceo changed the title from "feat: Actualize query rewrite in search API" to "feat: Actualize query rewrite in search API, add default_query_expansion_model and query_expansion_prompt in VectorStoresConfig, and update providers to use VectorStoresConfig" Nov 19, 2025
franciscojavierarceo force-pushed the filesearch-rewrite-query branch 2 times, most recently from e88cb61 to 869888d on December 3, 2025 21:32
franciscojavierarceo (Collaborator, Author) commented:

@mattf mind taking a look?

mattf (Collaborator) left a comment:

looking good!

    or not self.vector_stores_config.rewrite_query_params.model
):
    raise ValueError(
        "Query rewriting requested but not configured. Please configure rewrite_query_params.model in vector_stores config."
mattf (Collaborator) commented:

there should be two messages -

  • logging.warn("User is trying to use vector_store query rewriting, but it isn't configured. Please ...")
  • ValueError("Query rewriting is not available...")
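
A minimal sketch of that split, reusing the config check from the snippet above (logger name assumed):

# Detailed warning stays in the admin log; the user gets a generic error.
if (
    not self.vector_stores_config.rewrite_query_params
    or not self.vector_stores_config.rewrite_query_params.model
):
    logger.warning(
        "User is trying to use vector_store query rewriting, but it isn't "
        "configured. Please configure rewrite_query_params.model in vector_stores config."
    )
    raise ValueError("Query rewriting is not available.")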

# Use custom prompt from config if provided, otherwise use built-in default
# Users only need to configure the model - prompt is automatic with optional override
custom_prompt = self.vector_stores_config.rewrite_query_params.prompt
if custom_prompt:
mattf (Collaborator) commented:

there's a default set for prompt. how will this be false?

franciscojavierarceo (Collaborator, Author) replied:

will remove

    temperature=self.vector_stores_config.rewrite_query_params.temperature or 0.3,
)

response = await self.inference_api.openai_chat_completion(request)  # type: ignore
mattf (Collaborator) commented:

why type ignore?

franciscojavierarceo (Collaborator, Author) replied:

to avoid adding the inference_api to the init() of all of the adapters.

since you wanted to avoid modifying the adapters, I thought this was a reasonable compromise. LMK if you'd prefer I add them.

mattf (Collaborator) replied:

oic, you're setting inference_api for the tests. where is it getting set for a non-test run?

franciscojavierarceo (Collaborator, Author) replied:

it gets set during provider instantiation via get_provider_impl(). IMO the ideal thing to do is just add it to the init, but I can clean that up in a follow-up PR or a separate one if you'd like.

response = await self.inference_api.openai_chat_completion(request)  # type: ignore
content = response.choices[0].message.content
if content is None:
    raise ValueError("LLM response content is None - cannot rewrite query")
mattf (Collaborator) commented:

this should be an error in the log for the admin and a generic 500 to the user about query_rewrite failing

if content is None:
    raise ValueError("LLM response content is None - cannot rewrite query")
rewritten_query: str = content.strip()
logger.debug(f"Query rewritten: '{query}' → '{rewritten_query}'")
mattf (Collaborator) commented:

we shouldn't log user input
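
One hedged alternative that keeps the debug signal without logging the query text (logging lengths only is an assumption, not what the PR settled on):

# Log only metadata about the rewrite, never the user-supplied query itself.
logger.debug(
    "Query rewritten (original length=%d, rewritten length=%d)",
    len(query),
    len(rewritten_query),
)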

    temperature=self.vector_stores_config.rewrite_query_params.temperature or 0.3,
)

response = await self.inference_api.openai_chat_completion(request)  # type: ignore
mattf (Collaborator) commented:

openai_chat_completion can throw exceptions. they may include config details of the service that should not be exposed to users, e.g. the model being used, api credentials. a safer approach is to catch and log the detailed exception then send the user a 500 about query_rewrite failing.
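
A minimal sketch of that pattern, assuming the surrounding method and logger; the exact exception surfaced to the user is illustrative:

try:
    response = await self.inference_api.openai_chat_completion(request)  # type: ignore
except Exception as e:
    # Full details (model, provider errors, credentials) stay in the server log.
    logger.error(f"Query rewriting failed: {e}")
    # Generic message for the user; suppress the chained exception so no
    # config details leak into the response.
    raise ValueError("Query rewriting failed.") from None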

franciscojavierarceo (Collaborator, Author) commented:

@mattf updated again, thanks for the feedback!

franciscojavierarceo force-pushed the filesearch-rewrite-query branch 2 times, most recently from 276b482 to 297ea21 on December 6, 2025 02:46
mergify bot commented Dec 6, 2025

This pull request has merge conflicts that must be resolved before it can be merged. @franciscojavierarceo please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mattf (Collaborator) left a comment:

@franciscojavierarceo i pushed two commits, one moves the rewrite prompt validation to stack startup and the other moves the rewrite functionality into the router (making it available to all providers)

ptal. please revert the commits if you don't like this path in any way.

    max_num_results=max_num_results,
    ranking_options=ranking_options,
-   rewrite_query=rewrite_query,
+   rewrite_query=False,  # Already handled at router level
franciscojavierarceo (Collaborator, Author) commented:

i agree with handling this at the router level, and probably there's no way to avoid it, but i at least want to state for the record that someone outside of us looking at this code in isolation may be confused... but maybe it'll just be an LLM that pulls the router into the context. 🤷 🥲

franciscojavierarceo (Collaborator, Author) commented:

> @franciscojavierarceo i pushed two commits, one moves the rewrite prompt validation to stack startup and the other moves the rewrite functionality into the router (making it available to all providers)

@mattf I had one small comment but I'm good with this. LMK if there's anything else you'd like to see.

franciscojavierarceo and others added 14 commits December 10, 2025 09:31

adding query expansion model to vector store config

Signed-off-by: Francisco Javier Arceo <[email protected]>
franciscojavierarceo merged commit 95b2948 into llamastack:main Dec 10, 2025
38 checks passed