Description
Currently, RedisSemanticCache and the other cache integrations in LangChain operate at the LLM layer: they key the cache on the fully rendered prompt (question + retrieved context).
For many RAG use cases it would be very useful to cache before retrieval, i.e., at the question level, so that repeated or semantically similar questions can bypass vector DB retrieval entirely.
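
For context, this is roughly how the LLM-level semantic cache is wired today (a minimal sketch; the exact constructor parameters vary between langchain_community and langchain_redis versions):

```python
from langchain.globals import set_llm_cache
from langchain_community.cache import RedisSemanticCache
from langchain_openai import OpenAIEmbeddings

# Today's behavior: the cache key is the fully rendered prompt
# (question + retrieved context), not the raw user question.
set_llm_cache(
    RedisSemanticCache(
        redis_url="redis://localhost:6379",
        embedding=OpenAIEmbeddings(),
        score_threshold=0.2,
    )
)
```

Because the key includes the retrieved context, even an identical question can miss the cache if retrieval returns slightly different documents.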
Why it matters:
• Saves cost and latency: on a hit, both retrieval and the LLM call are skipped (see the flow sketch after this list).
• Supports both exact-match and semantic caching of questions.
• Fits common enterprise use cases where the same queries recur frequently.
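
Concretely, the lookup would happen at the top of the pipeline. In the sketch below, question_cache, retriever, and chain are all placeholders, and lookup()/update() are hypothetical method names, not an existing LangChain API:

```python
def answer(question: str) -> str:
    # Check the question-level cache before doing any retrieval.
    cached = question_cache.lookup(question)
    if cached is not None:
        return cached  # hit: skip both vector DB retrieval and the LLM

    docs = retriever.invoke(question)  # vector DB retrieval
    result = chain.invoke({"question": question, "context": docs})
    question_cache.update(question, result)
    return result
```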
What I’m asking:
Could LangChain provide a QuestionCache, or extend RedisSemanticCache (and similar backends) with an option to store and look up entries keyed only on the raw user question (exact or semantic match), before retrieval?
This would complement the existing LLM-level cache and make caching more flexible in RAG pipelines.
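
To make the interface concrete, here is a rough in-memory sketch of what I mean; QuestionCache and everything inside it is hypothetical, just a stand-in for what a Redis-backed implementation could expose:

```python
from typing import Callable, Optional
import numpy as np

class QuestionCache:
    """Hypothetical sketch: a semantic cache keyed on the raw question.
    In-memory for clarity; a real backend would use Redis vector search."""

    def __init__(self, embed: Callable[[str], list[float]], threshold: float = 0.9):
        self.embed = embed          # question -> embedding vector
        self.threshold = threshold  # minimum cosine similarity for a hit
        self._entries: list[tuple[np.ndarray, str]] = []  # (embedding, answer)

    def lookup(self, question: str) -> Optional[str]:
        q = np.asarray(self.embed(question))
        for vec, answer in self._entries:
            sim = float(q @ vec) / (np.linalg.norm(q) * np.linalg.norm(vec))
            if sim >= self.threshold:
                return answer  # semantically similar question seen before
        return None

    def update(self, question: str, answer: str) -> None:
        self._entries.append((np.asarray(self.embed(question)), answer))
```

A Redis-backed version would replace the linear scan with a vector-index query and add TTL/eviction, but the lookup-before-retrieval contract would stay the same.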