The diagram illustrates the standard data pipeline; the steps are explained below.
- Minipilot leverages RedisVL to introduce a semantic cache to the project. The vectorizer is OpenAI's `text-embedding-ada-002` embedding model (see the cache sketch after this list).
- The history is collected from Redis. It is stored as a Redis list keyed by the user session identifier (see the history sketch after this list).
- The conversation history and the latest question are condensed into a standalone question, which is then used to retrieve conversation-aware context from Redis with a vector search.
- The context is collected with a range vector search based on the desired threshold and the number of results. Both are configurable (search for `MINIPILOT_CONTEXT_LENGTH` and `MINIPILOT_CONTEXT_RELEVANCE_SCORE`); see the range-query sketch after this list.
- The history, the context, and the question are assembled into the prompt. The system prompt, together with the assembled prompt, is passed to the LLM. The whole operation is carried out with LangChain's `ConversationalRetrievalChain` API (see the chain sketch after this list).
- The answer is streamed back to the user.
- The question-and-answer pair is added to the cache if relevant. A basic criterion is the context itself: if fetching the context for RAG returned at least one result, the question is pertinent and the answer can be cached.
- The interaction is added to the conversation history.
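
The semantic cache in the first step, together with the conditional store in the later one, maps naturally onto RedisVL's `SemanticCache` extension. A minimal sketch, assuming a local Redis and an `OPENAI_API_KEY` in the environment; the index name and threshold are illustrative, not Minipilot's actual configuration:

```python
from redisvl.extensions.llmcache import SemanticCache
from redisvl.utils.vectorize import OpenAITextVectorizer

cache = SemanticCache(
    name="minipilot_cache",  # illustrative index name
    redis_url="redis://localhost:6379",
    vectorizer=OpenAITextVectorizer(model="text-embedding-ada-002"),
    distance_threshold=0.1,  # how semantically close a question must be to hit
)

question = "What movies are set on Mars?"  # example user question

hits = cache.check(prompt=question)
if hits:
    answer = hits[0]["response"]  # cache hit: reuse the stored answer
else:
    # Cache miss: run the RAG pipeline, then store the pair only if the
    # context retrieval returned at least one document (the relevance criterion):
    #   cache.store(prompt=question, response=answer)
    pass
```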
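
The session history can be handled with plain Redis list commands through redis-py. A sketch; the key prefix and the JSON layout of each entry are hypothetical choices, not Minipilot's actual schema:

```python
import json
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def append_interaction(session_id: str, question: str, answer: str) -> None:
    # Each interaction is one JSON entry in a list keyed by the session id.
    r.rpush(f"minipilot:history:{session_id}",
            json.dumps({"question": question, "answer": answer}))

def get_history(session_id: str, last_n: int = 10) -> list:
    # Fetch the most recent interactions for prompt assembly.
    raw = r.lrange(f"minipilot:history:{session_id}", -last_n, -1)
    return [json.loads(item) for item in raw]
```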
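
The context retrieval corresponds to a RedisVL range query. A plausible reading is that `MINIPILOT_CONTEXT_RELEVANCE_SCORE` maps onto the distance threshold and `MINIPILOT_CONTEXT_LENGTH` onto the number of results, though that mapping is an assumption, as are the index and field names below:

```python
from redisvl.index import SearchIndex
from redisvl.query import RangeQuery
from redisvl.utils.vectorize import OpenAITextVectorizer

vectorizer = OpenAITextVectorizer(model="text-embedding-ada-002")
index = SearchIndex.from_existing("minipilot_idx", redis_url="redis://localhost:6379")

standalone_question = "Which movies feature time travel?"  # condensed question

query = RangeQuery(
    vector=vectorizer.embed(standalone_question),
    vector_field_name="embedding",  # assumed schema field name
    return_fields=["text"],
    distance_threshold=0.3,  # MINIPILOT_CONTEXT_RELEVANCE_SCORE (assumed mapping)
    num_results=5,           # MINIPILOT_CONTEXT_LENGTH (assumed mapping)
)
context_docs = index.query(query)  # an empty list means no relevant context
```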
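
Prompt assembly, the condense step, and the streamed LLM call are what `ConversationalRetrievalChain` bundles together. A minimal sketch, assuming a LangChain retriever over the Redis index is already available; the chat model name and the example history are illustrative:

```python
from langchain.chains import ConversationalRetrievalChain
from langchain_openai import ChatOpenAI

# Assumed: a LangChain retriever over the Redis index, e.g. built from a
# Redis vector store (construction not shown here).
retriever = ...

llm = ChatOpenAI(model="gpt-3.5-turbo", streaming=True, temperature=0)

chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=retriever,
    return_source_documents=True,  # exposes whether any context was found
)

# The chat history is a list of (question, answer) tuples rebuilt from the Redis list.
chat_history = [("Any sci-fi classics?", "2001: A Space Odyssey is a classic.")]
result = chain.invoke({"question": "Who directed it?", "chat_history": chat_history})
answer, sources = result["answer"], result["source_documents"]
```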
Regarding the interactions with OpenAI, Minipilot uses the chat completion API via the LangChain `ChatOpenAI` class, as well as the `text-embedding-ada-002` embedding model for the retrievers and the semantic cache.
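
In code, those two OpenAI touchpoints could look like the following minimal sketch; the chat model name is an assumption, while the embedding model is the one named above:

```python
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

# Chat completions for answering questions (the model name here is an assumption).
llm = ChatOpenAI(model="gpt-3.5-turbo", streaming=True)

# The same embedding model backs both the retrievers and the semantic cache.
embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")
vector = embeddings.embed_query("How does the semantic cache decide on a hit?")
```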