-
Instead of using the Pandas query pipeline's qp.run, I am using query_engine = RAGStringQueryEngine, as shown below. I hope I am covering all the components of qp.run, but I suspect I am missing some, such as add_link or llm2. Is it possible to use the components from qp.run directly within my RAGStringQueryEngine? Do you have any suggestions on how to integrate them in a meaningful way?

Reference (Pandas query pipeline): https://docs.llamaindex.ai/en/stable/examples/pipeline/query_pipeline_pandas/

My custom query pipeline is below:

Request for suggestions to include all Pandas pipeline components:

This is the Pandas query pipeline from the link above:
-
To integrate the missing components such as add_link and llm2, register each module on a QueryPipeline and connect the modules with add_link. Here is a code snippet that demonstrates this process:

from llama_index.postprocessor.cohere_rerank import CohereRerank
from llama_index.core.response_synthesizers import TreeSummarize
from llama_index.core import PromptTemplate
from llama_index.core.query_pipeline import QueryPipeline
from llama_index.llms.openai import OpenAI
# Define modules
prompt_str = "Please generate a question about Paul Graham's life regarding the following topic {topic}"
prompt_tmpl = PromptTemplate(prompt_str)
llm = OpenAI(model="gpt-3.5-turbo")
# `index` is assumed to be an existing index (e.g. a VectorStoreIndex) built beforehand
retriever = index.as_retriever(similarity_top_k=3)
reranker = CohereRerank()
summarizer = TreeSummarize(llm=llm)
# Define query pipeline
p = QueryPipeline(verbose=True)
p.add_modules(
    {
        "llm": llm,
        "prompt_tmpl": prompt_tmpl,
        "retriever": retriever,
        "summarizer": summarizer,
        "reranker": reranker,
    }
)
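# Add links between modules. "prompt_tmpl" has no incoming link, so it is the
# root module and receives the `topic` keyword argument passed to p.run().
# (No "input" module is registered above, so nothing can be linked from it.)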
p.add_link("prompt_tmpl", "llm")
p.add_link("llm", "retriever")
p.add_link("retriever", "reranker", dest_key="nodes")
p.add_link("llm", "reranker", dest_key="query_str")
p.add_link("reranker", "summarizer", dest_key="nodes")
p.add_link("llm", "summarizer", dest_key="query_str")
# Run the pipeline
response = p.run(topic="YC")
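print(str(response))

This code sets up a full RAG pipeline with query rewriting, retrieval, reranking, and response synthesis, and integrates the missing components using the add_link method.

Additionally, you can refer to the QueryPipelineQueryEngine class:

from llama_index.core.base.response.schema import RESPONSE_TYPE
from llama_index.core.bridge.pydantic import Field
from llama_index.core.query_engine import CustomQueryEngine
from llama_index.core.query_pipeline import QueryPipeline

class QueryPipelineQueryEngine(CustomQueryEngine):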
    query_pipeline: QueryPipeline = Field(
        description="Query Pipeline to use for Q&A.",
    )

    def custom_query(self, query_str: str) -> RESPONSE_TYPE:
        return self.query_pipeline.run(query_str=query_str)

    async def acustom_query(self, query_str: str) -> RESPONSE_TYPE:
        return await self.query_pipeline.arun(query_str=query_str)

This class demonstrates how to integrate a QueryPipeline into a custom query engine.

For a more detailed example, you can refer to the RAG Fusion Query Pipeline setup in the LlamaIndex documentation. Here is a snippet from the example:

# construct query pipeline
p = QueryPipeline()
module_dict = {
    **self.retrievers,
    "input": InputComponent(),
    "summarizer": TreeSummarize(),
    "join": ArgPackComponent(),
    "reranker": rerank_component,
}
p.add_modules(module_dict)
# add links from input to retriever (id'ed by chunk_size)
for chunk_size in self.chunk_sizes:
    p.add_link("input", str(chunk_size))
    p.add_link(str(chunk_size), "join", dest_key=str(chunk_size))
p.add_link("join", "reranker")
p.add_link("input", "summarizer", dest_key="query_str")
p.add_link("reranker", "summarizer", dest_key="nodes")

This example shows how to set up a query pipeline with multiple retrievers, a join component, and a reranker, and how to link these components together. The resulting DAG can also be visualized.
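To make the role of add_link and dest_key more concrete, here is a minimal, library-free sketch of the same fan-out and join shape. ToyPipeline and the helper functions are hypothetical names invented for illustration. This is not how LlamaIndex implements QueryPipeline; it only models the idea that each link routes one module's output into a named argument of another module.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Tuple


class ToyPipeline:
    """Toy stand-in for a query pipeline (hypothetical, not LlamaIndex code)."""

    def __init__(self) -> None:
        self.modules: Dict[str, Callable[..., Any]] = {}
        # Maps each destination module to its (source, dest_key) pairs.
        self.links: Dict[str, List[Tuple[str, str]]] = defaultdict(list)

    def add_modules(self, modules: Dict[str, Callable[..., Any]]) -> None:
        self.modules.update(modules)

    def add_link(self, src: str, dest: str, dest_key: str) -> None:
        if src not in self.modules or dest not in self.modules:
            raise ValueError(f"unknown module in link {src!r} -> {dest!r}")
        self.links[dest].append((src, dest_key))

    def run(self, **root_kwargs: Any) -> Dict[str, Any]:
        """Run every module once its inputs are ready; return all outputs."""
        outputs: Dict[str, Any] = {}
        pending = list(self.modules)
        while pending:
            progressed = False
            for name in list(pending):
                deps = self.links[name]
                if all(src in outputs for src, _ in deps):
                    # Root modules (no incoming links) receive the run() kwargs.
                    kwargs = {key: outputs[src] for src, key in deps} if deps else root_kwargs
                    outputs[name] = self.modules[name](**kwargs)
                    pending.remove(name)
                    progressed = True
            if not progressed:
                raise ValueError("links contain a cycle")
        return outputs


def pass_through(query: str) -> str:
    return query

def retrieve_small(query: str) -> List[str]:
    return [f"small:{query}"]

def retrieve_large(query: str) -> List[str]:
    return [f"large:{query}"]

def join(small: List[str], large: List[str]) -> List[str]:
    return small + large

def summarize(query_str: str, nodes: List[str]) -> str:
    return f"{query_str} -> {len(nodes)} nodes"


p = ToyPipeline()
p.add_modules({
    "input": pass_through,
    "small": retrieve_small,
    "large": retrieve_large,
    "join": join,
    "summarizer": summarize,
})
p.add_link("input", "small", dest_key="query")
p.add_link("input", "large", dest_key="query")
p.add_link("small", "join", dest_key="small")
p.add_link("large", "join", dest_key="large")
p.add_link("input", "summarizer", dest_key="query_str")
p.add_link("join", "summarizer", dest_key="nodes")

result = p.run(query="YC")
print(result["summarizer"])  # YC -> 2 nodes
```

In this sketch the wiring lives entirely in the add_link calls; the module functions know nothing about each other.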
-
@dosu So the pipeline and the query engine are essentially based on the same concept and can use the same components, correct? In the pipeline, add_link connects the modules, whereas in the query engine, all modules perform their tasks without explicit connections. What is the real advantage of using a pipeline, then? Is my understanding right? Please help clarify.
Yes, you are correct that both the pipeline and the query engine can use the same components. The key difference lies in how these components are connected and managed.
Pipeline:
The add_link method is used to explicitly connect modules, allowing for complex workflows and dependencies to be managed within the pipeline.
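One practical consequence of explicit links is that the dependency graph becomes plain data that can be inspected. As a rough sketch (using Python's standard graphlib module, available from Python 3.9, rather than anything from LlamaIndex), the (src, dest) pairs below are modeled on the add_link calls from the reranking example earlier in this thread, and execution_order is a hypothetical helper that derives a valid run order from them.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set, Tuple

# (src, dest) pairs modeled on the add_link calls in the reranking example.
links: List[Tuple[str, str]] = [
    ("prompt_tmpl", "llm"),
    ("llm", "retriever"),
    ("retriever", "reranker"),
    ("llm", "reranker"),
    ("reranker", "summarizer"),
    ("llm", "summarizer"),
]


def execution_order(links: List[Tuple[str, str]]) -> List[str]:
    """Return one valid order in which the linked modules could run."""
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    graph: Dict[str, Set[str]] = {}
    for src, dest in links:
        graph.setdefault(src, set())
        graph.setdefault(dest, set()).add(src)
    return list(TopologicalSorter(graph).static_order())


order = execution_order(links)
print(order)
# ['prompt_tmpl', 'llm', 'retriever', 'reranker', 'summarizer']
```

A hand-written query engine encodes this order implicitly in the sequence of calls inside its query method, so the same information is not available as data.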