223 changes: 8 additions & 215 deletions evals/evaluation/rag_pilot/README.md
@@ -4,20 +4,6 @@

RAG Pilot provides a set of tuners to optimize various parameters in a retrieval-augmented generation (RAG) pipeline. Each tuner allows fine-grained control over key aspects of parsing, chunking, postprocessing, and generator selection, enabling better retrieval and response generation.

### 🧠 Available Tuners

| Tuner | Stage | Function | Configuration |
|---|---|---|---|
| **EmbeddingTuner** | Retrieval | Tune embedding model and related parameters | Allows selection and configuration of the embedding model used for vectorization, including model name and optional parameters like dimension or backend. |
| **NodeParserTuner** | Retrieval | Tune node parser parameters | General tuner for configuring node parsers, possibly extending to custom strategies or pre-processing logic. |
| **SimpleNodeParserChunkTuner** | Retrieval | Tune `SentenceSplitter`'s `chunk_size` and `chunk_overlap` | Configures chunking behavior for document parsing by adjusting the size of individual text chunks and their overlap to ensure context retention. |
| **RetrievalTopkTuner** | Retrieval | Tune `top_k` for retriever | Adjusts how many documents are retrieved before reranking, balancing recall and performance. |
| **RerankerTopnTuner** | Postprocessing | Tune `top_n` for reranking | Adjusts the number of top-ranked documents returned after reranking, optimizing relevance and conciseness. |


These tuners help in optimizing document parsing, chunking strategies, reranking efficiency, and embedding selection for improved RAG performance.


## 🌐 Quickstart Guide

### ⚙️ Dependencies and Environment Setup
@@ -31,12 +17,14 @@ Load documents in EdgeCraftRAG before running RAG Pilot.
#### Setup RAG Pilot

```bash
cd rag_pilot

# (Optional) Build RAG Pilot and UI docker images
# Build RAG Pilot and UI docker images
cd rag_pilot
docker build --build-arg HTTP_PROXY=$HTTP_PROXY --build-arg HTTPS_PROXY=$HTTPS_PROXY --build-arg NO_PROXY=$NO_PROXY -t opea/ragpilot:latest -f ./Dockerfile .
docker build --build-arg HTTP_PROXY=$HTTP_PROXY --build-arg HTTPS_PROXY=$HTTPS_PROXY --build-arg NO_PROXY=$NO_PROXY -t opea/ragpilot-ui:latest -f ./ui/Dockerfile.ui .

# Or build docker images with docker compose
cd ./rag_pilot/docker_image_build
docker compose -f build.yaml build
# Setup ENV
export ECRAG_SERVICE_HOST_IP=${HOST_IP} # HOST IP of EC-RAG Service, usually current host ip

@@ -49,205 +37,10 @@ export ECRAG_SERVICE_HOST_IP=${HOST_IP} # HOST IP of EC-RAG Service, usually current host ip
# If you want to change exposed RAG Pilot service port
#export RAGPILOT_SERVICE_PORT=

# Start RAG Pilot server
cd ./rag_pilot
docker compose -f docker_compose/intel/gpu/arc/compose.yaml up -d
```

### 🚦 Launch RAG Pilot in Online Mode

To launch RAG Pilot, create the following *required files* before running the command:

#### 🔹Input file: QA List File (`your_queries.csv`)

The input CSV file should contain queries and associated ground truth data (optional) used for evaluation or tuning. Each row corresponds to a specific query and context file. The CSV must include the following **columns**:

| Column | Required | Description |
|--------|----------|-------------|
| `query_id` | ✅ Yes | Unique identifier for the query. Can be used to group multiple context entries under the same query. |
| `query` | ✅ Yes (at least one per `query_id`) | The actual query string. If left empty for some rows sharing the same `query_id`, the query from the first row with a non-empty value will be used. |
| `file_name` | ✅ Yes | The name of the file or document where the context (for retrieval or grounding) is drawn from. |
| `gt_context` | ✅ Yes | The ground truth context string that should be retrieved or matched against. |
| `ground_truth` | ❌ Optional | The ideal answer or response for the query, used for optional answer-level evaluation. |

##### 📌 CSV File Example

```csv
query_id,query,file_name,gt_context,ground_truth
53,故障来源有哪些?,故障处理记录表.txt,故障来源:用户投诉、日志系统、例行维护中发现、其它来源。,故障来源:用户投诉、日志系统、例行维护中发现、其它来源。
93,uMAC网元VNFC有哪几种备份方式,index.txt,ZUF-76-04-005 VNFC支持1+1主备冗余,uMAC网元VFNC有3中备份方式: 支持1+1主备冗余,支持N+M负荷分担冗余, 支持1+1互备冗余。
93,,index.txt,ZUF-76-04-006 VNFC支持N+M负荷分担冗余,
93,,index.txt,ZUF-76-04-008 VNFC支持1+1互备冗余,
```
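The grouping rule above — rows sharing a `query_id` belong to one query, and rows with an empty `query` cell inherit the first non-empty query for that id — can be sketched in a few lines of Python. `load_qa_list` is a hypothetical helper for illustration, not part of RAG Pilot's API:

```python
import csv
from collections import defaultdict

def load_qa_list(path):
    """Group QA rows by query_id. Rows that leave `query` empty inherit
    the first non-empty query seen for the same query_id."""
    queries = {}
    contexts = defaultdict(list)
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            qid = row["query_id"]
            if row["query"] and qid not in queries:
                queries[qid] = row["query"]
            contexts[qid].append({
                "file_name": row["file_name"],
                "gt_context": row["gt_context"],
                # ground_truth is optional; normalize empty cells to None
                "ground_truth": row.get("ground_truth") or None,
            })
    return {qid: {"query": queries.get(qid), "gt": rows}
            for qid, rows in contexts.items()}
```

Applied to the example above, query `93` would collect one query string and three `gt_context` entries.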

#### ▶️ Run RAG Pilot

Run the following command to start the tuning process.

```bash
# Run pipeline tuning tool
export ECRAG_SERVICE_HOST_IP="ecrag_host_ip"
python3 -m run_pilot -q "your_queries.csv"
```

#### 📦 Output Files and Structure

Each tuning run in **RAG Pilot** generates a set of structured output files for analyzing and comparing different RAG pipeline configurations.

##### 📁 Directory Layout

- `rag_pilot_<timestamp>/`: Main folder for a tuning session.
- `summary.csv` – Overall performance metrics of all executed pipelines.
- `curr_pipeline.json` – Best pipeline configuration.
- `curr_rag_results.json` – Results of the best pipeline.
- `rag_summary.csv` – Query-wise summary.
- `rag_contexts.csv` – Detailed context analysis.
- `entry_<hash>/`: Subfolders for each tried pipeline with the same file structure:
- `pipeline.json`
- `rag_results.json`
- `rag_summary.csv`
- `rag_contexts.csv`

##### 🗂️ Output File Overview

| File Name | Description |
|----------------------|-----------------------------------------------------------------------------|
| `summary.csv` | Aggregated summary across all pipelines |
| `pipeline.json` | RAG pipeline configuration used in a specific trial |
| `rag_results.json` | List of results for each query, including metadata and context sets |
| `rag_summary.csv` | Summary of each query's outcome, including response and context hit counts |
| `rag_contexts.csv` | Breakdown of retrieved/reranked contexts and mapping to ground truth |

**Context Mapping Notes:**

- Contexts are categorized as `gt_contexts`, `retrieval_contexts`, or `postprocessing_contexts`.
- Mappings track which retrieved or postprocessed contexts hit the ground truth.
- Each context is associated with a `query_id` and indexed for traceability.
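As a sketch of how these outputs can be consumed downstream, the snippet below scans a `summary.csv` for the highest-scoring pipeline. The column names (`pipeline_id`, `recall`) are assumptions for illustration — check your actual `summary.csv` header and adjust:

```python
import csv

def best_pipeline(summary_csv, metric="recall"):
    """Return the summary row with the highest value in `metric`.
    Rows with an empty metric cell are skipped."""
    best_row, best_score = None, float("-inf")
    with open(summary_csv, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            cell = row.get(metric)
            if not cell:
                continue
            score = float(cell)
            if score > best_score:
                best_row, best_score = row, score
    return best_row
```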


## 📴 Offline RAG Tuning

RAG Pilot supports offline mode using a RAG configuration file.

### ⚙️ Environment Setup

Refer to the Dependencies and Environment Setup section in the Quickstart Guide above for setting up the environment before proceeding.

### 🚦 Launch RAG Pilot in Offline Mode

To be added in a later release.


## 🔧 How to Adjust RAG Pilot to Tune Your RAG Solution

### 🧩 What Are Nodes and Modules

RAG Pilot represents each stage of the RAG pipeline as a **node**, such as `node_parser`, `indexer`, `retriever`, etc. Each node can have different **modules** that define its type and configuration. The nodes and modules are specified in a YAML file, allowing users to switch between different implementations easily.

Here is an example of nodes and modules for EdgeCraftRAG.

![RAG Pilot Architecture](RAG_Pilot.png)

### ⚙️ How to Configure Nodes and Modules

The available nodes and their modules are stored in a YAML file (e.g., `configs/ecrag.yaml` for EdgeCraftRAG, shown below). Each node can have multiple modules, and both nodes and modules have configurable parameters that can be tuned.

```yaml
nodes:
- node: node_parser
modules:
- module_type: simple
chunk_size: 400
chunk_overlap: 48
- module_type: hierarchical
chunk_sizes:
- 256
- 384
- 512
- node: indexer
embedding_model:
- BAAI/bge-small-zh-v1.5
- BAAI/bge-small-en-v1.5
modules:
- module_type: vector
- module_type: faiss_vector
- node: retriever
retrieve_topk: 30
modules:
- module_type: vectorsimilarity
- module_type: auto_merge
- module_type: bm25
- node: postprocessor
modules:
- module_type: reranker
top_n: 3
reranker_model: BAAI/bge-reranker-large
- module_type: metadata_replace
- node: generator
model:
- Qwen/Qwen2-7B-Instruct
inference_type:
- local
- vllm
prompt: null
```
1. **Each Node Can Have Multiple Modules**
- A node represents a stage in the RAG pipeline, such as `node_parser`, `indexer`, or `retriever`.
- Each node can support different modules that define how it operates. For example, the `node_parser` node can use either a `simple` or `hierarchical` module.

2. **Nodes Have Parameters to Tune**
- Some nodes have global parameters that affect all modules within them. For instance, the `retriever` node has a `retrieve_topk` parameter that defines how many top results are retrieved.

3. **Modules Have Parameters to Tune**
- Each module within a node can have its own parameters. For example, the `simple` parser module has `chunk_size` and `chunk_overlap` parameters, while the `hierarchical` parser module supports multiple `chunk_sizes`.

4. **Each Node Selects Its Module Based on a Type Map**
- The tool uses an internal mapping to associate each module type with its corresponding function. The type of module selected for each node is defined in a mapping system like the one below:

```python
COMP_TYPE_MAP = {
"node_parser": "parser_type",
"indexer": "indexer_type",
"retriever": "retriever_type",
"postprocessor": "processor_type",
"generator": "inference_type",
}
```
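Putting points 2 and 3 together — node-level parameters plus per-module parameters — a small helper can enumerate everything tunable from a parsed config. This is an illustrative sketch operating on the dict produced by `yaml.safe_load`, not RAG Pilot code:

```python
def list_tunables(cfg):
    """Given a parsed tuner config (a dict shaped like configs/ecrag.yaml),
    return {node_name: {"params": {...}, "modules": {module_type: {...}}}}."""
    out = {}
    for node in cfg.get("nodes", []):
        name = node["node"]
        out[name] = {
            # keys other than "node"/"modules" are node-level parameters
            "params": {k: v for k, v in node.items()
                       if k not in ("node", "modules")},
            # each module keeps its own parameters, keyed by module_type
            "modules": {
                m["module_type"]: {k: v for k, v in m.items()
                                   if k != "module_type"}
                for m in node.get("modules", [])
            },
        }
    return out
```

For the config above, `list_tunables(cfg)["retriever"]` would report `retrieve_topk` as a node parameter and `vectorsimilarity`, `auto_merge`, and `bm25` as selectable modules.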

### 🧑‍💻 How to Use Nodes and Modules

Besides the YAML configuration file, the tool also uses a module map to associate each module with a runnable instance. This ensures that the tool correctly links each module type to its respective function within the pipeline.

#### 🧾 Example: Mapping Modules to Functions
The function below defines how different module types are mapped to their respective components in EdgeCraftRAG:

```python
def get_ecrag_module_map(ecrag_pl):
ecrag_modules = {
# root
"root": (ecrag_pl, ""),
# node_parser
"node_parser": (ecrag_pl, "node_parser"),
"simple": (ecrag_pl, "node_parser"),
"hierarchical": (ecrag_pl, "node_parser"),
"sentencewindow": (ecrag_pl, "node_parser"),
# indexer
"indexer": (ecrag_pl, "indexer"),
"vector": (ecrag_pl, "indexer"),
"faiss_vector": (ecrag_pl, "indexer"),
# retriever
"retriever": (ecrag_pl, "retriever"),
"vectorsimilarity": (ecrag_pl, "retriever"),
"auto_merge": (ecrag_pl, "retriever"),
"bm25": (ecrag_pl, "retriever"),
# postprocessor
"postprocessor": (ecrag_pl, "postprocessor[0]"),
"reranker": (ecrag_pl, "postprocessor[0]"),
"metadata_replace": (ecrag_pl, "postprocessor[0]"),
# generator
"generator": (ecrag_pl, "generator"),
}
return ecrag_modules
```
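Each mapped entry pairs a root object with an attribute-path string (e.g. `"postprocessor[0]"`, or `""` for the root itself). A minimal resolver for such paths might look like the following — a hypothetical sketch, since the actual resolution logic lives inside RAG Pilot:

```python
import re

def resolve_module(root, path):
    """Resolve an attribute path like "postprocessor[0]" against a pipeline
    object. An empty path returns the root object itself."""
    if not path:
        return root
    obj = root
    for part in path.split("."):
        # each segment is "name" or "name[index]"
        m = re.fullmatch(r"(\w+)(?:\[(\d+)\])?", part)
        obj = getattr(obj, m.group(1))
        if m.group(2) is not None:
            obj = obj[int(m.group(2))]
    return obj
```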


By modifying the YAML configuration file and understanding how modules are mapped to functions, you can experiment with different configurations and parameter settings to optimize your RAG pipeline effectively.
For more details on how to use RAG Pilot, please refer to the [Detail Guide](./docs/Detail_Guide.md).
2 changes: 1 addition & 1 deletion evals/evaluation/rag_pilot/VERSION
@@ -1 +1 @@
25.05-dev
25.07-dev
17 changes: 17 additions & 0 deletions evals/evaluation/rag_pilot/api/v1/pilot.py
@@ -182,6 +182,23 @@ async def run_pipeline_by_id(id: uuid.UUID):
return f"Error: Pipeline {id} does not exist"


@pilot_app.post(path="/v1/pilot/pipeline/restore")
async def restore_pipeline():
success = pilot.restore_curr_pl()
if success:
current_pl = pilot.get_curr_pl()
return {
"message": "Pipeline restored successfully",
"pipeline_id": str(current_pl.get_id()) if current_pl else None,
"restored_from": "EdgeCraftRAG active pipeline",
}
else:
raise HTTPException(
status_code=404,
detail="Failed to restore pipeline: No active pipeline found in EdgeCraftRAG service or restore operation failed",
)


@pilot_app.post(path="/v1/pilot/files")
async def add_files(request: DataIn):
ret = upload_files(request)
15 changes: 15 additions & 0 deletions evals/evaluation/rag_pilot/api/v1/tuner.py
@@ -158,6 +158,21 @@ async def get_stage_pipelines(stage: RAGStage = Path(...)):
return pipeline_list


@tuner_app.get(path="/v1/tuners/stage/{stage}/pipelines/best/id")
async def get_stage_pipelines_best(stage: RAGStage = Path(...)):
tuner_names = tunerMgr.get_tuners_by_stage(stage)
pl_id_list = []
for tuner_name in tuner_names:
record = tunerMgr.get_tuner_record(tuner_name)
if record is not None:
pl_id_list.extend(list(record.all_pipeline_ids))
if record.base_pipeline_id not in pl_id_list:
pl_id_list.append(record.base_pipeline_id)
pl_id_list = list(set(pl_id_list))
best_pl_id = get_best_pl_id(pl_id_list, stage)
return best_pl_id


@tuner_app.get(path="/v1/tuners/{tuner_name}/pipelines/best")
async def get_pipeline_best(tuner_name):
record = tunerMgr.get_tuner_record(tuner_name)
16 changes: 11 additions & 5 deletions evals/evaluation/rag_pilot/components/pilot/base.py
@@ -180,9 +180,6 @@ def get_prompt(self) -> Optional[str]:
def export_pipeline(self):
self._restore_model_instances()
exported_pl = copy.deepcopy(self.pl)
if hasattr(exported_pl, "generator") and exported_pl.generator:
if hasattr(exported_pl.generator, "prompt_content"):
delattr(exported_pl.generator, "prompt_content")
self._replace_model_with_id()
return exported_pl

@@ -372,7 +369,16 @@ class RAGResults(BaseModel):
finished: bool = False

def add_result(self, result):
self.results.append(result)
# If result.query_id already appears in self.results, update that result; otherwise append it
updated_existing = False
if result.query_id is not None:
for idx, r in enumerate(self.results):
if r.query_id == result.query_id:
self.results[idx] = result
updated_existing = True
break
if not updated_existing:
self.results.append(result)
self.cal_metadata()

def cal_recall(self):
@@ -418,7 +424,7 @@ def get_metrics(self):
return self.metadata or {}

def get_metric(self, metric: Metrics, default=float("-inf")):
return self.metadata.get(metric.value, default)
return (self.metadata or {}).get(metric.value, default)

def update_result_metrics(self, query_id: int, metrics: Dict[str, Union[float, int]]):
updated = False
@@ -25,6 +25,7 @@ class NodeParserIn(BaseModel):
class IndexerIn(BaseModel):
indexer_type: str
embedding_model: Optional[ModelIn] = None
vector_uri: Optional[str] = None


class RetrieverIn(BaseModel):
@@ -43,6 +44,7 @@ class GeneratorIn(BaseModel):
prompt_content: Optional[str] = None
model: Optional[ModelIn] = None
inference_type: Optional[str] = "local"
vllm_endpoint: Optional[str] = None


class PipelineCreateIn(BaseModel):
@@ -78,3 +80,7 @@ class KnowledgeBaseCreateIn(BaseModel):
name: str
description: Optional[str] = None
active: Optional[bool] = None


class MilvusConnectRequest(BaseModel):
vector_uri: str
@@ -50,6 +50,7 @@ class IndexerType(str, Enum):

FAISS_VECTOR = "faiss_vector"
DEFAULT_VECTOR = "vector"
MILVUS_VECTOR = "milvus_vector"


class RetrieverType(str, Enum):
11 changes: 11 additions & 0 deletions evals/evaluation/rag_pilot/components/pilot/pilot.py
@@ -185,6 +185,17 @@ def get_curr_pl(self):
else:
return None

def restore_curr_pl(self):
self.curr_pl_id = None
pl_raw = get_active_pipeline()
if pl_raw:
active_pl = RAGPipeline(pl_raw)
active_pl.regenerate_id()
self.add_rag_pipeline(active_pl)
return self.rag_pipeline_dict[self.curr_pl_id]
else:
return None

def get_curr_pl_id(self):
if self.curr_pl_id:
return self.curr_pl_id
@@ -2,7 +2,7 @@
# SPDX-License-Identifier: Apache-2.0

services:
ui:
ragpilot-ui:
image: ${REGISTRY:-opea}/ragpilot-ui:${TAG:-latest}
container_name: ragpilot-ui
environment: