diff --git a/.github/code_spell_ignore.txt b/.github/code_spell_ignore.txt
index 16ab788f..52d7e17a 100644
--- a/.github/code_spell_ignore.txt
+++ b/.github/code_spell_ignore.txt
@@ -1,3 +1,5 @@
 rouge
 Rouge
 ROUGE
+ModelIn
+modelin
diff --git a/evals/evaluation/rag_pilot/RAG_Pilot.png b/evals/evaluation/rag_pilot/RAG_Pilot.png
new file mode 100644
index 00000000..61ece95f
Binary files /dev/null and b/evals/evaluation/rag_pilot/RAG_Pilot.png differ
diff --git a/evals/evaluation/rag_pilot/README.md b/evals/evaluation/rag_pilot/README.md
new file mode 100644
index 00000000..9399a31d
--- /dev/null
+++ b/evals/evaluation/rag_pilot/README.md
@@ -0,0 +1,209 @@

# RAG Pilot - A RAG Pipeline Tuning Tool

## Overview

RAG Pilot provides a set of tuners to optimize various parameters in a retrieval-augmented generation (RAG) pipeline. Each tuner allows fine-grained control over key aspects of parsing, chunking, postprocessing, and generation, enabling better retrieval and response generation.

### Available Tuners

| Tuner | Function | Configuration |
|---|---|---|
| **NodeParserTypeTuner** | Switch between `simple` and `hierarchical` node parsers | The `simple` parser splits text into basic chunks using [`SentenceSplitter`](https://docs.llamaindex.ai/en/stable/api_reference/node_parsers/sentence_splitter/), while the `hierarchical` parser ([`HierarchicalNodeParser`](https://docs.llamaindex.ai/en/v0.10.17/api/llama_index.core.node_parser.HierarchicalNodeParser.html)) creates a structured hierarchy of nodes to maintain contextual relationships. |
| **SimpleNodeParserChunkTuner** | Tune `SentenceSplitter`'s `chunk_size` and `chunk_overlap` | Configures chunking behavior for document parsing by adjusting the size of individual text chunks and their overlap to ensure context retention. |
| **RerankerTopnTuner** | Tune `top_n` for reranking | Adjusts the number of top-ranked documents retrieved, optimizing the relevance of retrieved results. |
| **EmbeddingLanguageTuner** | Select the embedding model | Configures the embedding model for retrieval, allowing users to select different models for vector representation. |

These tuners help optimize document parsing, chunking strategies, reranking efficiency, and embedding selection for improved RAG performance.

## Online RAG Tuning

### Dependencies and Environment Setup

#### Set up EdgeCraftRAG

Set up the EdgeCraftRAG pipeline by following this [link](https://github.com/opea-project/GenAIExamples/tree/main/EdgeCraftRAG).

Load documents into EdgeCraftRAG before running RAG Pilot.

#### Create Running Environment

```bash
# Create a virtual environment
python3 -m venv tuning
source tuning/bin/activate

# Install dependencies
pip install -r requirements.txt
```

### Launch RAG Pilot in Online Mode

To launch RAG Pilot, create the following *required file* before running the command:

#### QA List File (`your_qa_list.json`)

Contains queries and optional ground truth answers. Below is a sample format:

```json
[
  {
    "query": "鸟类的祖先是恐龙吗?哪篇课文里讲了相关的内容?",
    "ground_truth": "是的,鸟类的祖先是恐龙,这一内容在《飞向蓝天的恐龙》一文中有所讨论"
  },
  {
    "query": "桃花水是什么季节的水?"
  }
]
```

Run the following command to start the tuning process.
The tuned RAG pipeline configuration will be saved to `rag_pipeline_out.json`:

```bash
# Run pipeline tuning tool
export ECRAG_SERVICE_HOST_IP="ecrag_host_ip"
python3 -m pipeline_tune -q "your_qa_list.json" -o "rag_pipeline_out.json"
```

## Offline RAG Tuning

RAG Pilot supports offline mode using a RAG configuration file.

### Environment Setup

Refer to [Create Running Environment](#create-running-environment) in the Online RAG Tuning section for setting up the environment before proceeding.

### Launch RAG Pilot in Offline Mode

To launch RAG Pilot, create the following *required files* before running the command:

#### RAG Configuration File (`your_rag_pipeline.json`)

Settings for the RAG pipeline. Please follow the format of the file `configs/rag_pipeline_sample.json`, which is compatible with [EdgeCraftRAG](https://github.com/opea-project/GenAIExamples/tree/main/EdgeCraftRAG).

#### RAG Results File (`your_rag_results.json`)

Contains queries, responses, lists of contexts, and optional ground truth. Below is a sample format:

```json
[
  {
    "query": "鸟类的祖先是恐龙吗?哪篇课文里讲了相关的内容?",
    "contexts": ["恐龙演化成鸟类的证据..."],
    "response": "是的,鸟类的祖先是恐龙。",
    "ground_truth": "是的,鸟类的祖先是恐龙,这一内容在《飞向蓝天的恐龙》一文中有所讨论"
  }
]
```

Run the following command to start offline tuning. The tuned RAG pipeline configuration will be saved to `rag_pipeline_out.json`:

```bash
python3 -m pipeline_tune --offline -c "your_rag_pipeline.json" -r "your_rag_results.json" -o "rag_pipeline_out.json"
```

## How to use RAG Pilot to tune your RAG solution

### What are Nodes and Modules

RAG Pilot represents each stage of the RAG pipeline as a **node**, such as `node_parser`, `indexer`, `retriever`, etc. Each node can have different **modules** that define its type and configuration. The nodes and modules are specified in a YAML file, allowing users to switch between different implementations easily.

Here is an example of nodes and modules for EdgeCraftRAG.

![RAG Pilot Architecture](RAG_Pilot.png)

### How to configure Nodes and Modules

The available nodes and their modules are stored in a YAML file (e.g. `configs/ecrag.yaml` for EdgeCraftRAG, shown below). Each node can have multiple modules, and both nodes and modules have configurable parameters that can be tuned.

```yaml
nodes:
  - node: node_parser
    modules:
      - module_type: simple
        chunk_size: 400
        chunk_overlap: 48
      - module_type: hierarchical
        chunk_sizes: [256, 384, 512]
  - node: indexer
    embedding_model: [BAAI/bge-small-zh-v1.5, BAAI/bge-small-en-v1.5]
    modules:
      - module_type: vector
      - module_type: faiss_vector
  - node: retriever
    retrieve_topk: 30
    modules:
      - module_type: vectorsimilarity
      - module_type: auto_merge
      - module_type: bm25
  - node: postprocessor
    modules:
      - module_type: reranker
        top_n: 3
        reranker_model: BAAI/bge-reranker-large
      - module_type: metadata_replace
  - node: generator
    model: [Qwen/Qwen2-7B-Instruct]
    inference_type: [local, vllm]
    prompt: null
```

1. **Each Node Can Have Multiple Modules**
   - A node represents a stage in the RAG pipeline, such as `node_parser`, `indexer`, or `retriever`.
   - Each node can support different modules that define how it operates. For example, the `node_parser` node can use either a `simple` or `hierarchical` module.

2. **Nodes Have Parameters to Tune**
   - Some nodes have global parameters that affect all modules within them. For instance, the `retriever` node has a `retrieve_topk` parameter that defines how many top results are retrieved.
3. **Modules Have Parameters to Tune**
   - Each module within a node can have its own parameters. For example, the `simple` parser module has `chunk_size` and `chunk_overlap` parameters, while the `hierarchical` parser module supports multiple `chunk_sizes`.

4. **Each Node Selects Its Module Based on a Type Map**
   - The tool uses an internal mapping to associate each module type with its corresponding function. The type of module selected for each node is defined in a mapping system like the one below:

   ```python
   COMP_TYPE_MAP = {
       "node_parser": "parser_type",
       "indexer": "indexer_type",
       "retriever": "retriever_type",
       "postprocessor": "processor_type",
       "generator": "inference_type",
   }
   ```

### How to use Nodes and Modules

Besides the YAML configuration file, the tool also uses a module map to associate each module with a runnable instance. This ensures that the tool correctly links each module type to its respective function within the pipeline.

#### Example: Mapping Modules to Functions

The function below defines how different module types are mapped to their respective components in EdgeCraftRAG:

```python
def get_ecrag_module_map(ecrag_pl):
    ecrag_modules = {
        # root
        "root": (ecrag_pl, ""),
        # node_parser
        "node_parser": (ecrag_pl, "node_parser"),
        "simple": (ecrag_pl, "node_parser"),
        "hierarchical": (ecrag_pl, "node_parser"),
        "sentencewindow": (ecrag_pl, "node_parser"),
        # indexer
        "indexer": (ecrag_pl, "indexer"),
        "vector": (ecrag_pl, "indexer"),
        "faiss_vector": (ecrag_pl, "indexer"),
        # retriever
        "retriever": (ecrag_pl, "retriever"),
        "vectorsimilarity": (ecrag_pl, "retriever"),
        "auto_merge": (ecrag_pl, "retriever"),
        "bm25": (ecrag_pl, "retriever"),
        # postprocessor
        "postprocessor": (ecrag_pl, "postprocessor[0]"),
        "reranker": (ecrag_pl, "postprocessor[0]"),
        "metadata_replace": (ecrag_pl, "postprocessor[0]"),
        # generator
        "generator": (ecrag_pl, "generator"),
    }
    return ecrag_modules
```

By modifying the YAML configuration file and understanding how modules are mapped to functions, you can experiment with different configurations and parameter settings to optimize your RAG pipeline effectively. The sketch below shows how the two maps work together to locate a module and apply a tuned value.
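#### Example: Resolving a Module and Applying a Tuned Value

The following is a minimal, self-contained sketch of how the type map and module map combine. It uses toy stand-in classes (`ToyPipeline`, `ToyReranker`) rather than the real EdgeCraftRAG objects, and a simplified `resolve` helper that mirrors the adaptor's attribute-expression lookup; names are illustrative assumptions, not part of the tool's API.

```python
# Toy stand-ins for an EdgeCraftRAG pipeline (illustrative only).
class ToyReranker:
    def __init__(self):
        self.processor_type = "reranker"  # attribute named by COMP_TYPE_MAP["postprocessor"]
        self.top_n = 3


class ToyPipeline:
    def __init__(self):
        self.postprocessor = [ToyReranker()]


def resolve(instance, attr_expression):
    """Resolve '', 'attr', or 'attr[idx]' expressions, as the adaptor's lookup does."""
    if attr_expression == "":
        return instance
    if "[" in attr_expression and "]" in attr_expression:
        attr_name, index = attr_expression[:-1].split("[")
        return getattr(instance, attr_name)[int(index)]
    return getattr(instance, attr_expression)


pl = ToyPipeline()
module_map = {"postprocessor": (pl, "postprocessor[0]"), "reranker": (pl, "postprocessor[0]")}
comp_type_map = {"postprocessor": "processor_type"}

# Find the runnable instance behind the postprocessor node ...
node_func = resolve(*module_map["postprocessor"])
# ... and read which module type is currently active for that node.
active_type = getattr(node_func, comp_type_map["postprocessor"], None)
print(active_type, node_func.top_n)  # reranker 3

# An accepted RerankerTopnTuner suggestion then boils down to a setattr on that instance.
setattr(node_func, "top_n", 5)
print(node_func.top_n)  # 5
```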
diff --git a/evals/evaluation/rag_pilot/components/__init__.py b/evals/evaluation/rag_pilot/components/__init__.py new file mode 100644 index 00000000..4057dc01 --- /dev/null +++ b/evals/evaluation/rag_pilot/components/__init__.py @@ -0,0 +1,2 @@ +# Copyright (C) 2025 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 diff --git a/evals/evaluation/rag_pilot/components/adaptor/__init__.py b/evals/evaluation/rag_pilot/components/adaptor/__init__.py new file mode 100644 index 00000000..4057dc01 --- /dev/null +++ b/evals/evaluation/rag_pilot/components/adaptor/__init__.py @@ -0,0 +1,2 @@ +# Copyright (C) 2025 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 diff --git a/evals/evaluation/rag_pilot/components/adaptor/adaptor.py b/evals/evaluation/rag_pilot/components/adaptor/adaptor.py new file mode 100644 index 00000000..31f486a3 --- /dev/null +++ b/evals/evaluation/rag_pilot/components/adaptor/adaptor.py @@ -0,0 +1,75 @@ +# Copyright (C) 2025 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +from typing import Callable, Optional + +from components.adaptor.base import Module, Node, convert_tuple, get_support_modules + + +class Adaptor: + + def __init__(self, yaml_data: str): + self.nodes = self.parse_nodes(yaml_data) + self.root_func: Optional[Callable] = None + + def parse_nodes(self, yaml_data): + parsed_nodes = {} + for node in yaml_data.get("nodes", []): + node_type = node.get("node") + modules_dict = { + mod.get("module_type"): Module( + type=mod.get("module_type", ""), + params={k: convert_tuple(v) for k, v in mod.items() if k not in ["module_type"]}, + ) + for mod in node.get("modules", []) + if mod.get("module_type") + } + node_params = {k: convert_tuple(v) for k, v in node.items() if k not in ["node", "node_type", "modules"]} + cur_node = Node(type=node_type, params=node_params, modules=modules_dict) + if node_type in parsed_nodes: + parsed_nodes[node_type].append(cur_node) + else: + parsed_nodes[node_type] = [cur_node] + return parsed_nodes + + def get_node(self, node_type, idx=0): + nodes = self.nodes[node_type] if node_type in self.nodes else None + return nodes[idx] if nodes and idx < len(nodes) else None + + def get_modules_from_node(self, node_type, idx=0): + node = self.get_node(node_type, idx) + return node.modules if node else None + + def get_module(self, node_type, module_type, idx=0): + if module_type is None: + return self.get_node(node_type, idx) + else: + modules = self.get_modules_from_node(node_type, idx) + return modules[module_type] if modules and module_type in modules else None + + def update_all_module_functions(self, module_map, node_type_map): + self.root_func = get_support_modules("root", module_map) + + for node_list in self.nodes.values(): + for node in node_list: + node.update_func(module_map) + node.is_active = False + for module in node.modules.values(): + module.update_func(module_map) + module.is_active = False + + self.activate_modules_based_on_type(node_type_map) + + def activate_modules_based_on_type(self, node_type_map): + if not self.root_func: + return + + for node_list in self.nodes.values(): + for node in node_list: + node_type = node.type + if not getattr(self.root_func, node_type, None): + continue + node.is_active = True + active_module_type = getattr(node.func, node_type_map[node_type], None) + if active_module_type and active_module_type in node.modules: + node.modules[active_module_type].is_active = True diff --git a/evals/evaluation/rag_pilot/components/adaptor/base.py 
b/evals/evaluation/rag_pilot/components/adaptor/base.py new file mode 100644 index 00000000..b13e2df0 --- /dev/null +++ b/evals/evaluation/rag_pilot/components/adaptor/base.py @@ -0,0 +1,142 @@ +# Copyright (C) 2025 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +import ast +from copy import deepcopy +from dataclasses import dataclass, field +from typing import Any, Callable, Dict, Optional, Tuple + + +def get_support_modules(module_name: str, module_map) -> Callable: + support_modules = module_map + return dynamically_find_function(module_name, support_modules) + + +def dynamically_find_function(key: str, target_dict: Dict) -> Callable: + if key in target_dict: + instance, attr_expression = target_dict[key] + if "[" in attr_expression and "]" in attr_expression: + attr_name, index = attr_expression[:-1].split("[") + index = int(index) + func = getattr(instance, attr_name) + if isinstance(func, list) and 0 <= index < len(func): + func = func[index] + else: + raise ValueError(f"Attribute '{attr_name}' is not a list or index {index} is out of bounds") + elif attr_expression == "": + func = instance + else: + func = getattr(instance, attr_expression) + return func + else: + print(f"Input module or node '{key}' is not supported.") + + +def convert_tuple(value): + if isinstance(value, str): + try: + evaluated = ast.literal_eval(value) + if isinstance(evaluated, tuple): + if len(evaluated) == 2: + return Range(*evaluated) + else: + return evaluated + except (SyntaxError, ValueError): + pass + return value + + +class Range: + def __init__(self, min_value: int, max_value: int): + self.min = min_value + self.max = max_value + + +class ModuleBase: + def __init__(self, type: str, params: Dict[str, Any]): + self.type: str = type + self.params: Dict[str, Any] = params + self.func: Optional[Callable] = None + self.is_active = False + + @classmethod + def from_dict(cls, component_dict: Dict) -> "ModuleBase": + _component_dict = deepcopy(component_dict) + type = _component_dict.pop("type") + params = _component_dict + return cls(type, params) + + def update_func(self, module_map): + self.func = get_support_modules(self.type, module_map) + if self.func is None: + print(f"{self.__class__.__name__} type {self.type} is not supported.") + + def get_params(self, attr): + return self.params[attr] if attr in self.params else None + + def get_status(self): + return self.is_active + + def get_value(self, attr): + if self.func is None: + print(f"{self.__class__.__name__} type {self.type} is not supported.") + else: + return getattr(self.func, attr, None) + + def set_value(self, attr, value): + if self.func is None: + print(f"{self.__class__.__name__} type {self.type} is not supported.") + else: + setattr(self.func, attr, value) + + +@dataclass +class Module(ModuleBase): + type: str + params: Dict[str, Any] + func: Optional[Callable] + + def __init__(self, type, params): + super().__init__(type, params) + + +@dataclass +class Node(ModuleBase): + type: str + params: Dict[str, Any] + modules: Dict[str, Module] + func: Optional[Callable] + + def __init__(self, type, params, modules): + super().__init__(type, params) + self.modules = modules + + @classmethod + def from_dict(cls, node_dict: Dict) -> "Node": + _node_dict = deepcopy(node_dict) + type = _node_dict.pop("type") + modules_dict = _node_dict.pop("modules") + modules = {key: Module.from_dict(value) for key, value in modules_dict.items()} + params = _node_dict + return cls(type, params, modules) + + def get_params(self, attr): + if attr in self.params: 
+ return self.params[attr] + # Make sure attr ends with "type" when tuning node's modules + elif attr.endswith("type"): + return list(self.modules.keys()) + else: + return None + + def set_value(self, attr, value): + if self.func is None: + print(f"{self.__class__.__name__} type {self.type} is not supported.") + else: + setattr(self.func, attr, value) + if value in self.modules: + module = self.modules[value] + for param in module.params: + val = module.get_params(param) + if val: + module.set_value(param, val) diff --git a/evals/evaluation/rag_pilot/components/constructor/connector.py b/evals/evaluation/rag_pilot/components/constructor/connector.py new file mode 100644 index 00000000..abcd64ce --- /dev/null +++ b/evals/evaluation/rag_pilot/components/constructor/connector.py @@ -0,0 +1,54 @@ +# Copyright (C) 2025 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +import os + +import requests +from components.constructor.constructor import convert_dict_to_pipeline +from components.constructor.ecrag.api_schema import PipelineCreateIn, RagOut + +ECRAG_SERVICE_HOST_IP = os.getenv("ECRAG_SERVICE_HOST_IP", "127.0.0.1") +ECRAG_SERVICE_PORT = int(os.getenv("ECRAG_SERVICE_PORT", 16010)) +server_addr = f"http://{ECRAG_SERVICE_HOST_IP}:{ECRAG_SERVICE_PORT}" + + +def get_active_pipeline() -> PipelineCreateIn: + path = "/v1/settings/pipelines" + res = requests.get(f"{server_addr}{path}", proxies={"http": None}) + if res.status_code == 200: + for pl in res.json(): + if pl["status"]["active"]: + return convert_dict_to_pipeline(pl) + return None + + +def update_pipeline(pipeline_conf): + path = "/v1/settings/pipelines" + return requests.patch( + f"{server_addr}{path}/{pipeline_conf.name}", json=pipeline_conf.dict(), proxies={"http": None} + ) + + +def get_ragqna(query): + new_req = {"messages": query} + path = "/v1/ragqna" + res = requests.post(f"{server_addr}{path}", json=new_req, proxies={"http": None}) + if res.status_code == 200: + return RagOut(**res.json()) + else: + return None + + +def reindex_data(): + path = "/v1/data" + res = requests.post(f"{server_addr}{path}/reindex", proxies={"http": None}) + return res.status_code == 200 + + +def update_active_pipeline(pipeline): + pipeline.active = False + res = update_pipeline(pipeline) + if res.status_code == 200: + pipeline.active = True + res = update_pipeline(pipeline) + return res.status_code == 200 diff --git a/evals/evaluation/rag_pilot/components/constructor/constructor.py b/evals/evaluation/rag_pilot/components/constructor/constructor.py new file mode 100644 index 00000000..5069cec7 --- /dev/null +++ b/evals/evaluation/rag_pilot/components/constructor/constructor.py @@ -0,0 +1,192 @@ +# Copyright (C) 2025 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +import copy +import json +from typing import Dict, Optional + +from components.constructor.ecrag.api_schema import ( + GeneratorIn, + IndexerIn, + ModelIn, + NodeParserIn, + PipelineCreateIn, + PostProcessorIn, + RetrieverIn, +) + + +def convert_dict_to_pipeline(pl: dict) -> PipelineCreateIn: + def initialize_component(cls, data, extra=None, key_map=None, nested_fields=None): + if not data: + return None + + extra = extra or {} + key_map = key_map or {} + nested_fields = nested_fields or {} + + processed_data = {} + for k, v in data.items(): + mapped_key = key_map.get(k, k) + if mapped_key in nested_fields: + processed_data[mapped_key] = initialize_component(nested_fields[mapped_key], v) + else: + processed_data[mapped_key] = v + if cls == ModelIn: + 
processed_data["model_type"] = data.get("type", processed_data.get("model_type")) + + processed_data.update(extra) + return cls(**processed_data) + + return PipelineCreateIn( + idx=pl.get("idx"), + name=pl.get("name"), + node_parser=initialize_component(NodeParserIn, pl.get("node_parser"), key_map={"idx": "idx"}), + indexer=initialize_component( + IndexerIn, + pl.get("indexer"), + key_map={"model": "embedding_model", "idx": "idx"}, + nested_fields={"embedding_model": ModelIn}, + ), + retriever=initialize_component(RetrieverIn, pl.get("retriever"), key_map={"idx": "idx"}), + postprocessor=[ + initialize_component( + PostProcessorIn, + pp, + extra={"processor_type": pp.get("processor_type")}, + key_map={"model": "reranker_model", "idx": "idx"}, + nested_fields={"reranker_model": ModelIn}, + ) + for pp in pl.get("postprocessor", []) + ], + generator=initialize_component( + GeneratorIn, pl.get("generator"), key_map={"idx": "idx"}, nested_fields={"model": ModelIn} + ), + active=pl.get("status", {}).get("active", False), + ) + + +class Constructor: + + def __init__(self): + self.pl: Optional[PipelineCreateIn] = None + self._backup = None + + def _replace_model_with_id(self): + self._backup = {} + + def extract_model_id(model): + if model: + if model.model_type not in self._backup: + self._backup[model.model_type] = [] + self._backup[model.model_type].append(model) + return model.model_id + return None + + if self.pl.indexer and self.pl.indexer.embedding_model: + self.pl.indexer.embedding_model = extract_model_id(self.pl.indexer.embedding_model) + + if self.pl.postprocessor: + for proc in self.pl.postprocessor: + if proc.reranker_model: + proc.reranker_model = extract_model_id(proc.reranker_model) + + if self.pl.generator and self.pl.generator.model: + self.pl.generator.model = extract_model_id(self.pl.generator.model) + + def _restore_model_instances(self): + if not self._backup: + self._backup = {} + + def restore_model(model_id, model_type, is_generator=False): + if model_type in self._backup: + for existing_model in self._backup[model_type]: + if existing_model.model_id == model_id: + return existing_model + + weight = self._backup[model_type][0].weight + device = self._backup[model_type][0].device + else: + weight = "INT4" if is_generator else "" + device = "auto" + + model_path = f"./models/{model_id}" + if is_generator: + model_path += f"/{weight}_compressed_weights" + + return ModelIn( + model_type=model_type, model_id=model_id, model_path=model_path, weight=weight, device=device + ) + + if self.pl.indexer and isinstance(self.pl.indexer.embedding_model, str): + self.pl.indexer.embedding_model = restore_model(self.pl.indexer.embedding_model, "embedding") + + if self.pl.postprocessor: + for proc in self.pl.postprocessor: + if isinstance(proc.reranker_model, str): + proc.reranker_model = restore_model(proc.reranker_model, "reranker") + + if self.pl.generator and isinstance(self.pl.generator.model, str): + self.pl.generator.model = restore_model(self.pl.generator.model, "llm", is_generator=True) + + self._backup = None + + def set_pipeline(self, pl): + self.pl = pl + self._replace_model_with_id() + + def export_pipeline(self): + self._restore_model_instances() + exported_pl = copy.deepcopy(self.pl) + self._replace_model_with_id() + return exported_pl + + def activate_pl(self): + self.pl.active = True + + def deactivate_pl(self): + self.pl.active = False + + def save_pipeline_to_json(self, save_path="pipeline.json"): + if self.pl: + pipeline_dict = self.export_pipeline().dict() + with 
open(save_path, "w") as json_file: + json.dump(pipeline_dict, json_file, indent=4) + print(f'RAG pipeline is successfully exported to "{save_path}"') + + +def get_ecrag_module_map(ecrag_pl): + ecrag_modules = { + # root + "root": (ecrag_pl, ""), + # node_parser + "node_parser": (ecrag_pl, "node_parser"), + "simple": (ecrag_pl, "node_parser"), + "hierarchical": (ecrag_pl, "node_parser"), + "sentencewindow": (ecrag_pl, "node_parser"), + # indexer + "indexer": (ecrag_pl, "indexer"), + "vector": (ecrag_pl, "indexer"), + "faiss_vector": (ecrag_pl, "indexer"), + # retriever + "retriever": (ecrag_pl, "retriever"), + "vectorsimilarity": (ecrag_pl, "retriever"), + "auto_merge": (ecrag_pl, "retriever"), + "bm25": (ecrag_pl, "retriever"), + # postprocessor + "postprocessor": (ecrag_pl, "postprocessor[0]"), + "reranker": (ecrag_pl, "postprocessor[0]"), + "metadata_replace": (ecrag_pl, "postprocessor[0]"), + # generator + "generator": (ecrag_pl, "generator"), + } + return ecrag_modules + + +COMP_TYPE_MAP = { + "node_parser": "parser_type", + "indexer": "indexer_type", + "retriever": "retriever_type", + "postprocessor": "processor_type", + "generator": "inference_type", +} diff --git a/evals/evaluation/rag_pilot/components/constructor/ecrag/api_schema.py b/evals/evaluation/rag_pilot/components/constructor/ecrag/api_schema.py new file mode 100644 index 00000000..dbdc96e2 --- /dev/null +++ b/evals/evaluation/rag_pilot/components/constructor/ecrag/api_schema.py @@ -0,0 +1,69 @@ +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +from typing import Optional + +from pydantic import BaseModel + + +class ModelIn(BaseModel): + model_type: Optional[str] = "LLM" + model_id: Optional[str] + model_path: Optional[str] = "./" + weight: Optional[str] + device: Optional[str] = "cpu" + + +class NodeParserIn(BaseModel): + chunk_size: Optional[int] = None + chunk_overlap: Optional[int] = None + chunk_sizes: Optional[list] = None + parser_type: str + window_size: Optional[int] = None + + +class IndexerIn(BaseModel): + indexer_type: str + embedding_model: Optional[ModelIn] = None + + +class RetrieverIn(BaseModel): + retriever_type: str + retrieve_topk: Optional[int] = 3 + + +class PostProcessorIn(BaseModel): + processor_type: str + reranker_model: Optional[ModelIn] = None + top_n: Optional[int] = 5 + + +class GeneratorIn(BaseModel): + prompt_path: Optional[str] = None + model: Optional[ModelIn] = None + inference_type: Optional[str] = "local" + + +class PipelineCreateIn(BaseModel): + name: Optional[str] = None + node_parser: Optional[NodeParserIn] = None + indexer: Optional[IndexerIn] = None + retriever: Optional[RetrieverIn] = None + postprocessor: Optional[list[PostProcessorIn]] = None + generator: Optional[GeneratorIn] = None + active: Optional[bool] = False + + +class DataIn(BaseModel): + text: Optional[str] = None + local_path: Optional[str] = None + + +class FilesIn(BaseModel): + local_paths: Optional[list[str]] = None + + +class RagOut(BaseModel): + query: str + contexts: Optional[list[str]] = None + response: str diff --git a/evals/evaluation/rag_pilot/components/constructor/ecrag/base.py b/evals/evaluation/rag_pilot/components/constructor/ecrag/base.py new file mode 100644 index 00000000..a163c486 --- /dev/null +++ b/evals/evaluation/rag_pilot/components/constructor/ecrag/base.py @@ -0,0 +1,129 @@ +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +import abc +import uuid +from enum import Enum +from typing import Any, Callable, List, Optional + +from 
pydantic import BaseModel, ConfigDict, Field, model_serializer + + +class CompType(str, Enum): + + DEFAULT = "default" + MODEL = "model" + PIPELINE = "pipeline" + NODEPARSER = "node_parser" + INDEXER = "indexer" + RETRIEVER = "retriever" + POSTPROCESSOR = "postprocessor" + GENERATOR = "generator" + FILE = "file" + + +class ModelType(str, Enum): + + EMBEDDING = "embedding" + RERANKER = "reranker" + LLM = "llm" + VLLM = "vllm" + + +class FileType(str, Enum): + TEXT = "text" + VISUAL = "visual" + AURAL = "aural" + VIRTUAL = "virtual" + OTHER = "other" + + +class NodeParserType(str, Enum): + + DEFAULT = "default" + SIMPLE = "simple" + HIERARCHY = "hierarchical" + SENTENCEWINDOW = "sentencewindow" + + +class IndexerType(str, Enum): + + DEFAULT = "default" + FAISS_VECTOR = "faiss_vector" + DEFAULT_VECTOR = "vector" + + +class RetrieverType(str, Enum): + + DEFAULT = "default" + VECTORSIMILARITY = "vectorsimilarity" + AUTOMERGE = "auto_merge" + BM25 = "bm25" + + +class PostProcessorType(str, Enum): + + RERANKER = "reranker" + METADATAREPLACE = "metadata_replace" + + +class GeneratorType(str, Enum): + + CHATQNA = "chatqna" + + +class InferenceType(str, Enum): + + LOCAL = "local" + VLLM = "vllm" + + +class CallbackType(str, Enum): + + DATAPREP = "dataprep" + RETRIEVE = "retrieve" + PIPELINE = "pipeline" + + +class BaseComponent(BaseModel): + + model_config = ConfigDict(extra="allow", arbitrary_types_allowed=True) + + idx: str = Field(default_factory=lambda: str(uuid.uuid4())) + name: Optional[str] = Field(default="") + comp_type: str = Field(default="") + comp_subtype: Optional[str] = Field(default="") + + @model_serializer + def ser_model(self): + set = { + "idx": self.idx, + "name": self.name, + "comp_type": self.comp_type, + "comp_subtype": self.comp_subtype, + } + return set + + @abc.abstractmethod + def run(self, **kwargs) -> Any: + pass + + +class BaseMgr: + + def __init__(self): + self.components = {} + + def add(self, comp: BaseComponent): + self.components[comp.idx] = comp + + def get(self, idx: str) -> BaseComponent: + if idx in self.components: + return self.components[idx] + else: + return None + + def remove(self, idx): + # remove the reference count + # after reference count == 0, object memory can be freed with Garbage Collector + del self.components[idx] diff --git a/evals/evaluation/rag_pilot/components/tuner/__init__.py b/evals/evaluation/rag_pilot/components/tuner/__init__.py new file mode 100644 index 00000000..4057dc01 --- /dev/null +++ b/evals/evaluation/rag_pilot/components/tuner/__init__.py @@ -0,0 +1,2 @@ +# Copyright (C) 2025 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 diff --git a/evals/evaluation/rag_pilot/components/tuner/base.py b/evals/evaluation/rag_pilot/components/tuner/base.py new file mode 100644 index 00000000..e56ffe9f --- /dev/null +++ b/evals/evaluation/rag_pilot/components/tuner/base.py @@ -0,0 +1,72 @@ +# Copyright (C) 2025 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +from enum import Enum +from typing import Callable, List, Optional, Tuple, Union + +from pydantic import BaseModel + + +class RagResult(BaseModel): + query: str + contexts: Optional[List[str]] = None + ground_truth: Optional[str] = None + response: str + + +class ContentType(str, Enum): + ALL_CONTEXTS = "all_contexts" + CONTEXT = "context" + RESPONSE = "response" + + +class QuestionType(str, Enum): + RATING5 = "1-5" + RATING3 = "1-3" + BOOL = "y/n" + SCORE = "Input a number" + MINUS_ONE_ZERO_ONE = "-1, 0, 1" + + +class Question(BaseModel): + question: str + 
question_type: QuestionType + content_type: ContentType + + +class Feedback(BaseModel): + type: QuestionType + feedback: bool | int + + +class SuggestionType(str, Enum): + SET = "set" + OFFSET = "offset" + CHOOSE = "choose" + + +class DirectionType(str, Enum): + INCREASE = "increase" + DECREASE = "decrease" + + +class SuggestionValue(BaseModel): + suggestion_type: SuggestionType + step: Optional[int] = None + direction: Optional[DirectionType] = None + choices: Optional[List] = None + val: Optional[Union[int, float, str]] = None + + +class Suggestion(BaseModel): + attribute: str + svalue: SuggestionValue + origval: Optional[Union[int, float, str]] = None + is_accepted: bool = False + + def reset(self): + self.is_accepted = False + self.origval = None + self.svalue.val = None + self.svalue.direction = None + self.svalue.choices = None diff --git a/evals/evaluation/rag_pilot/components/tuner/tuner.py b/evals/evaluation/rag_pilot/components/tuner/tuner.py new file mode 100644 index 00000000..f53856c9 --- /dev/null +++ b/evals/evaluation/rag_pilot/components/tuner/tuner.py @@ -0,0 +1,363 @@ +# Copyright (C) 2025 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +from abc import ABC, abstractmethod + +from components.tuner.base import ( + ContentType, + DirectionType, + Feedback, + Question, + QuestionType, + RagResult, + Suggestion, + SuggestionType, + SuggestionValue, +) + + +def input_parser(qtype: QuestionType = QuestionType.BOOL): + user_input = input(f"({qtype.value}): ") + + match qtype: + case QuestionType.RATING5: + if user_input.isdigit() and 1 <= int(user_input) <= 5: + return True, int(user_input) + else: + print("Invalid input. Please enter a number between 1 and 5.") + return False, None + + case QuestionType.RATING3: + if user_input.isdigit() and 1 <= int(user_input) <= 3: + return True, int(user_input) + else: + print("Invalid input. Please enter a number between 1 and 3.") + return False, None + + case QuestionType.BOOL: + if user_input.lower() == "y": + return True, True + elif user_input.lower() == "n": + return True, False + else: + print("Invalid input. Please enter 'Y' or 'N'.") + return False, None + + case QuestionType.SCORE: + if user_input.isdigit(): + return True, int(user_input) + else: + print("Invalid input. Please enter a valid score (integer).") + return False, None + + case QuestionType.MINUS_ONE_ZERO_ONE: + if user_input in {"-1", "0", "1"}: + return True, int(user_input) + else: + print("Invalid input. 
Please enter -1, 0, or 1.") + return False, None + + case _: + print("Unknown question type.") + return False, None + + +def display_ragqna(ragqna, content_type=ContentType.RESPONSE, context_idx=0): + print("\nRAG Query\n" "---------\n" f"{ragqna.query}\n\n" "RAG Response\n" "------------\n" f"{ragqna.response}\n") + + if content_type == ContentType.ALL_CONTEXTS: + if ragqna.contexts: + for index, context in enumerate(ragqna.contexts): + cleaned_context = context.replace("\n", " ") + print(f"RAG Context {index}\n" "-------------------\n" f"{cleaned_context}\n") + else: + print("RAG Contexts\n" "------------\n" "None\n") + elif content_type == ContentType.CONTEXT: + print( + f"RAG Contexts {context_idx}\n" + "--------------------------\n" + f"{ragqna.contexts[context_idx] if context_idx in ragqna.contexts else 'None'}\n" + ) + + +def display_list(list): + for index, value in enumerate(list): + print(f"{index}: {value}") + + +class Tuner(ABC): + + def __init__(self, ragqna: RagResult, question: str, question_type: QuestionType, content_type: ContentType): + + self.question = Question( + question=question, + question_type=question_type, + content_type=content_type, + ) + self.user_feedback: Feedback = None + self.suggestions: dict[str, Suggestion] = {} + + self.ragqna: RagResult = ragqna + + self.node_type = None + self.module_type = None + self.module = None + self.module_active = False + + def init_ragqna(self, ragqna): + self.ragqna = ragqna + self.user_feedback = None + for suggestion in self.suggestions.values(): + suggestion.reset() + + def update_module(self, module): + self.module = module + self.module_active = module.get_status() + + def get_suggestions_origval(self): + if not self.module_active: + return + for suggestion in self.suggestions.values(): + suggestion.origval = self.module.get_value(suggestion.attribute) + + def _display_question(self, content_type, context_idx=0): + display_ragqna(self.ragqna, content_type, context_idx) + print(f"{self}: {self.question.question}\n") + + def request_feedback(self): + if not self.module_active: + return + + self._display_question(self.question.content_type) + + valid, user_input = input_parser(self.question.question_type) + if not valid: + return False + + self.user_feedback = Feedback(type=self.question.question_type, feedback=user_input) + self.get_suggestions_origval() + return True + + @abstractmethod + def _feedback_to_suggestions(self): + pass + + def make_suggestions(self): + if not self.module_active: + return + + self._feedback_to_suggestions() + + old_new_values = {} + for attr, suggestion in self.suggestions.items(): + svalue = suggestion.svalue + origval = suggestion.origval + + match svalue.suggestion_type: + case SuggestionType.SET: + choices = suggestion.svalue.choices + if not choices: + continue + + print( + f"{attr}'s current value: {origval}\n" + f"We suggest setting a new value from choices: {choices}\n" + f"Please enter a new value: " + ) + valid, user_input = input_parser(QuestionType.SCORE) + if valid: + old_new_values[attr] = [origval, user_input] + + case SuggestionType.OFFSET: + match svalue.direction: + case DirectionType.INCREASE: + old_new_values[attr] = [origval, origval + svalue.step] + case DirectionType.DECREASE: + old_new_values[attr] = [origval, origval - svalue.step] + case _: + print(f"ERROR: Unknown direction '{svalue.direction}' for OFFSET.") + + case SuggestionType.CHOOSE: + choices = suggestion.svalue.choices + if not choices: + continue + + display_list(choices) + print(f"{attr}'s current value: 
{origval}\n" f"Please enter the index of a new value") + valid, user_input = input_parser(QuestionType.SCORE) + if valid: + if user_input < 0 or user_input >= len(choices): + print(f"ERROR: The chosen index {user_input} is out of range") + elif origval == choices[user_input]: + print(f"ERROR: You chose the current value {user_input}: {choices[user_input]}") + else: + old_new_values[attr] = [origval, choices[user_input]] + + case _: + print(f"ERROR: Unknown suggestion type '{svalue.suggestion_type}'.") + + if not old_new_values: + return False + + is_changed = False + for k, v in old_new_values.items(): + if self._confirm_suggestion(self.suggestions[k], v, skip_confirming=True): + is_changed = True + + if is_changed: + self._apply_suggestions() + + return is_changed + + def _confirm_suggestion(self, suggestion, old_new_value, skip_confirming=False): + svalue = suggestion.svalue + origval, newval = old_new_value + + print( + f"Based on your feedback, {svalue.direction.value if svalue.direction else 'modify'} {suggestion.attribute} {origval} -> {newval}" + ) + if skip_confirming: + valid, user_input = True, True + else: + valid, user_input = input_parser(QuestionType.BOOL) + + if not valid: + return False + + suggestion.is_accepted = user_input + if suggestion.is_accepted: + suggestion.svalue.val = newval + return True + + return False + + def _apply_suggestions(self): + for suggestion in self.suggestions.values(): + if suggestion.is_accepted: + self.module.set_value(suggestion.attribute, suggestion.svalue.val) + + def __str__(self): + return f"{self.__class__.__name__}" + + +class SimpleNodeParserChunkTuner(Tuner): + + def __init__(self, ragqna=None): + q = "Is each context split properly? \n -1) Incomplete \n 0) Good enough or skip current tuner \n 1) Includes too many contents" + super().__init__(ragqna, q, QuestionType.MINUS_ONE_ZERO_ONE, ContentType.ALL_CONTEXTS) + + self.node_type = "node_parser" + self.module_type = "simple" + + attribute = "chunk_size" + suggestion = Suggestion( + svalue=SuggestionValue(suggestion_type=SuggestionType.OFFSET, step=100), + attribute=attribute, + ) + self.suggestions[attribute] = suggestion + + attribute = "chunk_overlap" + suggestion = Suggestion( + svalue=SuggestionValue(suggestion_type=SuggestionType.OFFSET, step=8), + attribute=attribute, + ) + self.suggestions[attribute] = suggestion + + def _feedback_to_suggestions(self): + assert isinstance(self.user_feedback, Feedback) + if self.user_feedback.type == QuestionType.MINUS_ONE_ZERO_ONE: + if self.user_feedback.feedback == 1: + for suggestion in self.suggestions.values(): + suggestion.svalue.direction = DirectionType.DECREASE + elif self.user_feedback.feedback == -1: + for suggestion in self.suggestions.values(): + suggestion.svalue.direction = DirectionType.INCREASE + + +class RerankerTopnTuner(Tuner): + + def __init__(self, ragqna=None): + q = ( + "Are the contexts not enough for answering the question or some of them contain irrelevant information?\n" + + " -1) Not enough \n 0) Fine or skip current tuner \n 1) Too many contexts" + ) + super().__init__(ragqna, q, QuestionType.MINUS_ONE_ZERO_ONE, ContentType.ALL_CONTEXTS) + + self.node_type = "postprocessor" + self.module_type = "reranker" + + attribute = "top_n" + suggestion = Suggestion( + svalue=SuggestionValue(suggestion_type=SuggestionType.OFFSET, step=1), + attribute=attribute, + ) + self.suggestions[attribute] = suggestion + + def _feedback_to_suggestions(self): + assert isinstance(self.user_feedback, Feedback) + if self.user_feedback.type == 
QuestionType.MINUS_ONE_ZERO_ONE: + if self.user_feedback.feedback == 1: + for suggestion in self.suggestions.values(): + suggestion.svalue.direction = DirectionType.DECREASE + elif self.user_feedback.feedback == -1: + for suggestion in self.suggestions.values(): + suggestion.svalue.direction = DirectionType.INCREASE + + +class EmbeddingLanguageTuner(Tuner): + + def __init__(self, ragqna=None): + q = ( + "Does any context contain relevant information for the given RAG query?\n" + + " y) yes or skip current tuner \n n) no" + ) + super().__init__(ragqna, q, QuestionType.BOOL, ContentType.ALL_CONTEXTS) + + self.node_type = "indexer" + self.module_type = None + + attribute = "embedding_model" + suggestion = Suggestion( + svalue=SuggestionValue( + suggestion_type=SuggestionType.CHOOSE, + ), + attribute=attribute, + ) + self.suggestions[attribute] = suggestion + + def _feedback_to_suggestions(self): + assert isinstance(self.user_feedback, Feedback) + if self.user_feedback.type == QuestionType.BOOL: + if not self.user_feedback.feedback: + for suggestion in self.suggestions.values(): + suggestion.svalue.choices = self.module.get_params(suggestion.attribute) + + +class NodeParserTypeTuner(Tuner): + + def __init__(self, ragqna=None): + q = ( + "Are all contexts split with similar amount of information? \n" + + "y) All contexts contain similar amount of information or skip current tuner \n" + + "n) Some contexts contain more information while some contain less" + ) + super().__init__(ragqna, q, QuestionType.BOOL, ContentType.ALL_CONTEXTS) + + self.node_type = "node_parser" + self.module_type = None + + attribute = "parser_type" + suggestion = Suggestion( + svalue=SuggestionValue( + suggestion_type=SuggestionType.CHOOSE, + ), + attribute=attribute, + ) + self.suggestions[attribute] = suggestion + + def _feedback_to_suggestions(self): + assert isinstance(self.user_feedback, Feedback) + if self.user_feedback.type == QuestionType.BOOL: + if not self.user_feedback.feedback: + for suggestion in self.suggestions.values(): + suggestion.svalue.choices = self.module.get_params(suggestion.attribute) diff --git a/evals/evaluation/rag_pilot/configs/ecrag.yaml b/evals/evaluation/rag_pilot/configs/ecrag.yaml new file mode 100644 index 00000000..33d3c2bd --- /dev/null +++ b/evals/evaluation/rag_pilot/configs/ecrag.yaml @@ -0,0 +1,32 @@ +# Copyright (C) 2025 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +nodes: + - node: node_parser + modules: + - module_type: simple + chunk_size: 400 + chunk_overlap: 48 + - module_type: hierarchical + chunk_sizes: [256, 384, 512] + - node: indexer + embedding_model: [BAAI/bge-small-zh-v1.5, BAAI/bge-small-en-v1.5] + modules: + - module_type: vector + - module_type: faiss_vector + - node: retriever + retrieve_topk: 30 + modules: + - module_type: vectorsimilarity + - module_type: auto_merge + - module_type: bm25 + - node: postprocessor + modules: + - module_type: reranker + top_n: 3 + reranker_model: BAAI/bge-reranker-large + - module_type: metadata_replace + - node: generator + model: [Qwen/Qwen2-7B-Instruct] + inference_type: [local, vllm] + prompt: null diff --git a/evals/evaluation/rag_pilot/configs/qa_list_sample.json b/evals/evaluation/rag_pilot/configs/qa_list_sample.json new file mode 100644 index 00000000..52cd0ae9 --- /dev/null +++ b/evals/evaluation/rag_pilot/configs/qa_list_sample.json @@ -0,0 +1,9 @@ +[ + { + "query": "鸟类的祖先是恐龙吗?哪篇课文里讲了相关的内容?", + "ground_truth": "是的,鸟类的祖先是恐龙,这一内容在《飞向蓝天的恐龙》一文中有所讨论" + }, + { + "query": "桃花水是什么季节的水?" 
+ } +] diff --git a/evals/evaluation/rag_pilot/configs/rag_pipeline_sample.json b/evals/evaluation/rag_pilot/configs/rag_pipeline_sample.json new file mode 100644 index 00000000..26a8eeec --- /dev/null +++ b/evals/evaluation/rag_pilot/configs/rag_pipeline_sample.json @@ -0,0 +1,49 @@ +{ + "name": "rag_test_local_llm", + "node_parser": { + "chunk_size": 512, + "chunk_overlap": 64, + "chunk_sizes": null, + "parser_type": "simple", + "window_size": null + }, + "indexer": { + "indexer_type": "faiss_vector", + "embedding_model": { + "model_type": "embedding", + "model_id": "BAAI/bge-small-zh-v1.5", + "model_path": "./models/BAAI/bge-small-zh-v1.5", + "weight": "", + "device": "auto" + } + }, + "retriever": { + "retriever_type": "vectorsimilarity", + "retrieve_topk": 30 + }, + "postprocessor": [ + { + "processor_type": "reranker", + "reranker_model": { + "model_type": "reranker", + "model_id": "BAAI/bge-reranker-large", + "model_path": "./models/BAAI/bge-reranker-large", + "weight": "", + "device": "auto" + }, + "top_n": 2 + } + ], + "generator": { + "prompt_path": null, + "model": { + "model_type": "llm", + "model_id": "Qwen/Qwen2-7B-Instruct", + "model_path": "./models/Qwen/Qwen2-7B-Instruct/INT4_compressed_weights", + "weight": "INT4", + "device": "auto" + }, + "inference_type": "local" + }, + "active": true +} diff --git a/evals/evaluation/rag_pilot/configs/rag_results_sample.json b/evals/evaluation/rag_pilot/configs/rag_results_sample.json new file mode 100644 index 00000000..f72f65d1 --- /dev/null +++ b/evals/evaluation/rag_pilot/configs/rag_results_sample.json @@ -0,0 +1,15 @@ +[ + { + "query": "鸟类的祖先是恐龙吗?哪篇课文里讲了相关的内容?", + "contexts": [ + "默读课文,把不懂的问题写下来,并试着解决。\n 假如你是一个解说员,会怎样简明扼要地介绍恐龙飞向蓝天 ,\n演化成鸟类的过程?\n 课文中的不少语句表达很准确,如 “科学家们希望能够全面揭示\n这一历史进程” 。找出这样的语句读一读,说说自己的体会。 \n小练笔\n        读一读,注意加点的部分,再照样子写一段话。\n数千万年后,它的后代繁衍成一个形态各异的庞大家族。\n有些恐龙像它们的祖先一样用两足奔跑\n3333\n,有些恐龙则用四足行\n333\n走\n3\n。有些恐龙身\n3\n长\n3\n几十米\n333\n,重达数十吨\n33333\n;有些恐龙则身材小巧\n3333\n,\n体重\n33\n只有\n33\n几\n3\n千克\n33\n。有些恐龙凶猛异常\n3333\n,是茹\nrP\n毛饮血的食肉\n33\n动物;\n有些恐龙则温顺可爱\n3333\n,以植物为食\n33333\n。\n科学界存在着多种解释鸟类起源的假说。", + "说到恐龙,人们往往想到凶猛的霸王龙或者笨重、迟\n钝\ndMn\n的马门溪龙;谈起鸟类,我们头脑中自然会浮现轻灵的\n鸽子或者五彩斑斓\nlWn\n的孔雀。二者似乎毫不相干,但近年来\n发现的大量化石显示:在中生代时期,恐龙的一支经过漫\nmSn\n长的演化,最终变成了凌空翱\nWo\n翔的鸟儿。\n早在19世纪,英国学者赫\nhF\n胥\nxO\n黎就注意到恐龙和鸟类在\n骨骼\ngR\n结构上有许多相似之处。在研究了大量恐龙和鸟类化\n石之后,科学家们提出,鸟类不仅\njJn\n和恐龙有亲缘关系,而\n且很可能就是一种小型恐龙的后裔\nyK\n。根据这一假说,一些\n与鸟类亲缘关系较近的恐龙应该长 有羽毛,但一直没有找\n到化石证据。20世纪末期,我国科学家在辽宁西部首次发\n本文作者徐星,选作课文时有改动。\n飞向蓝天的恐龙6\n20\n~ß\u0016rH" + ], + "response": "是的,鸟类的祖先是恐龙,这一内容在《飞向蓝天的恐龙》一文中有所讨论", + "ground_truth": "是的,鸟类的祖先是恐龙,这一内容在《飞向蓝天的恐龙》一文中有所讨论" + }, + { + "query": "桃花水是什么季节的水?", + "response": "桃花水是春季的水" + } +] diff --git a/evals/evaluation/rag_pilot/pipeline_tune.py b/evals/evaluation/rag_pilot/pipeline_tune.py new file mode 100644 index 00000000..b79684df --- /dev/null +++ b/evals/evaluation/rag_pilot/pipeline_tune.py @@ -0,0 +1,252 @@ +# Copyright (C) 2025 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +import argparse +import json +from enum import Enum + +import yaml +from components.adaptor.adaptor import Adaptor +from components.constructor.connector import get_active_pipeline, get_ragqna, reindex_data, update_active_pipeline +from components.constructor.constructor import ( + COMP_TYPE_MAP, + Constructor, + convert_dict_to_pipeline, + get_ecrag_module_map, +) +from components.tuner.base import ContentType, QuestionType, RagResult +from components.tuner.tuner import ( + EmbeddingLanguageTuner, + NodeParserTypeTuner, + RerankerTopnTuner, + 
SimpleNodeParserChunkTuner, + display_ragqna, + input_parser, +) + + +def load_qa_list_from_json(file_path): + try: + with open(file_path, "r", encoding="utf-8") as file: + data = json.load(file) + + qa_list = [] + for item in data: + query = item.get("query", "").strip() + ground_truth = item.get("ground_truth", "").strip() + + if query: + qa_list.append({"query": query, "ground_truth": ground_truth}) + + return qa_list + except FileNotFoundError: + print(f"The file '{file_path}' was not found.") + except json.JSONDecodeError: + print(f"Error decoding JSON in the file '{file_path}'.") + except Exception as e: + print(f"An unexpected error occurred: {e}") + + return [] + + +def load_ragresults_from_json(file_path): + try: + with open(file_path, "r", encoding="utf-8") as file: + data = json.load(file) + + qa_list = [] + ragresults = [] + for item in data: + query = item.get("query", "").strip() + ground_truth = item.get("ground_truth", "").strip() + response = item.get("response", "").strip() + contexts = item.get("contexts", []) + + if query and response: + qa_list.append({"query": query, "ground_truth": ground_truth}) + ragresults.append( + RagResult(query=query, ground_truth=ground_truth, response=response, contexts=contexts) + ) + + return qa_list, ragresults + except FileNotFoundError: + print(f"The file '{file_path}' was not found.") + except json.JSONDecodeError: + print(f"Error decoding JSON in the file '{file_path}'.") + except Exception as e: + print(f"An unexpected error occurred: {e}") + + return [], [] + + +def load_pipeline_from_json(file_path): + try: + with open(file_path, "r", encoding="utf-8") as file: + data = json.load(file) + return convert_dict_to_pipeline(data) + except FileNotFoundError: + print(f"The file '{file_path}' was not found.") + except json.JSONDecodeError: + print(f"Error decoding JSON in the file '{file_path}'.") + except Exception as e: + print(f"An unexpected error occurred: {e}") + + return None + + +def read_yaml(file_path): + with open(file_path, "r") as file: + yaml_content = file.read() + return yaml.safe_load(yaml_content) + + +class Mode(str, Enum): + ONLINE = "online" + OFFLINE = "offline" + + +def main(): + parser = argparse.ArgumentParser() + + # common + parser.add_argument( + "-y", + "--rag_module_yaml", + default="configs/ecrag.yaml", + type=str, # TODO rename + help="Path to the YAML file containing all tunable RAG configurations.", + ) + parser.add_argument( + "-o", + "--output_config_json", + default="rag_pipeline_out.json", + type=str, + help="Path to the output JSON file where the tuned RAG pipeline configuration will be saved.", + ) + + # online + parser.add_argument( + "-q", + "--qa_list_json", + default="configs/qa_list_sample.json", + type=str, + help="Path to the JSON file containing the list of queries.", + ) + + # offline + parser.add_argument("--offline", action="store_true", help="Run offline mode. 
Default is online mode.") + parser.add_argument( + "-c", + "--rag_pipeline_json", + default="configs/rag_pipeline_sample.json", + type=str, + help="Path to the JSON file containing the RAG pipeline configuration.", + ) + parser.add_argument( + "-r", + "--ragresults_json", + default="configs/rag_results_sample.json", + type=str, + help="Path to the JSON file containing the RAG results.", + ) + + args = parser.parse_args() + + mode = Mode.OFFLINE if args.offline else Mode.ONLINE + print(f"Running in {mode.value} mode.") + + constructor = Constructor() + adaptor = Adaptor(read_yaml(args.rag_module_yaml)) + tuners = [ + EmbeddingLanguageTuner(), + NodeParserTypeTuner(), + SimpleNodeParserChunkTuner(), + RerankerTopnTuner(), + ] + + if mode == Mode.ONLINE: + qa_list = load_qa_list_from_json(args.qa_list_json) + else: + qa_list, ragresults = load_ragresults_from_json(args.ragresults_json) + + for qa in qa_list: + query = qa["query"] + ground_truth = qa["ground_truth"] + if mode == Mode.ONLINE: + reindex_data() + while True: + if mode == Mode.ONLINE: + active_pl = get_active_pipeline() + ragqna = get_ragqna(query) + else: + active_pl = load_pipeline_from_json(args.rag_pipeline_json) + ragqna = next((r for r in ragresults if r.query == query), None) + + if not ragqna or not active_pl: + if not ragqna: + print("Error: RAG result is invalid") + if not active_pl: + print("Error: RAG pipeline is invalid") + break + + constructor.set_pipeline(active_pl) + + print("############################################") + print(f"Tuning RAG with query: “{query}”") + display_ragqna(ragqna) + print(f'Is the response correct or do you want to stop tuning query "{query}"?') + if ground_truth: + print(f'\033[90mGround truth for this query is: "{ground_truth}"\033[0m') + print("y: Completely correct or stop tuning") + print("n: Not correct and keep tuning") + valid, correctness = input_parser(QuestionType.BOOL) + + if valid and correctness: + print("Do you want to check the tuned contexts?") + valid, confirm = input_parser(QuestionType.BOOL) + if valid and confirm: + display_ragqna(ragqna, ContentType.ALL_CONTEXTS) + print("Do you want to save the tuned RAG config?") + valid, save = input_parser(QuestionType.BOOL) + if valid and save: + constructor.save_pipeline_to_json(args.output_config_json) + print("\n\n") + break + + adaptor.update_all_module_functions(get_ecrag_module_map(constructor.pl), COMP_TYPE_MAP) + is_changed = False + for tuner in tuners: + # Tuner setup + module = adaptor.get_module(tuner.node_type, tuner.module_type) + tuner.update_module(module) + tuner.init_ragqna(RagResult(**ragqna.model_dump())) + + # Tuning + if tuner.request_feedback(): + if tuner.make_suggestions(): + is_changed = True + break + + if not is_changed: + print("Nothing changed, tune again?") + valid, cont = input_parser(QuestionType.BOOL) + if valid and not cont: + break + else: + if mode == Mode.ONLINE: + print("Updating pipeline . . .") + pipeline = constructor.export_pipeline() + update_active_pipeline(pipeline) + else: + print("\n") + constructor.save_pipeline_to_json(args.output_config_json) + print( + f"\n\nDo you want to continue with previous RAG pipeline file {args.rag_pipeline_json} and rag_results {args.ragresults_json}?" 
+ ) + valid, cont = input_parser(QuestionType.BOOL) + if valid and not cont: + return + + +if __name__ == "__main__": + main() diff --git a/evals/evaluation/rag_pilot/requirements.txt b/evals/evaluation/rag_pilot/requirements.txt new file mode 100644 index 00000000..6a19ff43 --- /dev/null +++ b/evals/evaluation/rag_pilot/requirements.txt @@ -0,0 +1,2 @@ +fastapi>=0.115.0 +opea-comps>=1.2