Template for building a high-accuracy Retrieval-Augmented Generation (RAG) pipeline with hybrid search (semantic + keyword), reranking, and LLM-based generation.
- The frontend here, represented by LINE, is just an example
- Hybrid Search (Stage 1 -> Get Top 100)
  - OpenAI Embedding Model (Semantic search): text-embedding-3-large (80% Search)
  - BM25 (Keyword search): weaviate gse (jieba) (20% Search)
- Voyage Reranker (Stage 2 -> Get Top 5)
- Use text_splitter to chunk data with an excessive number of tokens
  - Split into segments of 2000 tokens with an overlap of 500 tokens
- Justin's Open Source LLM Deployment and Training Tutorial
  - After completing the LLM training and deploying it using the deployment method provided in the repository, you can create your model's `tem.py` file in the `src/utils/ai` path of this repository. This will allow you to invoke it in `call_ai.py`.
.
├── README.md
├── config.ini
├── llm_finetuning (submodule)
├── data
│   └── README.md
├── docs
│   ├── README.md
│   └── system_diagram.png
├── logs
│   ├── Logger.log
│   └── PLACEHOLDER
├── requirements.txt
└── src
    ├── README.md
    ├── db_insert.py
    ├── rag_service.py
    └── utils
        ├── __init__.py
        ├── ai
        │   ├── __init__.py
        │   ├── call_ai.py
        │   ├── gemini_tem.py
        │   └── gpt_tem.py
        ├── config_log.py
        ├── embedded_weaviate_service.py
        └── weaviate_op.py
To set up the development environment, follow these steps:
- Clone the repository:

      git clone https://github.com/JustinHsu1019/general-rag-template
      cd general-rag-template

- Create a virtual environment:

      python3 -m venv rag_venv
      source rag_venv/bin/activate

- Install the required dependencies:

      pip install -r requirements.txt

- Copy the configuration example and create your own config file:

      cp config.ini config_real.ini
- Manually add your secret keys to `config_real.ini`:
  - The api_key for [OpenAI] can be obtained by registering on the OpenAI official website.
  - The api_key for [VoyageAI] can be obtained by registering on the VoyageAI official website.
  - The password for [Api_docs] can be freely set by the user.
    - After starting `rag_service.py`, you can directly visit http://127.0.0.1:5000/ to view the Swagger API documentation page.
  - For [Gemini], if you choose OpenAI as the LLM model, you do not need to configure any value (set values only if you choose to use Google Gemini).
  - For [Weaviate], you only need to set the `class_name`. You can assign any name, similar to a Collection Name in Firebase.
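As a quick sanity check before starting the service, the filled-in file can be read with Python's standard `configparser`. This is a minimal sketch, not part of the repository: the section names and the `api_key`, `password`, and `class_name` keys follow the list above, while the `REQUIRED` mapping and the `missing_settings` helper are hypothetical names introduced for illustration.

```python
import configparser

# Settings the list above says must be filled in (section -> keys).
# [Gemini] is left out because it is only needed when Gemini is the chosen LLM.
REQUIRED = {
    "OpenAI": ["api_key"],
    "VoyageAI": ["api_key"],
    "Api_docs": ["password"],
    "Weaviate": ["class_name"],
}


def missing_settings(config: configparser.ConfigParser) -> list[str]:
    """Return 'Section.key' names that are absent or empty (hypothetical helper)."""
    missing = []
    for section, keys in REQUIRED.items():
        for key in keys:
            # fallback="" also covers the case where the whole section is missing.
            value = config.get(section, key, fallback="").strip()
            if not value:
                missing.append(f"{section}.{key}")
    return missing


if __name__ == "__main__":
    # interpolation=None keeps '%' characters in secrets from being interpreted.
    config = configparser.ConfigParser(interpolation=None)
    config.read("config_real.ini", encoding="utf-8")
    print("Missing settings:", missing_settings(config) or "none")
```

Running the script from the repository root prints which of the required values still need to be filled in.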
- Insert TXT Data into the Database

  Place the required `.txt` file into the `/data` directory and modify the `src/db_insert.py` script as follows:

      if __name__ == '__main__':
          manager = WeaviateManager(config.get('Weaviate', 'class_name'))

          # Change 'File_6328.txt' to your desired file name
          with open('data/File_6328.txt', encoding='utf-8') as file:
              content = file.read()

          # Automatically split the text into chunks of 2000 tokens each, with a possible overlap of 500 tokens
          # (avoiding unintended sentence truncation). If the text is already pre-split, you can skip this step
          # and directly create a List (datas variable) for insertion.
          text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=500)
          datas = text_splitter.split_text(content)

          for lines in datas:
              manager.insert_data(lines)
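To show how `chunk_size=2000` and `chunk_overlap=500` interact, here is a simplified, dependency-free sketch of overlapping chunking. It is a stand-in for illustration only: `sliding_chunks` is a hypothetical helper that cuts fixed windows, whereas `RecursiveCharacterTextSplitter` additionally tries to break on separators such as paragraphs and sentences. Note also that `RecursiveCharacterTextSplitter` measures length in characters by default, so the 2000/500 figures count characters unless a token-based length function is configured.

```python
def sliding_chunks(text: str, chunk_size: int = 2000, overlap: int = 500) -> list[str]:
    """Cut text into fixed-size windows; each window starts `overlap`
    characters before the previous window ended."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    sample = "x" * 5000
    pieces = sliding_chunks(sample)
    print(len(pieces), [len(p) for p in pieces])
```

With the defaults, each new chunk begins 1500 characters after the previous one, so a sentence cut at one chunk boundary usually appears intact in the neighbouring chunk.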
- Insert Data into Weaviate

  Run the following command to insert data into the Weaviate database. The Weaviate DB will start automatically with this script and shut down after the process completes. Re-running `rag_service.py` will restart the database. There is no need to use Docker or other tools to open/create the database.

      python3 src/db_insert.py
- Retrieval in the Current Workflow

  Currently, retrieval uses Hybrid Search as Stage 1 and the reranker as Stage 2. Refer to `src/utils/weaviate_op.py`:

      def search_do(input_):
          # Control the number of top-scoring results retrieved by Hybrid Search (default is 100)
          HYBRID_SEARCH_NUM = 100

          # Control the number of results post-reranking to include in the Prompt for LLM (default is 5)
          RERANKER_NUM = 5

          # Search strategy: 80% Vector Search, 20% Keyword Search
          # (Based on empirical results, an 8:2 ratio provides optimal results)
          alp = 0.8
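To make the roles of `alp`, `HYBRID_SEARCH_NUM`, and `RERANKER_NUM` concrete, here is a minimal, self-contained sketch of the two-stage idea. It is not the code in `weaviate_op.py` and not Weaviate's internal fusion algorithm: the min-max normalization, the function names (`normalize`, `hybrid_top_k`, `rerank_top_n`), and the toy scores are all assumptions made for illustration. In Weaviate's hybrid search, an alpha of 1 means pure vector search and 0 means pure keyword search, which the weighting below mirrors.

```python
def normalize(scores: dict[str, float]) -> dict[str, float]:
    """Min-max scale scores to [0, 1]; if all scores are equal, map them to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_top_k(vector_scores, keyword_scores, alpha=0.8, k=100):
    """Stage 1: blend both score sets (alpha weights vector search,
    1 - alpha weights keyword search) and keep the k best document ids."""
    vec, kw = normalize(vector_scores), normalize(keyword_scores)
    fused = {
        doc: alpha * vec.get(doc, 0.0) + (1 - alpha) * kw.get(doc, 0.0)
        for doc in set(vec) | set(kw)
    }
    # Sort by descending fused score; ties are broken by document id.
    return sorted(fused, key=lambda d: (-fused[d], d))[:k]


def rerank_top_n(candidates, rerank_score, n=5):
    """Stage 2: rescore the Stage-1 candidates with a reranker callable
    (hypothetical here) and keep the n best."""
    return sorted(candidates, key=rerank_score, reverse=True)[:n]


if __name__ == "__main__":
    vector = {"a": 0.9, "b": 0.5, "c": 0.1}      # toy similarity scores
    keyword = {"b": 12.0, "c": 3.0, "d": 8.0}    # toy BM25 scores
    stage1 = hybrid_top_k(vector, keyword, alpha=0.8, k=3)
    toy_reranker = {"a": 0.1, "b": 0.7, "d": 0.9}.get
    print(stage1, rerank_top_n(stage1, toy_reranker, n=2))
```

Raising `alpha` toward 1 favours semantic matches, lowering it favours exact keyword matches; a larger Stage-1 `k` gives the reranker more candidates at the cost of more reranking work.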
- Run the Flask App

  To run the service, execute the following command:

      python3 src/rag_service.py

  - Access the API Docs at `/` with the username `rag` and the password configured in `config_real.ini`.
  - Use `/api/chat/` as the endpoint for the service API.
- Prompt Modification

  Modify the prompt in `src/utils/ai/call_ai.py`. This script acts as an abstraction layer for calling LLMs. Choose between GPT or Gemini models (default is GPT).

- LLM Third-Party API Calls
  - GPT: `src/utils/ai/gpt_tem.py`
  - Gemini: `src/utils/ai/gemini_tem.py`

- Database Initialization and Data Insertion

  Handle database initialization and insert data via `src/db_insert.py`. By default, the database contains two fields: `uuid` and `content` (modifiable). Refer to the Weaviate v4 Docs - Objects for details.

- Data Retrieval and Re-ranking

  Hybrid Search for retrieval and Voyage Reranker for re-ranking are handled in `src/utils/weaviate_op.py`. Focus on the `search_do()` function to adjust:
  - The number of results retrieved.
  - The balance between semantic and keyword search (Hybrid Search ratio).

  Refer to the Weaviate v4 Docs - Hybrid Search for detailed explanations.
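For readers adding their own model next to `gpt_tem.py` and `gemini_tem.py`, the following sketch shows one way an abstraction layer can dispatch a prompt to the selected backend. It is an assumption about the general pattern, not the actual contents of `call_ai.py`: `BACKENDS`, `register_backend`, `call_llm`, and the stub functions are hypothetical names, and the stubs return placeholder strings instead of calling any provider API.

```python
from typing import Callable, Dict

# Hypothetical registry: backend name -> function that turns a prompt into a reply.
BACKENDS: Dict[str, Callable[[str], str]] = {}


def register_backend(name: str):
    """Decorator that stores a backend function in the registry under `name`."""
    def decorator(func: Callable[[str], str]) -> Callable[[str], str]:
        BACKENDS[name] = func
        return func
    return decorator


@register_backend("gpt")
def gpt_stub(prompt: str) -> str:
    # Placeholder: a real backend would call the provider's API here.
    return f"[gpt] {prompt}"


@register_backend("gemini")
def gemini_stub(prompt: str) -> str:
    # Placeholder: a real backend would call the provider's API here.
    return f"[gemini] {prompt}"


def call_llm(prompt: str, backend: str = "gpt") -> str:
    """Send the prompt to the chosen backend; GPT is the default, as in the text above."""
    try:
        handler = BACKENDS[backend]
    except KeyError:
        raise ValueError(
            f"Unknown backend {backend!r}; choose from {sorted(BACKENDS)}"
        ) from None
    return handler(prompt)


if __name__ == "__main__":
    print(call_llm("Summarize the retrieved passages."))
    print(call_llm("Summarize the retrieved passages.", backend="gemini"))
```

With this layout, supporting a self-hosted model only requires registering one more function; the code that builds the prompt does not change.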
Contributions are welcome! Please feel free to submit pull requests, create issues, or suggest improvements. For any questions, please create an issue in the repository.