You can watch the video version of this tutorial here:
▶ Watch on YouTube
This repository contains a step-by-step tutorial on how to build a local Retrieval-Augmented Generation (RAG) chatbot using:
- TiDB with the `VECTOR` data type
- Ollama for embeddings and LLM inference
- Python for ingestion and querying scripts
The tutorial shows how to:
- Convert TiDB documentation into a PDF and split it into chunks
- Store the text and embeddings in TiDB
- Query the database with semantic search
- Generate answers with Gemma 3 12B in a chatbot loop
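To make the flow concrete, here is a minimal ingestion sketch in the spirit of `load.py` (it is not the tutorial's actual script): it embeds one chunk with Ollama and stores it in a TiDB `VECTOR(768)` column. The `rag` database, the `docs` table, and its schema are illustrative assumptions, as are the TiUP Playground connection defaults (port 4000, user `root`, no password).

```python
# Minimal ingestion sketch (not the actual load.py). Assumes a table created
# roughly like:
#   CREATE TABLE docs (id INT AUTO_INCREMENT PRIMARY KEY, content TEXT, embedding VECTOR(768));
import json
import ollama
import pymysql

# TiUP Playground defaults: port 4000, user root, empty password
conn = pymysql.connect(host="127.0.0.1", port=4000, user="root", database="rag")

chunk = "TiDB is a distributed SQL database that supports the VECTOR data type..."
# nomic-embed-text:v1.5 produces a 768-dimensional embedding
embedding = ollama.embeddings(model="nomic-embed-text:v1.5", prompt=chunk)["embedding"]

with conn.cursor() as cur:
    # TiDB accepts vector values written as string literals like '[0.1, 0.2, ...]'
    cur.execute(
        "INSERT INTO docs (content, embedding) VALUES (%s, %s)",
        (chunk, json.dumps(embedding)),
    )
conn.commit()
```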
- This tutorial is intended as a learning project.
- For simplicity, it does not use TiFlash ANN indexes, but mentions them as an option for production.
- Running locally requires a GPU with ≥12GB VRAM or an Apple Silicon Mac.
- `tidb_docs_pdf.txt` → link to an already generated PDF of the TiDB documentation
- `docs/index.html` → styled HTML version for publishing
- `load.py` → script to split and embed documentation into TiDB
- `chat.py` → chatbot script that retrieves context and answers questions
- Install TiUP and start TiDB Playground.
- Install Ollama and pull the models:
  ```bash
  ollama pull nomic-embed-text:v1.5
  ollama pull gemma3:12b
  ```
- Create a Python virtual environment and install requirements:
  ```bash
  python3 -m venv rag_tidb
  source rag_tidb/bin/activate
  pip install pymupdf pymysql ollama
  ```
- Follow the tutorial in `docs/index.html`
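Once everything is installed, the question-answering loop in `chat.py` roughly follows the pattern below. Again, this is a sketch rather than the actual script: it reuses the illustrative `docs` table from the ingestion sketch above, ranks chunks with TiDB's `VEC_COSINE_DISTANCE` function, and passes the top matches to gemma3:12b as context.

```python
# Minimal retrieval/answer sketch (not the actual chat.py). Reuses the
# illustrative `docs` table and connection settings from the ingestion sketch.
import json
import ollama
import pymysql

conn = pymysql.connect(host="127.0.0.1", port=4000, user="root", database="rag")

question = input("Question: ")
q_emb = ollama.embeddings(model="nomic-embed-text:v1.5", prompt=question)["embedding"]

with conn.cursor() as cur:
    # Order by cosine distance to the question embedding; without a TiFlash ANN
    # index this is a brute-force scan, which is fine for a small corpus.
    cur.execute(
        "SELECT content FROM docs ORDER BY VEC_COSINE_DISTANCE(embedding, %s) LIMIT 3",
        (json.dumps(q_emb),),
    )
    context = "\n\n".join(row[0] for row in cur.fetchall())

answer = ollama.chat(
    model="gemma3:12b",
    messages=[{
        "role": "user",
        "content": f"Answer using only this context:\n{context}\n\nQuestion: {question}",
    }],
)
print(answer["message"]["content"])
```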
I’m a database engineer, not an AI expert. This tutorial was created with AI assistance as part of my own learning. There are probably better ways to implement RAG pipelines — feedback and improvements are very welcome!