Building a Retrieval-Augmented Generation pipeline from the ground up — chunking, embedding, vector search, and LLM-powered Q&A with RAGAS evaluation metrics.
Most RAG tutorials use high-level abstractions that hide the mechanics. This project builds every component from scratch so you understand exactly what happens at each stage — from raw text to cited answers.
```mermaid
graph LR
    A[PDF / Text Documents] --> B[Document Loader]
    B --> C[Text Chunking]
    C --> D[Embedding Generation]
    D --> E[ChromaDB Vector Store]
    F[User Query] --> G[Query Embedding]
    G --> H[Similarity Search]
    E --> H
    H --> I[Context Assembly]
    I --> J[LLM Generation]
    J --> K[Answer + Sources]
    K --> L[RAGAS Evaluation]
```
| Feature | Description |
|---|---|
| Document Loading | PDF and text file ingestion with metadata extraction |
| Semantic Chunking | Recursive text splitting with configurable overlap |
| Vector Embeddings | OpenAI/HuggingFace embedding models |
| Similarity Search | Cosine similarity over ChromaDB vector store |
| LLM Generation | Context-grounded answer generation with source citations |
| RAGAS Evaluation | Automated metrics — faithfulness, relevance, context precision |
| Flask Chat UI | Interactive web interface for document Q&A |
- Load PDFs and text files using custom parsers
- Extract metadata (filename, page number, section)
- Handle encoding edge cases (a loader sketch follows this list)
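A minimal loader sketch, assuming `pypdf` for PDF parsing; the repo's own parser and metadata fields may differ:

```python
# Hypothetical loader sketch (pypdf assumed); the repo's parser may differ.
from pathlib import Path

from pypdf import PdfReader


def load_document(path: str) -> list[dict]:
    """Return one {"text", "metadata"} record per PDF page or text file."""
    p = Path(path)
    if p.suffix.lower() == ".pdf":
        reader = PdfReader(str(p))
        return [
            {
                "text": page.extract_text() or "",
                "metadata": {"filename": p.name, "page": i + 1},
            }
            for i, page in enumerate(reader.pages)
        ]
    # Text files: decode leniently so odd encodings don't crash ingestion.
    text = p.read_bytes().decode("utf-8", errors="replace")
    return [{"text": text, "metadata": {"filename": p.name, "page": 1}}]
```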
- Recursive character text splitter
- Configurable chunk size (default: 1000 tokens) and overlap (200 tokens)
- Preserves paragraph boundaries where possible (see the sketch below)
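A simplified sketch of the splitting idea, measured in characters rather than tokens for brevity; the repo's `chunking.py` may differ:

```python
# Greedy splitter sketch: prefers paragraph breaks, keeps overlap between chunks.
def split_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    chunks: list[str] = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        if end < len(text):
            # Break at the last blank line inside the window, if there is one.
            cut = text.rfind("\n\n", start, end)
            if cut > start:
                end = cut
        chunks.append(text[start:end].strip())
        if end >= len(text):
            break
        # Step back by `overlap` characters so neighbouring chunks share context.
        start = max(end - overlap, start + 1)
    return [c for c in chunks if c]
```

Without the step-back, a sentence straddling two chunks would be fully retrievable from neither.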
- Generate dense vector embeddings (OpenAI `text-embedding-3-small` or HuggingFace alternatives)
- Store in ChromaDB with metadata for filtered retrieval
- Persistent storage for production use (indexing sketch below)
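A sketch of the embed-and-index step, assuming the official `openai` and `chromadb` clients; the collection name and storage path are illustrative:

```python
# Embed chunks with OpenAI and persist them in ChromaDB.
# "rag_docs" and "./chroma_db" are illustrative names, not the repo's.
import chromadb
from openai import OpenAI

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment
chroma = chromadb.PersistentClient(path="./chroma_db")
collection = chroma.get_or_create_collection(
    name="rag_docs",
    metadata={"hnsw:space": "cosine"},  # rank by cosine distance
)


def index_chunks(chunks: list[str], metadatas: list[dict]) -> None:
    resp = openai_client.embeddings.create(
        model="text-embedding-3-small", input=chunks
    )
    collection.add(
        ids=[f"chunk-{i}" for i in range(len(chunks))],
        embeddings=[item.embedding for item in resp.data],
        documents=chunks,
        metadatas=metadatas,
    )
```

ChromaDB defaults to L2 distance, so the `hnsw:space` setting is what makes the search cosine-based, matching the similarity search described here.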
- Embed user query → cosine similarity search → top-k retrieval
- Context window assembly with source tracking
- LLM generates grounded answers with inline citations (query sketch below)
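A query-time sketch reusing the clients from the indexing example; the prompt wording and citation format are illustrative:

```python
# Retrieve the top-k chunks for a query and generate a cited answer.
def answer(question: str, k: int = 4) -> str:
    q_emb = openai_client.embeddings.create(
        model="text-embedding-3-small", input=[question]
    ).data[0].embedding
    results = collection.query(query_embeddings=[q_emb], n_results=k)
    docs, metas = results["documents"][0], results["metadatas"][0]
    # Label each chunk with its source so the model can cite it inline.
    context = "\n\n".join(
        f"[{m['filename']} p.{m['page']}] {d}" for d, m in zip(docs, metas)
    )
    completion = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": "Answer using only the provided context. "
                "Cite sources with the bracketed labels.",
            },
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return completion.choices[0].message.content
```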
- Faithfulness: Is the answer supported by retrieved context?
- Answer Relevancy: Does the answer address the question?
- Context Precision: Are the retrieved chunks relevant?
- Context Recall: Did retrieval capture all needed information? (scoring sketch below)
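A scoring sketch, assuming the ragas 0.1-style API (newer releases rename some imports) and the `datasets` package; all field values are made up for illustration:

```python
# Score one Q&A example on the four RAGAS metrics above.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (
    answer_relevancy,
    context_precision,
    context_recall,
    faithfulness,
)

dataset = Dataset.from_dict({
    "question": ["What is the refund policy?"],
    "answer": ["Refunds are issued within 30 days [policy.pdf p.2]."],
    "contexts": [["Customers may request a refund within 30 days of purchase."]],
    "ground_truth": ["Refunds are available for 30 days after purchase."],
})

scores = evaluate(
    dataset,
    metrics=[faithfulness, answer_relevancy, context_precision, context_recall],
)
print(scores)  # per-metric averages between 0 and 1
```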
- Python 3.11+
- OpenAI API key
```bash
git clone https://github.com/Nagavenkatasai7/rag-from-scratch.git
cd rag-from-scratch/rag-project
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt
export OPENAI_API_KEY="your-key-here"
python app.py
```

Open http://localhost:5000 to use the chat interface.
```
rag-from-scratch/
├── rag-project/
│   ├── app.py                 # Flask chat UI
│   ├── rag_pipeline.py        # Core RAG pipeline
│   ├── chunking.py            # Text splitting logic
│   ├── embeddings.py          # Embedding generation
│   ├── retriever.py           # Vector search & retrieval
│   ├── evaluation.py          # RAGAS evaluation metrics
│   ├── templates/             # HTML templates for Flask UI
│   └── data/                  # Sample documents
├── RAG-FROM-SCRATCH-GUIDE.md  # Detailed implementation guide
├── requirements.txt
└── README.md
```
| Component | Technology |
|---|---|
| Language | Python 3.11+ |
| Framework | LangChain, Flask |
| Vector Store | ChromaDB |
| Embeddings | OpenAI / HuggingFace |
| LLM | GPT-4o / GPT-3.5-turbo |
| Evaluation | RAGAS |
| Frontend | Flask + HTML/CSS |
- How chunking strategy directly impacts retrieval quality
- Why overlap matters — without it, answers miss context at chunk boundaries
- RAGAS evaluation reveals failure modes invisible to manual testing
- ChromaDB's metadata filtering enables precise document-scoped queries (filter example below)
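For example, ChromaDB's `where` parameter scopes a similarity search to a single source document; the `filename` field assumes the metadata schema sketched earlier:

```python
# Restrict retrieval to chunks from one file via a metadata filter.
results = collection.query(
    query_embeddings=[q_emb],
    n_results=4,
    where={"filename": "policy.pdf"},  # illustrative field name
)
```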
MIT — see LICENSE for details.