A fully engineered Retrieval‑Augmented Generation (RAG) chatbot built from first principles. The system answers user questions grounded strictly in the textbook:
Human Nutrition (University of Hawai‘i at Mānoa)
PDF: https://pressbooks.oer.hawaii.edu/humannutrition2/open/download?type=pdf
Below is a demo of the chatbot in action, showcasing its ability to provide accurate, grounded responses based on the textbook content.
- Introduction
- Project Objectives
- System Architecture Overview
- Dataset and Source Material
- Ingestion and Extraction
- Chunking Approaches
- Embedding and Vector Storage
- PostgreSQL + pgvector Setup
- Frontend and Backend
- Setup Instructions (Local Installation)
- Repository Structure
- How Query Processing Works
- Detailed Walkthrough Notebook
- RAG Evaluation
- Future Improvements
- License
This repository implements a simple chatbot that provides grounded responses based on information from the referenced nutrition textbook. The project is intentionally engineered without relying on turnkey RAG abstractions, enabling full visibility and control over:
- Ingestion pipeline
- Chunking logic
- Embedding failure modes
- Vector storage and retrieval
- Response generation
- Build a fully manual RAG implementation end‑to‑end
- Understand how ingestion affects downstream retrieval performance
- Implement multiple chunking strategies and evaluate their impact
- Store embeddings directly in PostgreSQL using pgvector
- Build a minimal full‑stack application (Next.js frontend + backend API)
- Document the entire process in a reproducible notebook
Processing flow:
PDF → Extracted Text
→ Exploratory Data Analysis (token lengths, truncation risks)
→ Chunking (multiple engineering methods)
→ Embedding
→ PostgreSQL + pgvector storage
→ SQL‑based similarity search
→ LLM response generation (grounded output)
The chatbot is grounded on:
Human Nutrition, University of Hawai‘i at Mānoa
Full PDF: https://pressbooks.oer.hawaii.edu/humannutrition2/open/download?type=pdf
The PDF is stored locally at: data/human_nutrition_text_book.pdf
Different document formats require different extraction pipelines:
- Digital PDFs: PyMuPDF
- Scanned documents: Tesseract OCR
- Hybrid documents (tables, charts, layouts): DOCkling or layout‑aware OCR
Extraction quality directly affects tokenization, chunking behavior, embedding quality, and retrieval recall.
The ingestion pipeline is implemented in: scripts/ingest.py
Six different chunking strategies were implemented and tested:
| Method | Avg Tokens | Notes |
|---|---|---|
| Fixed‑size | ~65 | Predictable, but ignores meaning |
| Structure‑based | ~1342 | Matches chapter hierarchy but exceeds embedding windows |
| Semantic | ~13 | Very coherent but over‑fragmented |
| Recursive | ~89 | Best practical trade‑off |
| LLM‑based | ~92 | High quality, but costly |
| Hybrid | Variable | Combines structure‑awareness with window control |
Key lessons:
- Most real‑world RAG failures originate in chunking and ingestion, not the LLM.
- Without performing dataset‑level EDA, many chunks silently truncate before embedding.
Detailed chunking experiments are documented in: notebooks/rag_chunking_strategies.ipynb
Embeddings are generated using:
all‑mpnet‑base‑v2
Each chunk is embedded and stored directly inside PostgreSQL using the pgvector extension, eliminating the need for external vector databases while enabling efficient similarity search within SQL.
Enable vector support:
create extension if not exists vector;Create table:
create table if not exists public.chunks (
id bigserial primary key,
doc_id text not null,
chunk_index int not null,
content text not null,
metadata jsonb default '{}'::jsonb,
embedding vector(1024)
);Create IVFFlat index:
create index if not exists idx_chunks_embedding
on public.chunks using ivfflat (embedding vector_cosine_ops)
with (lists=100);Similarity search function:
create or replace function public.match_documents(
query_embedding vector(1024),
match_count int default 5,
filter jsonb default '{}'::jsonb
) returns table (
id bigint,
doc_id text,
chunk_index int,
content text,
metadata jsonb,
similarity float
) language plpgsql stable as $$
begin
return query
select
c.id,
c.doc_id,
c.chunk_index,
c.content,
c.metadata,
1 - (c.embedding <=> query_embedding) as similarity
from public.chunks c
where (filter = '{}'::jsonb) or (c.metadata @> filter)
order by (c.embedding <=> query_embedding)
limit match_count;
end;
$$;The application uses:
- Next.js for both UI and backend endpoints
- Groq for securely handling model API keys
- Backend endpoints handle:
- Query embedding
- SQL similarity search
- LLM grounded response generation
Frontend application: rag-chat/
Backend API route: rag-chat/src/app/api/chat/route.ts
Main chat interface: rag-chat/src/app/page.tsx
git clone https://github.com/KushalRegmi61/rag.git
cd ragpython3 -m venv .venv
source .venv/bin/activate # Linux/Mac
.venv\Scripts\activate # Windows
pip install -r requirements.txtSee: requirements.txt
Install pgvector:
sudo apt install postgresql-16-pgvectorRun the SQL scripts above to create tables and indexes.
jupyter labOpen: notebooks/production_level_from_scratch.ipynb
Execute fully to:
- Extract book
- Analyze token distribution
- Chunk
- Compute embeddings
- Store in database
cd rag-chat
npm installCreate .env.local:
DATABASE_URL=postgres://...
GROQ_API_KEY=your_api_keyRun the development server:
npm run devAccess the application at: http://localhost:3000
rag/
├── data/
│ └── human_nutrition_text_book.pdf
├── notebooks/
│ ├── production_level_from_scratch.ipynb
│ └── rag_chunking_strategies.ipynb
├── scripts/
│ └── ingest.py
├── test/
│ └── test_embeddings.py
├── rag-chat/
│ ├── src/
│ │ └── app/
│ │ ├── api/
│ │ │ └── chat/
│ │ │ └── route.ts
│ │ ├── page.tsx
│ │ ├── layout.tsx
│ │ └── globals.css
│ ├── public/
│ ├── package.json
│ └── tsconfig.json
├── .venv/
├── requirements.txt
├── package.json
├── .env
├── .gitignore
├── LICENSE
└── README.md
Key files:
data/human_nutrition_text_book.pdf– Source textbooknotebooks/production_level_from_scratch.ipynb– Main implementation notebooknotebooks/rag_chunking_strategies.ipynb– Chunking experimentsscripts/ingest.py– Document ingestion pipelinetest/test_embeddings.py– Embedding testsrag-chat/src/app/api/chat/route.ts– Backend API endpointrag-chat/src/app/page.tsx– Frontend chat interfacerequirements.txt– Python dependenciesLICENSE– MIT License
User query
→ Query embedding
→ SQL vector similarity search
→ Top chunks returned
→ LLM generates grounded output
→ Rendered in chat interface
The complete flow is implemented in:
- Frontend:
rag-chat/src/app/page.tsx - Backend API:
rag-chat/src/app/api/chat/route.ts
All the engineering and reasoning is documented step‑by‑step in:
notebooks/production_level_from_scratch.ipynb
This is the primary reference for the implementation.
Additional experiments and chunking strategy comparisons:
notebooks/rag_chunking_strategies.ipynb
The RAG system was evaluated using the Ragas library. Below are the overall average scores:
| Metric | Description | Value | Remarks |
|---|---|---|---|
| Faithfulness | Measures the factual consistency of the generated answer with respect to the provided context. | 0.200 | Poor factual consistency |
| Answer Relevancy | Evaluates how relevant the generated answer is to the user's original question. | 0.199 | Poor relevance |
| Context Recall | Determines the extent to which all relevant information from the ground truth is retrieved within the context. | 0.900 | Excellent recall |
| Context Precision | Assesses the proportion of retrieved context that is actually relevant to the question. | 1.000 | Excellent precision |
- Retrieval re‑ranking
- Embedding‑quality comparison
- Multi‑vector per chunk scoring
- Structured citations
- Deployment using containerization
This repository is released under the MIT License. See LICENSE for details.
Author: Kushal Regmi
GitHub: https://github.com/KushalRegmi61
Project Repository: https://github.com/KushalRegmi61/rag
