KushalRegmi61/nutritional-rag-chatbot

Nutrition RAG Chatbot: From Scratch (Manual RAG Engineering)

A fully engineered Retrieval‑Augmented Generation (RAG) chatbot built from first principles. The system answers user questions grounded strictly in the textbook:

Human Nutrition (University of Hawai‘i at Mānoa)

PDF: https://pressbooks.oer.hawaii.edu/humannutrition2/open/download?type=pdf


Demo

Below is a demo of the chatbot in action, showcasing its ability to provide accurate, grounded responses based on the textbook content.

[Demo media: RAG Chatbot Demo]


Table of Contents

  1. Introduction
  2. Project Objectives
  3. System Architecture Overview
  4. Dataset and Source Material
  5. Ingestion and Extraction
  6. Chunking Approaches
  7. Embedding and Vector Storage
  8. PostgreSQL + pgvector Setup
  9. Frontend and Backend
  10. Setup Instructions (Local Installation)
  11. Repository Structure
  12. How Query Processing Works
  13. Detailed Walkthrough Notebook
  14. RAG Evaluation
  15. Future Improvements
  16. License

1. Introduction

This repository implements a simple chatbot that provides grounded responses based on information from the referenced nutrition textbook. The project is intentionally engineered without relying on turnkey RAG abstractions, enabling full visibility and control over:

  • Ingestion pipeline
  • Chunking logic
  • Embedding failure modes
  • Vector storage and retrieval
  • Response generation

2. Project Objectives

  • Build a fully manual RAG implementation end‑to‑end
  • Understand how ingestion affects downstream retrieval performance
  • Implement multiple chunking strategies and evaluate their impact
  • Store embeddings directly in PostgreSQL using pgvector
  • Build a minimal full‑stack application (Next.js frontend + backend API)
  • Document the entire process in a reproducible notebook

3. System Architecture Overview

Processing flow:

PDF → Extracted Text
→ Exploratory Data Analysis (token lengths, truncation risks)
→ Chunking (multiple engineering methods)
→ Embedding
→ PostgreSQL + pgvector storage
→ SQL‑based similarity search
→ LLM response generation (grounded output)

4. Dataset and Source Material

The chatbot is grounded on:

Human Nutrition, University of Hawai‘i at Mānoa

Full PDF: https://pressbooks.oer.hawaii.edu/humannutrition2/open/download?type=pdf

The PDF is stored locally at: data/human_nutrition_text_book.pdf


5. Ingestion and Extraction

Different document formats require different extraction pipelines:

  • Digital PDFs: PyMuPDF
  • Scanned documents: Tesseract OCR
  • Hybrid documents (tables, charts, layouts): Docling or layout‑aware OCR

Extraction quality directly affects tokenization, chunking behavior, embedding quality, and retrieval recall.

The ingestion pipeline is implemented in: scripts/ingest.py
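
As an illustration of why extraction quality matters downstream, the sketch below normalizes raw page text before it reaches the chunker. It is not taken from scripts/ingest.py; the function name `clean_extracted_text` and its rules are assumptions chosen for the example, and it uses only the Python standard library.

```python
import re


def clean_extracted_text(raw: str) -> str:
    """Normalize text pulled from a PDF page before chunking (hypothetical helper)."""
    # Rejoin words hyphenated across a line break: "nutri-\ntion" -> "nutrition".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    # Blank lines mark paragraph breaks; keep them as boundaries.
    paragraphs = re.split(r"\n\s*\n", text)
    # Inside a paragraph, collapse line wraps and repeated spaces to one space.
    cleaned = [re.sub(r"\s+", " ", p).strip() for p in paragraphs]
    return "\n\n".join(p for p in cleaned if p)
```

One trade-off to be aware of: the de-hyphenation rule also joins genuine compounds that happen to break at a line end (for example "low-" / "fat"), so a real pipeline may want a dictionary check before merging.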


6. Chunking Approaches

Six different chunking strategies were implemented and tested:

| Method | Avg Tokens | Notes |
|---|---|---|
| Fixed‑size | ~65 | Predictable, but ignores meaning |
| Structure‑based | ~1342 | Matches chapter hierarchy but exceeds embedding windows |
| Semantic | ~13 | Very coherent but over‑fragmented |
| Recursive | ~89 | Best practical trade‑off |
| LLM‑based | ~92 | High quality, but costly |
| Hybrid | Variable | Combines structure‑awareness with window control |
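
To make the recursive strategy concrete, here is a minimal sketch of the general technique: split on the coarsest separator first, merge neighbouring pieces while they fit, and fall back to finer separators only for pieces that are still too long. This is not the notebook's implementation. The name `recursive_split`, the separator order, and the default `max_words=90` are assumptions, and whitespace-separated words stand in for model tokens.

```python
def recursive_split(text, max_words=90, separators=("\n\n", "\n", ". ", " ")):
    """Split text into chunks of at most max_words words (words approximate tokens)."""
    if len(text.split()) <= max_words:
        return [text.strip()] if text.strip() else []
    if not separators:
        # No separator left to try: hard-cut by word count.
        words = text.split()
        return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

    sep, finer = separators[0], separators[1:]
    pieces = [p for p in text.split(sep) if p.strip()]
    chunks, buffer = [], ""
    for piece in pieces:
        candidate = f"{buffer}{sep}{piece}" if buffer else piece
        if len(candidate.split()) <= max_words:
            buffer = candidate  # still fits: keep merging neighbours
            continue
        if buffer:
            chunks.append(buffer.strip())  # the separator at this boundary is dropped
            buffer = ""
        if len(piece.split()) <= max_words:
            buffer = piece
        else:
            # The piece alone is too long: retry it with the next, finer separator.
            chunks.extend(recursive_split(piece, max_words, finer))
    if buffer:
        chunks.append(buffer.strip())
    return chunks
```

Because paragraph and sentence boundaries are tried before word boundaries, chunks tend to end at natural breaks while still respecting the size cap.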

Key lessons:

  • Most real‑world RAG failures originate in chunking and ingestion, not the LLM.
  • Without dataset‑level EDA, chunks longer than the embedding model's maximum sequence length are silently truncated at embedding time, so their tail never reaches the vector.
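
A dataset-level check for the truncation risk mentioned above could look like the following sketch. Everything here is an assumption made for illustration: the helper name `truncation_report`, the rough `tokens_per_word=1.3` heuristic, and the default limit of 384 tokens, which is the figure given on the all-mpnet-base-v2 model card and should be verified against the loaded model. A real check would count tokens with the model's own tokenizer.

```python
def truncation_report(chunks, max_tokens=384, tokens_per_word=1.3):
    """Estimate which chunks exceed the embedding window (heuristic, not exact)."""
    # Approximate token counts from whitespace-separated words.
    estimates = [round(len(c.split()) * tokens_per_word) for c in chunks]
    over = [i for i, n in enumerate(estimates) if n > max_tokens]
    return {
        "chunks": len(chunks),
        "max_estimated_tokens": max(estimates, default=0),
        "truncated_indices": over,
        "truncated_share": len(over) / len(chunks) if chunks else 0.0,
    }
```

Running a report like this per chunking strategy makes it visible when a method such as structure-based chunking produces chunks far beyond the window.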

Detailed chunking experiments are documented in: notebooks/rag_chunking_strategies.ipynb


7. Embedding and Vector Storage

Embeddings are generated using:

all‑mpnet‑base‑v2

Each chunk is embedded and stored directly inside PostgreSQL using the pgvector extension, eliminating the need for external vector databases while enabling efficient similarity search within SQL.
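
When writing embeddings to PostgreSQL without an adapter, the vector has to be serialized into pgvector's text format, a bracketed comma-separated list such as `[0.1,0.2,0.3]`. The helper below is a sketch under that assumption; the name `to_pgvector_literal` is hypothetical and is not part of the repository or of pgvector itself.

```python
import math


def to_pgvector_literal(embedding, expected_dim):
    """Serialize a sequence of numbers into pgvector's text format, e.g. '[0.1,0.2]'."""
    if len(embedding) != expected_dim:
        # Catch dimension mismatches before the database rejects the insert.
        raise ValueError(f"expected {expected_dim} dimensions, got {len(embedding)}")
    if not all(math.isfinite(x) for x in embedding):
        raise ValueError("embedding contains NaN or infinite values")
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"
```

The resulting string can be passed as a query parameter and cast with `::vector` in the insert statement. Checking the dimension in application code gives a clearer error than the one raised at insert time.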


8. PostgreSQL + pgvector Setup

Enable vector support:

create extension if not exists vector;

Create table:

create table if not exists public.chunks (
  id bigserial primary key,
  doc_id text not null,
  chunk_index int not null,
  content text not null,
  metadata jsonb default '{}'::jsonb,
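  -- NOTE: the dimension must equal the embedding model's output size.
  -- all-mpnet-base-v2 produces 768-dimensional vectors, so change 1024 to 768
  -- (here and in match_documents) when using that model.
  embedding vector(1024)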
);

Create IVFFlat index:

create index if not exists idx_chunks_embedding
on public.chunks using ivfflat (embedding vector_cosine_ops)
with (lists=100);
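
The `lists` value controls how many clusters IVFFlat partitions the vectors into. The pgvector README suggests starting from rows / 1000 for up to 1M rows and sqrt(rows) above that, with `probes` near sqrt(lists) at query time. The sketch below encodes that rule of thumb; the function name `ivfflat_starting_points` is hypothetical, and the outputs are starting values to tune, not guarantees.

```python
import math


def ivfflat_starting_points(row_count):
    """Return (lists, probes) starting values based on pgvector's README guidance."""
    if row_count <= 1_000_000:
        lists = max(1, row_count // 1000)
    else:
        lists = int(math.sqrt(row_count))
    # Query-time probes: roughly the square root of the number of lists.
    probes = max(1, round(math.sqrt(lists)))
    return lists, probes
```

By this rule, `lists=100` corresponds to roughly 100,000 rows, so a smaller value may suit a table with far fewer chunks. The probe count is applied per session with `SET ivfflat.probes = <n>;`, and the index is best created after the table already contains data.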

Similarity search function:

create or replace function public.match_documents(
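    -- Must use the same dimension as chunks.embedding
    -- (768 if embeddings come from all-mpnet-base-v2).
    query_embedding vector(1024),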
    match_count int default 5,
    filter jsonb default '{}'::jsonb
) returns table (
  id bigint,
  doc_id text,
  chunk_index int,
  content text,
  metadata jsonb,
  similarity float
) language plpgsql stable as $$
begin
  return query
  select
    c.id,
    c.doc_id,
    c.chunk_index,
    c.content,
    c.metadata,
    1 - (c.embedding <=> query_embedding) as similarity
  from public.chunks c
  where (filter = '{}'::jsonb) or (c.metadata @> filter)
  order by (c.embedding <=> query_embedding)
  limit match_count;
end;
$$;
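
For readers who want to see what the function computes without a database, here is a pure-Python reference of the same ranking. The `<=>` operator is cosine distance, so `1 - distance` is cosine similarity. The sketch mirrors the SQL name `match_documents` for readability only, works on in-memory dicts, and is an illustration rather than the project's code. Its metadata filter checks only top-level keys, whereas jsonb `@>` also handles nested containment.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def match_documents(rows, query_embedding, match_count=5, metadata_filter=None):
    """Rank rows (dicts with 'content', 'metadata', 'embedding') by similarity."""
    metadata_filter = metadata_filter or {}
    # An empty filter keeps every row, like filter = '{}' in the SQL version.
    candidates = [
        r for r in rows
        if all(r["metadata"].get(k) == v for k, v in metadata_filter.items())
    ]
    scored = [
        {**r, "similarity": cosine_similarity(r["embedding"], query_embedding)}
        for r in candidates
    ]
    # Highest similarity first, which equals ascending cosine distance.
    scored.sort(key=lambda r: r["similarity"], reverse=True)
    return scored[:match_count]
```

A reference like this is handy in tests: the rows returned by the SQL function for a small fixture should come back in the same order.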

9. Frontend and Backend

The application uses:

  • Next.js for both UI and backend endpoints
  • Groq as the LLM inference provider; its API key (GROQ_API_KEY) is read by the backend and is not exposed to the browser
  • Backend endpoints handle:
    • Query embedding
    • SQL similarity search
    • LLM grounded response generation

Frontend application: rag-chat/

Backend API route: rag-chat/src/app/api/chat/route.ts

Main chat interface: rag-chat/src/app/page.tsx
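
The grounded-generation step usually comes down to how the prompt is assembled from the retrieved chunks. The actual route is written in TypeScript and its prompt wording is not shown here, so the following Python sketch is only an assumption about one reasonable shape. The name `build_grounded_prompt` and the instruction text are hypothetical.

```python
def build_grounded_prompt(question, chunks):
    """Assemble a prompt that restricts the model to the retrieved passages."""
    # Number the passages so the answer can refer back to them.
    context = "\n\n".join(
        f"[{i}] {chunk['content'].strip()}" for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the numbered passages below. "
        "If the passages do not contain the answer, say that you do not know.\n\n"
        f"Passages:\n{context}\n\n"
        f"Question: {question.strip()}\nAnswer:"
    )
```

Numbering the passages keeps the door open for structured citations later, since the model can be asked to reference passage numbers in its answer.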


10. Setup Instructions (Local Installation)

Step 1 – Clone

git clone https://github.com/KushalRegmi61/rag.git
cd rag

Step 2 – Python Environment

python3 -m venv .venv
source .venv/bin/activate         # Linux/Mac
.venv\Scripts\activate            # Windows

pip install -r requirements.txt

See: requirements.txt

Step 3 – PostgreSQL + pgvector

Install pgvector:

sudo apt install postgresql-16-pgvector

Run the SQL statements from Section 8 to create the extension, table, index, and search function.

Step 4 – Run Notebook

jupyter lab

Open: notebooks/production_level_from_scratch.ipynb

Execute fully to:

  • Extract book
  • Analyze token distribution
  • Chunk
  • Compute embeddings
  • Store in database

Step 5 – Frontend Installation

cd rag-chat
npm install

Create .env.local:

DATABASE_URL=postgres://...
GROQ_API_KEY=your_api_key

Run the development server:

npm run dev

Access the application at: http://localhost:3000


11. Repository Structure

rag/
├── data/
│   └── human_nutrition_text_book.pdf
├── notebooks/
│   ├── production_level_from_scratch.ipynb
│   └── rag_chunking_strategies.ipynb
├── scripts/
│   └── ingest.py
├── test/
│   └── test_embeddings.py
├── rag-chat/
│   ├── src/
│   │   └── app/
│   │       ├── api/
│   │       │   └── chat/
│   │       │       └── route.ts
│   │       ├── page.tsx
│   │       ├── layout.tsx
│   │       └── globals.css
│   ├── public/
│   ├── package.json
│   └── tsconfig.json
├── .venv/
├── requirements.txt
├── package.json
├── .env
├── .gitignore
├── LICENSE
└── README.md

Key files:

  • data/human_nutrition_text_book.pdf – Source textbook
  • notebooks/production_level_from_scratch.ipynb – Main implementation notebook
  • notebooks/rag_chunking_strategies.ipynb – Chunking experiments
  • scripts/ingest.py – Document ingestion pipeline
  • test/test_embeddings.py – Embedding tests
  • rag-chat/src/app/api/chat/route.ts – Backend API endpoint
  • rag-chat/src/app/page.tsx – Frontend chat interface
  • requirements.txt – Python dependencies
  • LICENSE – MIT License

12. How Query Processing Works

User query
→ Query embedding
→ SQL vector similarity search
→ Top chunks returned
→ LLM generates grounded output
→ Rendered in chat interface

The complete flow is implemented in:

  1. Frontend: rag-chat/src/app/page.tsx
  2. Backend API: rag-chat/src/app/api/chat/route.ts
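
The flow above can be sketched as a single function with each stage injected as a callable, which keeps the orchestration testable without a database or an LLM. This is a Python illustration and not a port of route.ts. The name `answer_query`, the stage signatures, and the shape of the returned dict are assumptions.

```python
from typing import Callable, Sequence


def answer_query(
    question: str,
    embed: Callable[[str], Sequence[float]],
    search: Callable[[Sequence[float], int], list],
    generate: Callable[[str, list], str],
    top_k: int = 5,
) -> dict:
    """Run embed -> search -> generate and return the answer with its sources."""
    query_embedding = embed(question)
    chunks = search(query_embedding, top_k)
    if not chunks:
        # Nothing retrieved: skip generation instead of letting the model guess.
        return {"answer": "No relevant passages were found.", "sources": []}
    answer = generate(question, chunks)
    return {"answer": answer, "sources": [c["chunk_index"] for c in chunks]}
```

In a test, the three stages can be replaced with small stubs, for example a `search` that returns a fixed list of chunk dicts.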

13. Detailed Walkthrough Notebook

All the engineering and reasoning is documented step‑by‑step in:

notebooks/production_level_from_scratch.ipynb

This is the primary reference for the implementation.

Additional experiments and chunking strategy comparisons:

notebooks/rag_chunking_strategies.ipynb


14. RAG Evaluation

The RAG system was evaluated using the Ragas library. Below are the overall average scores:

| Metric | Description | Value | Remarks |
|---|---|---|---|
| Faithfulness | Measures the factual consistency of the generated answer with respect to the provided context. | 0.200 | Poor factual consistency |
| Answer Relevancy | Evaluates how relevant the generated answer is to the user's original question. | 0.199 | Poor relevance |
| Context Recall | Determines the extent to which all relevant information from the ground truth is retrieved within the context. | 0.900 | Excellent recall |
| Context Precision | Assesses the proportion of retrieved context that is actually relevant to the question. | 1.000 | Excellent precision |

15. Future Improvements

  • Retrieval re‑ranking
  • Embedding‑quality comparison
  • Multi‑vector per chunk scoring
  • Structured citations
  • Deployment using containerization

16. License

This repository is released under the MIT License. See LICENSE for details.


Author: Kushal Regmi

GitHub: https://github.com/KushalRegmi61

Project Repository: https://github.com/KushalRegmi61/rag
