🔍 Smart RAG Based Q&A Assistant

An AI-powered assistant that answers natural-language questions about uploaded PDFs using semantic retrieval and LLM-based re-ranking. Each answer is shown with relevance scores, and the source passage is highlighted in the PDF for traceability.

📸 Demo

🧠 Features

| Feature | Description |
| --- | --- |
| 📄 PDF Upload | Upload and parse PDFs with page-level metadata |
| 🔍 Semantic Search | Embed the query and retrieve the most relevant chunks using similarity scoring |
| 🧠 Re-Ranking | Use the LLM to sort the top chunks before answering |
| 💬 Chat Support | Ask questions via text or voice |
| 📊 Similarity Scores | View how relevant each chunk is to your query |
| 🔦 PDF Highlighting | See exactly which paragraph the answer came from |
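
To make the re-ranking feature concrete, here is a minimal sketch of the idea: retrieved chunks are re-scored against the question and sorted before answering. It is not the project's implementation. The `rerank` helper and the `rerank_score` field are hypothetical names, and `keyword_overlap` is a stand-in scorer used only so the example runs without a model; in the real pipeline the score would come from the LLM.

```python
from typing import Callable, Dict, List


def keyword_overlap(question: str, passage: str) -> float:
    """Stand-in scorer (assumption): fraction of question words found in the passage."""
    q_words = set(question.lower().split())
    p_words = set(passage.lower().split())
    return len(q_words & p_words) / len(q_words) if q_words else 0.0


def rerank(question: str, chunks: List[Dict],
           score_fn: Callable[[str, str], float], keep: int = 3) -> List[Dict]:
    """Re-score each chunk with score_fn and return the best `keep` chunks."""
    scored = [{**c, "rerank_score": score_fn(question, c["text"])} for c in chunks]
    scored.sort(key=lambda c: c["rerank_score"], reverse=True)
    return scored[:keep]


chunks = [
    {"text": "cats sleep a lot", "page_number": 2},
    {"text": "chromadb is a vector store", "page_number": 1},
]
ranked = rerank("what is chromadb", chunks, keyword_overlap, keep=1)
print(ranked[0]["page_number"])
```

Passing the scorer in as a function keeps the sorting logic independent of whichever model produces the scores.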

🛠️ Tech Stack

| Layer | Tools |
| --- | --- |
| Frontend | React.js, TailwindCSS, React-PDF, Web Speech API |
| Backend (Node.js) | Node.js, Express.js, OAuth, JWT |
| Backend (AI-Engine) | Python, FastAPI, PyMuPDF |
| LLM | Ollama |
| Embeddings | all-MiniLM-L6-v2 |
| Vector DB | ChromaDB |

⚙️ Project Workflow

A clear separation of responsibilities ensures maintainability and scalability. The project is divided into two major flows:

🛠️ Admin Workflow (Document Management & Embedding)

| Step | Description |
| --- | --- |
| 1️⃣ | PDF Upload: The admin uploads PDF files via the dashboard. |
| 2️⃣ | PDF Parsing: The system extracts text from each page using `PyMuPDF`. |
| 3️⃣ | Text Chunking: Extracted text is split into smaller chunks that respect a token limit. |
| 4️⃣ | Metadata Addition: Each chunk is enriched with metadata: `chunk_index`, `page_number`, `text_start`, `text_end`, `source`, etc. |
| 5️⃣ | Embedding Generation: Each chunk is passed through an embedding model (e.g., `all-MiniLM-L6-v2`) and stored in the vector database (`ChromaDB`). |
| ✅ | Ready for Querying: Admin-processed files are now available for user interaction. |
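
The chunking and metadata steps above can be sketched as follows. This is a minimal illustration, not the project's code: it counts whitespace-separated words as a rough stand-in for tokens, and the function name `chunk_page` and its `max_words` and `first_index` parameters are assumptions. The metadata keys match the ones listed in the table; the page text would come from the PDF parser.

```python
import re
from typing import Dict, List


def chunk_page(text: str, page_number: int, source: str,
               max_words: int = 50, first_index: int = 0) -> List[Dict]:
    """Split one page of text into word-limited chunks with offset metadata."""
    # Character span (start, end) of every whitespace-delimited word on the page.
    spans = [m.span() for m in re.finditer(r"\S+", text)]
    chunks = []
    for i in range(0, len(spans), max_words):
        group = spans[i:i + max_words]
        start, end = group[0][0], group[-1][1]
        chunks.append({
            "chunk_index": first_index + len(chunks),
            "page_number": page_number,
            "text_start": start,   # offset of the chunk's first character
            "text_end": end,       # offset just past the chunk's last character
            "source": source,
            "text": text[start:end],
        })
    return chunks


for chunk in chunk_page("one two three four five", page_number=1,
                        source="example.pdf", max_words=2):
    print(chunk["chunk_index"], chunk["text_start"], chunk["text_end"], chunk["text"])
```

Keeping `text_start` and `text_end` as character offsets into the page text is what later allows a viewer to locate and highlight the exact passage.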

🙋 User Workflow (Querying via Text or Voice)

| Step | Description |
| --- | --- |
| 1️⃣ | File Selection: The user selects a specific PDF to query from the list of uploaded documents. |
| 2️⃣ | Input Method: The user types a question or uses voice input (handled via `react-speech-recognition`). |
| 3️⃣ | Embedding Query: The query is converted into an embedding and searched against the vector database (ChromaDB). |
| 4️⃣ | Top-k Retrieval: The most similar chunks are retrieved based on cosine similarity. |
| 5️⃣ | Contextual Prompt Construction: Retrieved chunks and their metadata are appended to the query as context. |
| 6️⃣ | Answer Generation: The query is sent, together with the relevant context, to the language model served through Ollama. |
| 7️⃣ | Answer Display: The answer is rendered in the UI, along with metadata such as page number and similarity score. |
| 8️⃣ | PDF Viewer Sync (Bonus): The highlighted chunk is shown in the PDF viewer with `react-pdf`. |
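
The retrieval and prompt-construction steps above can be sketched in plain Python. In the project the similarity search is delegated to ChromaDB, so this is only an illustration of what cosine-similarity top-k retrieval computes. The function names, the `embedding` key, and the prompt wording are assumptions, and the two-dimensional vectors are toy values rather than real model embeddings.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: Sequence[float], chunks: List[Dict],
          k: int = 3) -> List[Tuple[float, Dict]]:
    """Return the k (score, chunk) pairs most similar to the query vector."""
    scored = [(cosine_similarity(query_vec, c["embedding"]), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_prompt(question: str, scored_chunks: List[Tuple[float, Dict]]) -> str:
    """Put the retrieved chunks, with page and score, ahead of the question."""
    lines = ["Answer the question using only the context below.", ""]
    for score, chunk in scored_chunks:
        lines.append(f"[page {chunk['page_number']}, score {score:.2f}] {chunk['text']}")
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)


chunks = [
    {"text": "Chunk about topic A.", "page_number": 1, "embedding": [1.0, 0.0]},
    {"text": "Chunk about topic B.", "page_number": 2, "embedding": [0.0, 1.0]},
    {"text": "Chunk about both.", "page_number": 3, "embedding": [1.0, 1.0]},
]
print(build_prompt("What is topic A?", top_k([1.0, 0.0], chunks, k=2)))
```

The score attached to each retrieved chunk is the same kind of value the UI can surface as the similarity score next to the answer.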

✅ Bonus Features Implemented

🔎 Similarity Score Display
📄 PDF Viewer with Highlighted Chunks
🎙️ Voice Input Support
🔐 Admin-only Access for Upload & Embedding

📷 Screenshots

Chat Interface	PDF Highlight

Setup Instructions

Clone the Repository

```
git clone https://github.com/shivamworld0608/Ollabot.git
cd Ollabot
```

Create a .env file in backend

```
# OAuth credentials
GOOGLE_CLIENT_ID=<your-google-client-id>
GOOGLE_CLIENT_SECRET=<your-google-client-secret>

# JWT credentials
JWT_SECRET="<a-long-random-string>"
JWT_EXPIRES_IN="30d"
JWT_COOKIE_EXPIRES_IN=30

# Basic server settings
MONGO_URI="<your-mongodb-connection-string>"
CLIENT_URL="http://localhost:5173"
AI_ENGINE_URL="http://localhost:8000"
SERVER_URL="http://localhost:5000"
PORT=5000
```

Create a .env file in frontend

```
VITE_APP_BASE_URL='http://localhost:5000'
```

AI-Engine Setup (FastAPI)

```
cd ai-engine
pip install -r requirements.txt
python main.py
```

Backend Setup (Node.js, Express)

```
cd backend
npm i
npx nodemon server.js
```

Frontend Setup (React)
```
cd frontend
npm install
npm run dev
```

Make sure `VITE_APP_BASE_URL` in the frontend `.env` points to the backend, and `AI_ENGINE_URL` in the backend `.env` points to the AI engine.

🤝 Contributing

We welcome contributions from the community! Here’s how you can help:

📌 Guidelines

📝 Open an Issue
For major features or changes, please open an issue first to discuss your ideas.
📂 Follow Standards
Stick to the existing project structure and naming conventions for consistency.
✅ Test Before Push
Ensure all features are tested and stable before submitting a pull request.

📬 Contact

Have questions, feedback, or just want to connect? Feel free to reach out!

| Platform | Link |
| --- | --- |
| GitHub | @shivamworld0608 |
| Email | [email protected] |
| LinkedIn | linkedin.com/in/pandey-shivam- |
