JioPay Customer Support RAG Chatbot

A Retrieval-Augmented Generation (RAG) chatbot that automates customer support for JioPay using publicly available information from their website and FAQs.

Overview

This project implements a conversational AI assistant that answers questions about JioPay's products and services by retrieving relevant information from a knowledge base built from JioPay's public documentation. The system uses RAG so that responses are grounded in retrieved source text, which reduces the risk of hallucinated content.

Data Gathering and Preparation

Data Sources

The system uses several data sources from JioPay's public information:

JioPay Help Center FAQs - Comprehensive FAQs organized by section (JioPay Business App, Dashboard, Collect link, User Management, etc.)
JioPay Website Content - General information from the JioPay Business website, including product descriptions, features, and benefits
Specialized FAQs - Domain-specific questions and answers about features like the VoiceBox, Repeat billing, Settlement processes, etc.

Data Collection Process

Data was collected using two main approaches:

Web Scraping: Python scripts using Selenium and BeautifulSoup were developed to scrape content from the JioPay Business website:
- js_faq_scraper.py: Specialized scraper for the FAQ section with tailored selectors
- improved_jio_pay_scraper.py: Generic scraper for extracting content from various JioPay webpages including FAQs on multiple pages
- run_jiopay_scraper.py: Orchestration script that runs both scrapers sequentially
Manual Extraction: Some FAQ content was manually curated to ensure quality and relevance.

Data Processing

The collected data was processed and stored in JSON format:

jiopay_help_center_faqs.json: Comprehensive FAQs from the Help Center
jiopay_links_content.json: Content from various website pages with metadata
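
As a rough illustration of how such a file might be read back for indexing, the sketch below flattens a sectioned FAQ file into one record per question. The JSON layout (a list of sections, each with a `section` name and a `faqs` list of `question`/`answer` pairs) and the function name `load_faq_records` are assumptions made for this example; the actual schema of jiopay_help_center_faqs.json may differ.

```python
import json
from pathlib import Path


def load_faq_records(path):
    """Flatten a sectioned FAQ file into one record per question.

    Assumed (hypothetical) layout:
    [{"section": "...", "faqs": [{"question": "...", "answer": "..."}]}]
    """
    sections = json.loads(Path(path).read_text(encoding="utf-8"))
    records = []
    for section in sections:
        for faq in section.get("faqs", []):
            question = faq.get("question", "").strip()
            answer = faq.get("answer", "").strip()
            if not question or not answer:
                continue  # skip incomplete entries instead of indexing them
            records.append(
                {
                    "section": section.get("section", "General"),
                    "question": question,
                    "answer": answer,
                    "source": Path(path).name,
                }
            )
    return records
```

Keeping the section name and source file on every record is what later allows metadata such as source, section, and title to travel with each document.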

Tools and Technologies

Core Technologies

Python 3.9+: Primary programming language
LangChain: Framework for building LLM-powered applications
FAISS: Vector similarity search library, used here as the vector store
HuggingFace Embeddings: Using BAAI/bge-base-en-v1.5 for high-quality embeddings
Ollama: Local LLM serving for inference with Llama 3
Gradio: Web interface for the chatbot

Key Dependencies

langchain
langchain_core
langchain_ollama
faiss-cpu
sentence-transformers
huggingface_hub
numpy
pandas
gradio

Web Scraping Tools

Selenium: Browser automation for dynamic content
BeautifulSoup: HTML parsing and content extraction
Chrome WebDriver: Drives a headless Chrome browser for scraping

RAG Implementation

Knowledge Base Construction

The knowledge base is constructed through the following process:

Document Processing:
- JSON files are loaded and converted to LangChain Document objects
- FAQ content is processed differently from general website content
- Metadata is preserved (source, section, title) for better context and citation
Semantic Document Creation:
- FAQ Documents: Instead of generic text splitting, we use semantic chunking that preserves the question-answer relationship
- Alternative Document Formats: For each FAQ, we create multiple document representations to enhance retrieval:
  - Standard Q&A format
  - Question-focused format (optimized for direct queries)
  - Section-contextualized format (for topic-based retrieval)
- Section Overviews: Created to help with broad topic questions
Vector Embeddings:
- Model: BAAI/bge-base-en-v1.5 (optimized for retrieval tasks)
- Embeddings are normalized to unit length so that similarity ranking corresponds to cosine similarity
- Device acceleration used when available (MPS on Apple Silicon)
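
The multiple-representation idea described above can be sketched as follows. This is a minimal illustration, not the project's actual code: `Doc` is a stand-in with the same two fields as a LangChain Document (`page_content` and `metadata`), and the function name, the `variant` labels, and the exact text templates are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in with the same two fields as a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def faq_to_docs(section, question, answer, source):
    """Return three retrieval-oriented views of a single FAQ entry."""
    base = {"source": source, "section": section, "title": question}
    return [
        # Standard Q&A format: question and answer kept together in one chunk.
        Doc(f"Question: {question}\nAnswer: {answer}", {**base, "variant": "qa"}),
        # Question-focused format: only the question is embedded, so direct
        # queries match closely; the answer rides along in the metadata.
        Doc(question, {**base, "variant": "question_focused", "answer": answer}),
        # Section-contextualized format: section name is prepended for
        # topic-based retrieval.
        Doc(
            f"Section: {section}\nQuestion: {question}\nAnswer: {answer}",
            {**base, "variant": "section_context"},
        ),
    ]
```

Because every variant carries the same `title` and `section` metadata, results that point at the same FAQ can be de-duplicated after retrieval.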

Vector Store

FAISS Index: Efficient similarity search for embedding vectors
Local persistence to disk for reuse across sessions
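
To make the role of normalization concrete, the sketch below shows that for unit-length vectors the inner product equals cosine similarity, and uses that in a brute-force top-k search. It does not use FAISS; it only illustrates, under that simplification, the kind of ranking a similarity index computes far more efficiently. All function names are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def inner_product(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_k(query, vectors, k=4):
    """Brute-force search: indices of the k vectors most similar to the query."""
    q = l2_normalize(query)
    # After normalization, the inner product is the cosine similarity.
    scores = [inner_product(q, l2_normalize(v)) for v in vectors]
    return sorted(range(len(vectors)), key=lambda i: scores[i], reverse=True)[:k]
```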

Advanced Retrieval Techniques

The system implements several advanced retrieval techniques to improve accuracy and relevance:

1. LLM-Based Query Expansion

Rather than using hardcoded heuristics, the system uses the LLM itself to generate better search queries:

Query Refinement: The original user question is sent to the LLM to generate 3-4 alternative formulations
Diverse Phrasing: The LLM creates variations focusing on different aspects and terminology
Keyword Extraction: Technical terms and product names are preserved in the refined queries
Fallback Mechanism: System gracefully falls back to the original query if LLM refinement fails

Benefits:

Adapts to new product terminology automatically
Generates semantically related terms humans might miss
Balances specificity and generality in search
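
A minimal sketch of the expansion-with-fallback flow described above, under stated assumptions: `generate` is a hypothetical callable that takes a prompt string and returns the LLM's text (for example, a thin wrapper around the locally served model), and the prompt wording and the parsing rules are illustrative rather than taken from the project.

```python
import re


def expand_query(question, generate, max_variants=4):
    """Return the original question plus up to max_variants LLM rewrites.

    `generate` is a hypothetical callable: prompt (str) -> completion (str).
    """
    prompt = (
        "Rewrite the customer question below as up to "
        f"{max_variants} alternative search queries, one per line. "
        "Keep product names and technical terms unchanged.\n\n"
        f"Question: {question}"
    )
    try:
        raw = generate(prompt)
    except Exception:
        # Fallback: if refinement fails, search with the original query only.
        return [question]

    variants = []
    for line in raw.splitlines():
        # Drop list markers such as "1.", "2)", "-" or "*" at the line start.
        cleaned = re.sub(r"^\s*(?:[-*•]|\d+[.)])\s+", "", line).strip()
        if not cleaned or cleaned.lower() == question.lower():
            continue
        if cleaned not in variants:
            variants.append(cleaned)
    return [question] + variants[:max_variants]
```

The original question is always kept as the first query, so retrieval never does worse than a plain search when the rewrites are poor or missing.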

2. Semantic Document Chunking

Instead of raw text splitting, which can break question-answer pairs, the system:

Preserves structured JSON content in semantically meaningful units
Creates multiple document variants for each FAQ with different formats
Maintains metadata relationships between questions, answers, and sections
Includes section context documents for hierarchical understanding
Generates section overview documents for broad topic questions
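
One way the section overview documents might be produced is sketched below. It assumes FAQ records shaped as dictionaries with `section` and `question` keys; that shape, the function name, and the overview text template are all assumptions for illustration.

```python
from collections import defaultdict


def build_section_overviews(records):
    """Group FAQ records by section and emit one overview document per section."""
    by_section = defaultdict(list)
    for rec in records:
        by_section[rec["section"]].append(rec["question"])

    overviews = []
    for section, questions in by_section.items():
        bullet_list = "\n".join(f"- {q}" for q in questions)
        overviews.append(
            {
                # The text lists every question so broad queries such as
                # "what can the Dashboard do?" have something to match.
                "page_content": f"Section: {section}\nTopics covered:\n{bullet_list}",
                "metadata": {
                    "section": section,
                    "type": "section_overview",
                    "question_count": len(questions),
                },
            }
        )
    return overviews
```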

3. Maximum Marginal Relevance (MMR) Retrieval

To balance relevance with diversity in search results:

MMR Algorithm: Retrieves a larger candidate set, then selects a diverse subset
Configurable Parameters:
- k: Number of documents to return (typically 4-5)
- fetch_k: Initial candidate pool size (typically 10-15)
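- lambda_mult: Relevance-diversity tradeoff between 0 and 1 (1 favors relevance only, 0 favors maximum diversity; 0.7 leans toward relevance while still penalizing near-duplicates)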
Fallback to Similarity: Automatically uses standard similarity search if MMR fails

Benefits:

Reduces redundancy in retrieval results
Ensures broader coverage of relevant topics
Adapts to both specific and general questions
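
The selection step can be sketched in plain Python as a greedy loop. This is an illustrative implementation of the standard MMR scoring rule, not the project's code or a library's internals; the function names are hypothetical, and `candidate_vecs` is assumed to already hold the `fetch_k` vectors returned by the initial similarity search.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mmr_select(query_vec, candidate_vecs, k=4, lambda_mult=0.7):
    """Greedily pick k candidate indices, trading relevance against redundancy."""
    relevance = [cosine(query_vec, v) for v in candidate_vecs]
    selected = []
    remaining = list(range(len(candidate_vecs)))

    while remaining and len(selected) < k:

        def mmr_score(i):
            # Redundancy: similarity to the closest already-selected document.
            redundancy = max(
                (cosine(candidate_vecs[i], candidate_vecs[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With `lambda_mult=1.0` the redundancy term vanishes and the result is the plain similarity ranking; lowering it increasingly penalizes candidates that resemble documents already chosen.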

Project Structure

jiopay-support-rag/
├── data/
│   ├── jiopay_help_center_faqs.json
│   └── jiopay_links_content.json
├── scrapers/
│   ├── js_faq_scraper.py
│   ├── improved_jio_pay_scraper.py
│   └── run_jiopay_scraper.py
├── faiss_index/
│   ├── index.faiss
│   └── index.pkl
├── jiopay_support_rag.py
├── requirements.txt
└── README.md

Setup and Installation

Clone the repository:

```
git clone https://github.com/username/jiopay-support-rag.git
cd jiopay-support-rag
```

Create a virtual environment:

```
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

Install dependencies:
```
pip install -r requirements.txt
```
Install Ollama:
- Follow instructions at ollama.ai
- Pull the Llama 3.3 model:
```
ollama pull llama3.3:latest
```
Run the application:
```
python jiopay_support_rag.py
```

Usage

Once launched, the application will start a Gradio web interface at http://localhost:7860
Type your JioPay-related questions in the chat interface
The system will retrieve relevant information and generate a response
Sample questions are provided as examples in the interface

Future Enhancements

Multilingual Support: Add support for Indian regional languages
Hybrid Search: Combine vector similarity with keyword-based search
Response Citations: Add direct links to source documents
Conversation History: Enhance with memory of previous interactions
Feedback Loop: Add user feedback mechanism to improve responses
