Intelligent Document Q&A System with Hierarchical RAG, Fact Verification & Self-Learning Capabilities
Transform any document collection into an intelligent, context-aware knowledge base that learns and improves over time.
DocuMind is a production-ready Retrieval-Augmented Generation (RAG) system that combines hierarchical document processing, neural re-ranking, persistent conversational memory, automated fact verification, and research paper integration. Built for both privacy-conscious local deployment and cloud-enhanced self-learning scenarios.
| Feature | Description |
|---|---|
| Hierarchical RAG Pipeline | Multi-level document chunking (2048 → 512 → 128 tokens) with auto-merging retrieval |
| Persistent Conversational Memory | Sessions survive restarts with intelligent summarization |
| Gemini Fact Verification | Periodic cross-validation to detect conflicts and outdated information |
| Arxiv Research Integration | Automatic paper fetching to supplement answers with the latest research |
| Semantic Query Caching | Embedding-based similarity matching for instant responses |
| Built-in Analytics Dashboard | Track query patterns, source usage, and response quality |
| Dual Operation Modes | Privacy-focused local mode vs. feature-rich self-learning mode |
- Hierarchical Chunking: Three-tier parsing preserves document structure and context hierarchy
- Auto-Merging Retrieval: Dynamically combines related chunks when accessed together
- Neural Re-ranking: Cross-encoder ensures only the most relevant passages reach the LLM
- GPU Acceleration: CUDA-optimized embeddings and reranking for sub-second processing
- Persistent Chat History: All conversations saved to disk; resume any session
- Automatic Summarization: Older messages condensed into rolling summaries
- Context-Aware Responses: Follow-up questions understand previous conversation
- Session Management: List, resume, export, or clear sessions via commands
- Gemini Fact Verification: Every 5 queries, cross-validates answers against Gemini
- Conflict Detection: Identifies contradictions between documents and current knowledge
- Dynamic Database Updates: Apply corrections and new knowledge directly to vector store
- Arxiv Integration: Fetches relevant research papers and can add them to knowledge base
- Semantic Query Cache: Similar questions (>92% similarity) return cached responses
- Gemini Response Cache: Verification results cached to reduce API calls
- Query Analytics Dashboard: Track usage patterns, cache hit rates, and ratings
- Feedback Collection: Rate responses 1-5 stars to track quality over time
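The persistent conversational memory described above can be sketched with a small session class. The names and the summarization rule here are illustrative only, not the actual `chat_history.py` implementation: older turns are folded into a rolling summary while recent turns stay verbatim, and the whole session round-trips through a JSON file on disk.

```python
import json
from pathlib import Path

MAX_RECENT = 6  # older messages get folded into a rolling summary

class ChatSession:
    """Minimal sketch of a disk-persisted session with rolling summarization."""

    def __init__(self, session_id, store_dir="chat_history"):
        self.path = Path(store_dir) / f"{session_id}.json"
        self.summary = ""   # condensed form of older turns
        self.messages = []  # recent question/answer pairs

    def add_turn(self, question, answer):
        self.messages.append({"q": question, "a": answer})
        # Fold the oldest turn into the summary once history grows too long.
        while len(self.messages) > MAX_RECENT:
            old = self.messages.pop(0)
            self.summary += f" Q: {old['q'][:80]} A: {old['a'][:80]}"

    def save(self):
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(
            {"summary": self.summary, "messages": self.messages}))

    @classmethod
    def load(cls, session_id, store_dir="chat_history"):
        session = cls(session_id, store_dir)
        if session.path.exists():
            data = json.loads(session.path.read_text())
            session.summary = data["summary"]
            session.messages = data["messages"]
        return session
```

In the real system the summarization step would call the LLM to condense old turns; the truncation above just stands in for that.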
```
┌──────────────────────────────────────────────────────────────────────────┐
│                            Document Ingestion                            │
│    PDF/Text → Hierarchical Parser (2048 → 512 → 128) → ChromaDB + HNSW   │
└──────────────────────────────────────────────────────────────────────────┘
                                     │
                                     ▼
┌──────────────────────────────────────────────────────────────────────────┐
│                        Query Processing Pipeline                         │
│  ┌─────────────┐   ┌─────────────┐   ┌─────────────┐   ┌──────────────┐  │
│  │  Semantic   │──▶│ Auto-Merge  │──▶│   Neural    │──▶│  LM Studio   │  │
│  │   Cache     │   │  Retriever  │   │  Reranker   │   │  (Qwen3-4B)  │  │
│  └─────────────┘   └─────────────┘   └─────────────┘   └──────────────┘  │
└──────────────────────────────────────────────────────────────────────────┘
                                     │
                                     ▼
┌──────────────────────────────────────────────────────────────────────────┐
│                            Enhancement Layer                             │
│  ┌─────────────┐   ┌─────────────┐   ┌─────────────┐   ┌──────────────┐  │
│  │    Chat     │   │   Gemini    │   │    Arxiv    │   │  Analytics   │  │
│  │   History   │   │  Verifier   │   │   Fetcher   │   │  Dashboard   │  │
│  └─────────────┘   └─────────────┘   └─────────────┘   └──────────────┘  │
└──────────────────────────────────────────────────────────────────────────┘
```
**Local Mode**: privacy-first, fully offline operation
- All processing happens locally on your machine
- No data transmitted to external services
- Chat history and caching still fully functional
- Ideal for sensitive or confidential documents
**Self-Learning Mode**: enhanced intelligence with cloud capabilities
- Gemini-powered fact verification every 5 queries
- Automatic Arxiv paper recommendations and integration
- Dynamic knowledge base updates from verified sources
- Continuous accuracy improvement over time
Switch between modes instantly with the `/mode` command.
| Component | Technology | Purpose |
|---|---|---|
| RAG Framework | LlamaIndex 0.14+ | Orchestration, retrieval, and query processing |
| Vector Database | ChromaDB | Persistent embedding storage with HNSW indexing |
| Local LLM | LM Studio + Qwen3-4B-Thinking | Local inference with reasoning capabilities |
| Embeddings | all-mpnet-base-v2 | Semantic text encoding (768 dimensions) |
| Reranker | MS-MARCO-MiniLM-L-6-v2 | Cross-encoder relevance scoring |
| Fact Verification | Google Gemini 1.5 Flash | Cloud-based knowledge validation |
| Research Papers | Arxiv API | Academic paper retrieval (no API key needed) |
| GPU Acceleration | CUDA 12.x | Parallel embedding and reranking |
| Component | Specification |
|---|---|
| GPU | NVIDIA RTX 4050 Laptop (6GB VRAM) |
| CPU | Intel Core i5-13500H |
| RAM | 16GB DDR5 |
| OS | Windows 11 |
- Python 3.10+
- CUDA-compatible GPU (6GB+ VRAM recommended)
- 16GB+ RAM
- LM Studio with any 4B+ parameter model
- (Optional) Google Gemini API key for fact verification
```bash
git clone https://github.com/iDheer/DocuMind.git
cd DocuMind

python -m venv environment

# Windows
.\environment\Scripts\activate
# Linux/Mac
source environment/bin/activate

pip install -r requirements.txt
```

- Download LM Studio from https://lmstudio.ai/
- Search for `qwen/qwen3-4b-thinking-2507` (or any 4B+ model)
- Download the model (GGUF format, ~3GB)
- Go to the Local Server tab → Start Server on port 1234

Create a `.env` file for fact verification:

```
GEMINI_API_KEY=your_api_key_here
```

Get a free API key at: https://aistudio.google.com/apikey

```bash
# Place PDF/text documents in the data_large/ folder
python 1_build_database_advanced.py

# Launch the interactive Q&A interface
python query_enhanced.py
```

| Command | Description |
|---|---|
| `/help` | Display all available commands |
| `/mode` | Switch between Local and Self-Learning modes |
| `/history` | View the current session's conversation history |
| `/clear` | Clear chat history and start fresh |
| `/sessions` | List all saved sessions with timestamps |
| `/stats` | Display system statistics and feature status |
| `/cache` | View cache statistics or clear the cache |
| `/verify` | Manually trigger Gemini fact verification |
| `/arxiv` | Toggle automatic Arxiv paper fetching |
| `/analytics` | View the detailed usage analytics dashboard |
| `/feedback` | Toggle post-response feedback collection |
| `/rate` | Rate the last response (1-5 stars) |
| `/export` | Export the current session to a JSON file |
| `exit` | Save the session and exit |
```
DocuMind/
├── query_enhanced.py              # Main application entry point
├── 1_build_database_advanced.py   # Document processing & indexing
├── 2_query_system_advanced.py     # Basic query interface (standalone)
├── 3_inspect_hierarchy.py         # Database inspection utility
│
├── config.py                      # Centralized configuration
├── chat_history.py                # Conversation persistence & summarization
├── gemini_verifier.py             # Fact verification engine
├── arxiv_fetcher.py               # Research paper integration
├── db_updater.py                  # Dynamic database modification
├── cache_manager.py               # Multi-level caching system
├── analytics.py                   # Usage tracking & metrics
│
├── requirements.txt               # Python dependencies
├── .env.example                   # API key template
├── data_large/                    # Input documents (user-provided)
├── chroma_db_advanced/            # Vector database (auto-generated)
├── chat_history/                  # Saved sessions (auto-generated)
└── arxiv_cache/                   # Cached papers (auto-generated)
```
```
============================================================
  Ready to Query!  [SELF-LEARNING MODE]
============================================================

Question: What is the difference between a process and a thread?

Response: A process is an independent execution unit with its own
memory space, while a thread is a lightweight execution unit within
a process that shares memory with other threads. The OS schedules
processes independently, but threads within the same process share
resources like file handles and heap memory...

Sources:
  1. operating_systems.pdf (score: 0.924)
  2. concurrency_chapter.pdf (score: 0.891)

Question: What are the latest research papers on scheduling?

[Arxiv: Matched keywords: latest, research, scheduling]
Searching Arxiv...
Found 3 relevant papers

Related Research Papers:
==================================================
1. Efficient Task Scheduling for Edge Computing
   Authors: Zhang et al. (2024)
   https://arxiv.org/abs/2401.xxxxx
...

Add these papers to the knowledge base? (y/n): y
Papers added to vector database!
```
Documents are parsed into three levels for optimal retrieval:
- Level 1 (2048 tokens): Major sections → preserves high-level structure
- Level 2 (512 tokens): Paragraphs → captures topic coherence
- Level 3 (128 tokens): Sentences → enables precise retrieval
The auto-merging retriever dynamically combines chunks when related information spans multiple nodes.
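The merge rule behind this can be sketched in plain Python. This is a simplified illustration of the idea, not LlamaIndex's actual implementation: when vector search hits more than half of a parent chunk's children, the retriever promotes the hits to the parent so the LLM sees the fuller context.

```python
# Simplified sketch of auto-merging retrieval: if a majority of a parent
# chunk's children are retrieved, replace them with the parent node.
MERGE_THRESHOLD = 0.5  # fraction of siblings that must be hit

def auto_merge(retrieved_ids, parent_of, children_of):
    """retrieved_ids: leaf chunk ids returned by vector search.
    parent_of: child id -> parent id; children_of: parent id -> [child ids]."""
    hits = set(retrieved_ids)
    merged, consumed = [], set()
    for node_id in retrieved_ids:
        if node_id in consumed:
            continue
        parent = parent_of.get(node_id)
        if parent:
            siblings = children_of[parent]
            hit_siblings = [s for s in siblings if s in hits]
            if len(hit_siblings) / len(siblings) > MERGE_THRESHOLD:
                merged.append(parent)      # promote to the parent chunk
                consumed.update(siblings)  # siblings are covered by the parent
                continue
        merged.append(node_id)             # keep the leaf as-is
        consumed.add(node_id)
    return merged
```

For example, retrieving two of a parent's three 128-token children would return the parent's 512-token chunk instead, while a lone hit stays a leaf.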
Semantic query caching works as follows:
- Compute an embedding for the incoming query using all-mpnet-base-v2
- Compare against cached query embeddings using cosine similarity
- If similarity > 92%, return cached response instantly
- Otherwise, execute full RAG pipeline and cache result
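The steps above can be sketched as a small cache class. The class and method names are illustrative, not the actual `cache_manager.py` API; `embed` is any function mapping text to a vector (DocuMind uses all-mpnet-base-v2, but a toy embedder works for demonstration).

```python
import math

class SemanticCache:
    """Sketch of embedding-based query caching with a cosine-similarity
    threshold (0.92, matching SEMANTIC_SIMILARITY_THRESHOLD)."""

    def __init__(self, embed, threshold=0.92):
        self.embed = embed
        self.threshold = threshold
        self.entries = []  # list of (embedding, cached response)

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def get(self, query):
        """Return the cached response of the most similar past query,
        or None if nothing clears the similarity threshold."""
        q = self.embed(query)
        best_score, best_resp = 0.0, None
        for emb, resp in self.entries:
            score = self._cosine(q, emb)
            if score > best_score:
                best_score, best_resp = score, resp
        return best_resp if best_score > self.threshold else None

    def put(self, query, response):
        self.entries.append((self.embed(query), response))
```

A production version would also persist entries to disk and cap the cache size; both are omitted here for brevity.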
Fact verification runs on this cycle:
- Collect the last 5 Q&A pairs from the current session
- Send to Gemini 1.5 Flash with structured verification prompt
- Parse response for accuracy scores and detected conflicts
- Optionally apply corrections to vector database
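The prompt-building and response-parsing halves of that cycle might look like the sketch below. The prompt wording and JSON schema are assumptions for illustration, not the exact ones `gemini_verifier.py` sends; the actual Gemini call (via the `google-generativeai` client) sits between these two functions and is omitted here.

```python
import json

def build_verification_prompt(qa_pairs):
    """Format recent Q&A pairs into a structured prompt asking the verifier
    to score each answer and flag conflicts."""
    lines = [
        "Review each Q&A pair for factual accuracy.",
        'Reply with JSON: {"results": [{"index": n, "accuracy": 0-1, '
        '"conflict": "description or null"}]}',
        "",
    ]
    for i, (question, answer) in enumerate(qa_pairs, 1):
        lines.append(f"{i}. Q: {question}\n   A: {answer}")
    return "\n".join(lines)

def parse_verification(reply_text):
    """Extract flagged entries from the verifier's JSON reply; only pairs
    with a non-null conflict are returned for follow-up action."""
    data = json.loads(reply_text)
    return [r for r in data["results"] if r.get("conflict")]
```

Entries returned by `parse_verification` are the candidates for the dynamic database updates described above.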
Arxiv integration works as follows:
- Extract key terms from the user query (filtering stop words)
- Build Arxiv API query with category filters (cs.OS, cs.DC, etc.)
- Fetch and display relevant papers with abstracts
- Optionally add paper abstracts to vector database for future queries
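The query-building step can be sketched against Arxiv's public export API (which really does require no key). The stop-word list and function name are illustrative, not the actual `arxiv_fetcher.py` code.

```python
from urllib.parse import urlencode

# Illustrative stop-word list; the real filter is presumably larger.
STOP_WORDS = {"what", "are", "the", "latest", "is", "on", "of", "a", "an"}

def build_arxiv_url(user_query, categories=("cs.OS", "cs.DC"), max_results=3):
    """Turn a user question into an Arxiv export-API query URL,
    restricted to the given subject categories."""
    terms = [w for w in user_query.lower().split() if w not in STOP_WORDS]
    term_expr = " AND ".join(f"all:{t}" for t in terms)
    cat_expr = " OR ".join(f"cat:{c}" for c in categories)
    params = urlencode({
        "search_query": f"({cat_expr}) AND ({term_expr})",
        "start": 0,
        "max_results": max_results,
        "sortBy": "submittedDate",   # newest papers first
        "sortOrder": "descending",
    })
    return "http://export.arxiv.org/api/query?" + params
```

The URL returns an Atom feed; fetching it and parsing titles, authors, and abstracts (for display and optional ingestion into the vector store) is left out of this sketch.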
Key settings in `config.py`:

```python
# Operation mode ("local" or "self-learning")
OPERATION_MODE = "self-learning"

# LM Studio configuration
LM_STUDIO_BASE_URL = "http://localhost:1234/v1"
LM_STUDIO_MODEL = "qwen3-4b-thinking-2507"

# Verification settings
VERIFICATION_QUERY_INTERVAL = 5       # Queries between Gemini checks

# Caching
SEMANTIC_SIMILARITY_THRESHOLD = 0.92  # Cache hit threshold

# Arxiv categories
ARXIV_CATEGORIES = ["cs.OS", "cs.DC", "cs.PF", "cs.AR", "cs.NI"]
```

MIT License. See LICENSE for details.
Built with these excellent open-source projects:
- LlamaIndex: RAG framework
- ChromaDB: Vector database
- LM Studio: Local LLM runtime
- HuggingFace Transformers: Embeddings & reranking
- Google Gemini: Fact verification
- Arxiv: Research paper access
⭐ Star this repo if you find it useful! ⭐

Intelligent Document Understanding • Conversational Memory • Self-Learning Knowledge Base

Quick Start • Features • Commands

Made with ❤️ by iDheer