A production-ready Retrieval-Augmented Generation (RAG) system for intelligent document search and question answering. Answers are strictly grounded in your documents with no hallucination. The current setup focuses on an Azure DevOps dbt project plus local files, with Azure OpenAI for embeddings and generation.
- Zero Hallucination: Answers ONLY from provided documents
- Helpful Guidance: Suggests question rephrasing when information unavailable
- Source Attribution: Clear citation of document sources
- Azure DevOps dbt project
  - Uses `dbt_project.yml`, `target/manifest.json`, `/macros/`, and `/data/` from a single repository (e.g. `DBT-ANTHEM`).
  - DBT artifacts are parsed into separate documents (models, tests, macros, seeds) with rich metadata.
  - Path and file type filtering, batch processing, and incremental updates via an ingestion tracker.
- Local files (optional)
  - Text-like formats (Markdown, TXT, etc.) can be added via `config.yaml`.
Planned/optional connectors such as Confluence or Jira are not required for the current configuration.
- FastAPI Web Interface: REST API plus HTML UI (via `ui/app.py`)
- Hierarchical Storage (optional): When enabled, uses an LLM to summarize long documents and store both summaries and detailed chunks
- Hybrid Search: Semantic vector search + keyword matching with multi-query expansion
- Azure OpenAI Only: Azure is the embedding and LLM provider in this branch
- Persistent Storage: ChromaDB vector database (`vector_store/`) and an ingestion tracker SQLite DB
- Structured Logging: JSON logs under `logs/` and user-activity logs under `logs/user_activity/`
- ASCII-Safe: No emoji encoding issues (Windows compatible)
- Domain-Agnostic: Works for any use case (tech, business, finance, etc.)
- Professional Standards: Clear error messages with system codes
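The hybrid search feature above blends two signals: vector similarity and keyword overlap. As a minimal sketch (the function name, weighting, and scoring are illustrative, not the project's actual API):

```python
def hybrid_score(semantic_score: float, query_terms: list, doc_text: str,
                 alpha: float = 0.7) -> float:
    """Blend a vector-similarity score with a simple keyword-overlap score."""
    terms = {t.lower() for t in query_terms}
    words = set(doc_text.lower().split())
    overlap = sum(1 for t in terms if t in words) / max(len(terms), 1)
    return alpha * semantic_score + (1 - alpha) * overlap

# A document matching both signals outranks one matching only the vector score.
docs = [("vector only", 0.9, "unrelated text"),
        ("both signals", 0.8, "sql model for data transformation")]
ranked = sorted(docs, reverse=True,
                key=lambda d: hybrid_score(d[1], ["sql", "model"], d[2]))
```

In a real pipeline the semantic score comes from the vector store and the keyword score from a proper lexical index; the point here is only the weighted combination.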
- Python 3.8+
- Azure OpenAI API key
- Git (for Azure DevOps integration)
```bash
# Clone repository
git clone https://github.com/your-org/RAG-ing.git
cd RAG-ing

# Create virtual environment
python -m venv .venv
source .venv/bin/activate   # Linux/Mac
.venv\Scripts\Activate.ps1  # Windows

# Install dependencies
pip install -e .
```

1. Create `.env` file:
```bash
# Azure OpenAI (Required)
AZURE_OPENAI_API_KEY=your_key_here
AZURE_OPENAI_ENDPOINT=https://your-endpoint.openai.azure.com/
AZURE_OPENAI_API_VERSION=2024-05-01-preview

# Azure DevOps (Required for dbt ingestion)
AZURE_DEVOPS_ORG=your_organization
AZURE_DEVOPS_PROJECT=your_project
AZURE_DEVOPS_PAT=your_personal_access_token
AZURE_DEVOPS_REPO=DBT-ANTHEM
```

2. Configure `config.yaml`:
See Configuration section below for detailed settings.
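Values like `${AZURE_DEVOPS_ORG}` in `config.yaml` are placeholders resolved from the environment (populated by your `.env` file). A minimal sketch of that kind of substitution, assuming a simple `${VAR}` syntax rather than the project's exact loader:

```python
import os
import re

def expand_env(value: str) -> str:
    """Replace ${VAR} placeholders with values from os.environ (empty if unset)."""
    return re.sub(r"\$\{(\w+)\}", lambda m: os.environ.get(m.group(1), ""), value)

os.environ["AZURE_DEVOPS_ORG"] = "contoso"
print(expand_env("organization: ${AZURE_DEVOPS_ORG}"))  # organization: contoso
```

Keeping credentials in the environment rather than in `config.yaml` means the YAML file can be committed safely while secrets stay in `.env`.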
```bash
# Step 1: Index your documents
python main.py --ingest

# Step 2: Launch web interface
python main.py --ui

# Step 3: Access at http://localhost:8000
```

Alternative commands:

```bash
# Single query via CLI
python main.py --query "What SQL models exist in the repository?"

# System health check
python main.py --status

# Debug mode
python main.py --ingest --debug
```

Query your codebase with advanced intelligence:
- "How is authentication implemented?"
- "What SQL models are in the dbt-anthem repository?"
- "When was the avoidable admissions logic last changed?"
- "What files handle data transformation?"
- Commit History: Tracks last N commits for each file (default: 10)
- Smart Filtering: Include/exclude paths and file types
- Batch Processing: Configurable batch size (default: 50 files)
- Incremental Updates: Only processes changed files
- Change Detection: Content hash-based tracking
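Content hash-based change detection can be sketched in a few lines. This is illustrative (the tracker's actual schema lives in the SQLite ingestion tracker, not a dict), but the principle is the same: re-embed only files whose hash changed since the last run.

```python
import hashlib

def content_hash(text: str) -> str:
    """Stable fingerprint of a file's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def changed_files(files: dict, tracker: dict) -> list:
    """Return paths whose content hash differs from the tracker's last run."""
    return [path for path, text in files.items()
            if tracker.get(path) != content_hash(text)]

tracker = {"models/a.sql": content_hash("select 1")}          # last run
files = {"models/a.sql": "select 1", "models/b.sql": "select 2"}  # this run
# Only b.sql is new or changed, so only it is re-processed.
```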
Generate PAT Token:
- Go to `https://dev.azure.com/{org}/_usersSettings/tokens`
- Create a new token with Code (Read) scope
- Add it to the `.env` file
Configure in `config.yaml`:

```yaml
data_source:
  sources:
    - type: "azure_devops"
      enabled: true
      azure_devops:
        organization: "${AZURE_DEVOPS_ORG}"
        project: "${AZURE_DEVOPS_PROJECT}"
        pat_token: "${AZURE_DEVOPS_PAT}"
        repo_name: "dbt-anthem"
        branch: "develop"

        # Path filtering
        include_paths:
          - "/dbt_anthem/models"
          - "/dbt_anthem/macros"
          - "/dbt_anthem/tests"
        exclude_paths:
          - "/dbt_anthem/tests/fixtures"

        # File type filtering
        include_file_types: [".sql", ".yml", ".py", ".md"]
        exclude_file_types: [".gitignore", ".gitkeep"]

        # Commit history
        fetch_commit_history: true
        commits_per_file: 10

        # Batch processing
        batch_size: 50
```

Run ingestion:

```bash
python main.py --ingest
```

The system will:
- Connect to Azure DevOps
- Fetch files matching filters
- Track last N commits per file
- Process in batches (default: 50 files)
- Create searchable embeddings
- Store in vector database
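The batching step above is straightforward to sketch; the helper name is illustrative, but the default batch size of 50 matches the configuration:

```python
def batches(items: list, size: int = 50):
    """Yield successive fixed-size batches; the last batch may be smaller."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

files = [f"model_{n}.sql" for n in range(120)]
sizes = [len(b) for b in batches(files, size=50)]  # [50, 50, 20]
```

Batching keeps memory bounded and lets the embedding API be called with predictable payload sizes.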
Place documents in the `./data/` directory:

```text
data/
├── documentation.pdf
├── guide.md
├── notes.txt
└── reference.html
```

Supported formats: PDF, Markdown, TXT, HTML
Wiki pages and documentation import.
Status: Connector code exists, needs testing and configuration.
Query DBT project metadata, lineage, and SQL code.
Capabilities:
- Lineage Graphs: In-memory graph traversal for model dependencies
- SQL Extraction: Parse manifest.json to extract 1,478+ SQL documents (models, tests, macros)
- Seed Data: CSV reference data with automatic linking to models
- Business Queries: "Does QM2 include J1434 for NK1 high emetic risk?"
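Lineage traversal over the manifest can be sketched as a recursive walk. The `parent_map` shape mirrors the section of the same name in dbt's `manifest.json`; the toy data and function name here are illustrative, not the project's actual code:

```python
def upstream(node: str, parent_map: dict, seen: set = None) -> set:
    """Collect all transitive upstream dependencies of a dbt node."""
    seen = set() if seen is None else seen
    for parent in parent_map.get(node, []):
        if parent not in seen:
            seen.add(parent)
            upstream(parent, parent_map, seen)
    return seen

# Toy parent_map in the shape of manifest.json's "parent_map" section.
parent_map = {
    "model.proj.mart": ["model.proj.stg_a", "model.proj.stg_b"],
    "model.proj.stg_a": ["source.proj.raw_a"],
}
deps = upstream("model.proj.mart", parent_map)
```

The `seen` set both deduplicates shared ancestors and guards against cycles in malformed graphs.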
Configuration:

```yaml
azure_devops:
  include_paths:
    - "/dbt_anthem/target/"          # Artifacts (manifest, catalog, graph)
    - "/dbt_anthem/dbt_project.yml"  # Project config
    - "/dbt_anthem/data/"            # Seed CSV files
```

Status: Core processing complete, streaming configuration pending (30 min setup)

Documentation: See `docs/DBT_INTEGRATION_STATUS.md`
Ticket descriptions, comments, and requirements.
Status: API integration planned.
```yaml
# Vector Store
vector_store:
  type: "chroma"
  path: "./vector_store"
  collection_name: "rag_documents"  # Generic collection name

# Embedding Model
embedding_model:
  provider: "azure_openai"
  azure_model: "text-embedding-ada-002"
  azure_deployment_name: "text-embedding-ada-002"

# LLM Configuration
llm:
  model: "gpt-4o"
  provider: "azure_openai"
  temperature: 0.1
  max_tokens: 4096
  prompt_template: "./prompts/general.txt"  # Enforces strict grounding
  system_instruction: "Answer STRICTLY from context..."

# Retrieval Settings
retrieval:
  top_k: 5
  strategy: "hybrid"  # Semantic + keyword
  rerank: true

# UI Settings
ui:
  framework: "fastapi"
  port: 8000
  host: "0.0.0.0"
  debug: false
```

The system uses strict grounding prompts in `prompts/`:

- `general.txt`: Default, enforces document-only answers (primary)
- `simple.txt`: Minimal style
- `iconnect_concise.txt`: Concise with visual elements
- `iconnect_enterprise.txt`: Detailed explanatory style

All prompts enforce: answer ONLY from the provided context, and suggest rephrasing if the information is unavailable.
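Strict grounding boils down to injecting only retrieved chunks into the prompt and instructing the model to stay inside them. A hedged sketch of how such a prompt might be assembled (the template text and field names are illustrative, not the contents of `prompts/general.txt`):

```python
GROUNDED_TEMPLATE = """Answer STRICTLY from the context below.
If the answer is not in the context, say so and suggest rephrasing the question.

Context:
{context}

Question: {question}
"""

def build_prompt(chunks: list, question: str) -> str:
    """Join retrieved chunks with their sources and fill the template."""
    context = "\n---\n".join(f"[{c['source']}] {c['text']}" for c in chunks)
    return GROUNDED_TEMPLATE.format(context=context, question=question)

prompt = build_prompt(
    [{"source": "guide.md", "text": "Batch size defaults to 50."}],
    "What is the default batch size?",
)
```

Tagging each chunk with its source file is what makes the "Source Attribution" feature possible: the model can cite `[guide.md]` directly from the context it was given.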
```text
┌─────────────────────────────────────────────────┐
│ Module 1: Corpus Embedding                      │
│ - Multi-source ingestion (Azure DevOps, Local)  │
│ - Chunking with configurable strategies         │
│ - Azure OpenAI embeddings                       │
└─────────────────────────────────────────────────┘
                      ↓
┌─────────────────────────────────────────────────┐
│ Module 2: Query Retrieval                       │
│ - Hybrid search (semantic + keyword)            │
│ - Metadata filtering                            │
│ - Result reranking                              │
└─────────────────────────────────────────────────┘
                      ↓
┌─────────────────────────────────────────────────┐
│ Module 3: LLM Orchestration                     │
│ - Azure OpenAI GPT-4/GPT-4o                     │
│ - Fallback providers (OpenAI, Anthropic)        │
│ - Strict grounding enforcement                  │
└─────────────────────────────────────────────────┘
                      ↓
┌─────────────────────────────────────────────────┐
│ Module 4: UI Layer (FastAPI)                    │
│ - REST API endpoints                            │
│ - Server-Sent Events for progress               │
│ - HTML templates + static assets                │
└─────────────────────────────────────────────────┘
                      ↓
┌─────────────────────────────────────────────────┐
│ Module 5: Evaluation & Logging                  │
│ - Structured JSON logs                          │
│ - Performance metrics                           │
│ - Query/response tracking                       │
└─────────────────────────────────────────────────┘
```
```text
RAG-ing/
├── main.py                  # CLI entry point
├── config.yaml              # System configuration (SINGLE SOURCE OF TRUTH)
├── .env                     # API credentials (create this)
│
├── src/rag_ing/             # Core application
│   ├── orchestrator.py      # Coordinates all modules
│   ├── modules/             # Five core modules
│   │   ├── corpus_embedding.py     # Module 1
│   │   ├── query_retrieval.py      # Module 2
│   │   ├── llm_orchestration.py    # Module 3
│   │   ├── ui_layer.py             # Module 4
│   │   └── evaluation_logging.py   # Module 5
│   ├── connectors/          # Data source integrations
│   │   ├── azuredevops_connector.py
│   │   └── confluence_connector.py
│   ├── config/              # Settings management
│   └── utils/               # Utilities (tracking, chunking, etc.)
│
├── ui/                      # FastAPI web interface
│   ├── app.py               # FastAPI application
│   ├── api/                 # REST endpoints
│   ├── templates/           # Jinja2 HTML templates
│   └── static/              # CSS, JavaScript
│
├── prompts/                 # LLM prompt templates (strict grounding)
├── data/                    # Local document storage
├── vector_store/            # ChromaDB persistence
└── logs/                    # Structured JSON logs
```
```text
# Search endpoint
POST /api/search
{
  "query": "What SQL models exist?",
  "audience": "general"   # or "technical"
}

# Search with progress tracking
POST /api/search-with-progress
GET  /api/progress/{session_id}   # Server-Sent Events
GET  /api/result/{session_id}

# System endpoints
GET /api/health
GET /docs   # Interactive API documentation (Swagger UI)
```

```python
from src.rag_ing.orchestrator import RAGOrchestrator
from src.rag_ing.config.settings import Settings

# Load configuration
settings = Settings.from_yaml('./config.yaml')
rag = RAGOrchestrator(settings)

# Index documents
rag.ingest_corpus()

# Query the system
result = rag.query_documents(
    query="How is data transformation implemented?",
    audience="technical"
)
print(result['response'])
print(result['sources'])
```

JSON logs for analysis:
```text
logs/
├── evaluation.jsonl           # Query/response events
├── retrieval_metrics.jsonl    # Search performance
└── generation_metrics.jsonl   # LLM quality metrics
```
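Because the logs are JSON Lines, ad-hoc analysis needs only the standard library. A small sketch; the field names (`query`, `latency_ms`) are assumptions, so check them against your actual log schema first:

```python
import json

# In practice these lines come from logs/evaluation.jsonl; field names are
# illustrative and may differ in the real schema.
lines = [
    '{"query": "q1", "latency_ms": 120}',
    '{"query": "q2", "latency_ms": 180}',
]
events = [json.loads(line) for line in lines]
avg_latency = sum(e["latency_ms"] for e in events) / len(events)  # 150.0
```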
```bash
# System health check
python main.py --status

# View logs
tail -f logs/evaluation.jsonl
```

Tracks:
- Query latency and throughput
- Vector search performance
- LLM token usage
- Embedding API calls
- Batch processing stats
- Python: 3.8+
- Framework: FastAPI
- AI/ML: Azure OpenAI, LangChain
- Vector DB: ChromaDB
- Frontend: HTML/CSS/JavaScript (vanilla - no framework)
```bash
# Install with dev dependencies
pip install -e ".[dev]"

# Run tests
pytest tests/ -v

# Code quality
black src/ ui/
flake8 src/ ui/
mypy src/
```

See `.github/copilot-instructions.md` for comprehensive guidelines:
- NO EMOJIS in production code (encoding issues)
- Comments describe CURRENT state (not history)
- Error messages: polite + system error + solution
- Strict grounding in all LLM prompts
- Generic/domain-agnostic code
```bash
# Standard deployment
docker-compose up --build

# Minimal (no persistence)
docker-compose -f docker-compose.minimal.yml up --build

# Using deployment script
./docker/deploy.sh start
./docker/deploy.sh logs
./docker/deploy.sh stop
```

- Azure App Service: Deploy FastAPI application
- Azure OpenAI: Use managed AI service
- Persistent Storage: Mount volumes for `vector_store/` and `data/`
- Secrets Management: Azure Key Vault for credentials
- Monitoring: Application Insights integration
Core System:
- General-purpose RAG (domain-agnostic)
- Strict document grounding (zero hallucination)
- Azure OpenAI integration (GPT-4/GPT-4o)
- ChromaDB vector storage with hierarchical collections
- FastAPI web interface
- Structured logging
Hierarchical Storage (✅ Complete):
- Two-tier retrieval: summaries for high-level search, chunks for details
- LLM-generated rich summaries with:
- Business context and purpose
- Searchable keywords and topics (10-15 per doc)
- Document type classification
- Technical details (tables, functions, dependencies)
- Type-specific summarization:
- SQL: Business logic, data transformations, key metrics
- Python: Functionality, classes, external dependencies
- YAML: Configuration settings, relationships
- PDF: Key entities, document category, sections
- Smart routing: Top 15 summary candidates → metadata boosting → top 5 detailed results
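The smart-routing flow above (summary candidates, metadata boosting, then detailed results) can be sketched as a two-pass ranker. Everything here, from the scoring rule to the boost weight, is an illustrative assumption rather than the system's actual implementation:

```python
def route(query_terms: list, summaries: list,
          top_summaries: int = 15, top_chunks: int = 5) -> list:
    """Two-tier retrieval: rank summaries first, then drill into their chunks."""
    def score(doc):
        words = doc["text"].lower().split()
        base = sum(1 for t in query_terms if t.lower() in words)
        # Metadata boosting: hits on curated keywords count double.
        boost = sum(1 for t in query_terms if t.lower() in doc.get("keywords", []))
        return base + 2 * boost

    candidates = sorted(summaries, key=score, reverse=True)[:top_summaries]
    chunks = [c for s in candidates for c in s["chunks"]]
    return sorted(chunks, key=score, reverse=True)[:top_chunks]

summaries = [
    {"text": "billing models overview", "keywords": ["billing"],
     "chunks": [{"text": "select * from billing", "keywords": []}]},
    {"text": "auth utilities", "keywords": [],
     "chunks": [{"text": "def login(): ...", "keywords": []}]},
]
results = route(["billing"], summaries)
```

The benefit of the two-tier shape is that the expensive detailed ranking only runs over chunks belonging to the shortlisted summaries, not the whole corpus.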
DBT Artifacts Integration (In Development - Q1 2026):
- DBT Manifest Parser: Parse manifest.json, catalog.json, dbt_project.yml
- Project Detection: Identify DBT projects from folder structure
- Rich Metadata Extraction:
- Model descriptions and documentation
- Column-level lineage and descriptions
- Tags, meta properties, owners
- Dependency graphs (upstream/downstream)
- Test definitions and results
- Project-Aware Filtering:
- Query understanding layer (detect project mentions)
- Metadata-based filtering (project tags)
- Multi-project comparison queries
- Enhanced Search:
- "What is QM2 logic in Anthem project?" (project-scoped)
- "Compare QM1 across EOM, Anthem, and UPMC" (multi-project)
- "Show all models in staging layer" (structural queries)
- Knowledge Graph Integration:
- DBT lineage → graph relationships
- Model-to-model dependencies
- Table-to-column mappings
Enhanced Azure DevOps (Q1 2026):
- Multi-repository support
- Commit history tracking (last N commits per file)
- Path and file type filtering
- Batch processing (configurable size)
- Incremental updates (change detection)
- SQLite-based ingestion tracking
Data Processing:
- Local file ingestion (PDF, MD, TXT, HTML)
- Hybrid search (semantic + keyword)
- Generic domain code extraction (error codes, tickets, versions)
Enhanced Azure DevOps (Q1 2026):
- PR and commit message analysis
- Code diff tracking
- Branch comparison
- Author-based filtering
- Time-range queries ("changes in last 3 months")
Additional Connectors (Q1-Q2 2026):
- Confluence: Live wiki synchronization
- Jira: Ticket and comment indexing
- SharePoint: Document library integration
- GitHub: Repository and PR analysis
Advanced Features (Q2 2026):
- Multi-modal search (images, diagrams)
- Semantic code chunking (function/class aware)
- Caching layer (reduce redundant LLM calls)
- Query suggestions and autocomplete
- Document summarization
- User feedback loop integration
Performance (Q2 2026):
- Async embedding generation
- Parallel batch processing
- Response streaming
- Query result caching
Enterprise Features (Q3 2026):
- User authentication (Azure AD)
- Role-based access control
- Audit logging
- Multi-tenant support
- Custom domain code patterns
- Graph-based RAG for relationship queries
- Fine-tuned embedding models
- Custom chunking strategies per file type
- Automated document refresh scheduling
- Export/import vector store
- A/B testing framework for prompts
- Quick Start: This README
- Developer Guide: `developer_guide.md`
- AI Agent Instructions: `.github/copilot-instructions.md`
- Technical Requirements: `src/Requirement.md`
- Configuration Reference: See `config.yaml` comments
Contributions welcome! Please:
- Follow the coding standards in `.github/copilot-instructions.md`
- No emojis in production code
- Enforce strict grounding in LLM prompts
- Write tests for new features
- Update documentation
MIT License - See LICENSE file for details.
Issues: Open a GitHub issue for bugs or feature requests
Documentation:
- Quick Start: This README
- Developer Guide: `developer_guide.md`
- API Docs: http://localhost:8000/docs (after starting UI)
Key Files:
- Configuration: `config.yaml`
- Environment: `.env` (create from `env.example`)
- Prompts: `prompts/general.txt`
Made with ❤️ for developers who want truthful AI answers