Skip to content

Orris is an enterprise-grade Document Intelligence Platform for EVIDEV LLP, enabling secure, AI-powered chatbot queries on company documents with Google Drive sync, advanced NLP, and strict access controls.

Notifications You must be signed in to change notification settings

mynkgupta22/orris

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

52 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

Orris - Enterprise Document Intelligence Platform

Python Version TypeScript Version FastAPI Version Next.js Version Status

πŸ“‹ Table of Contents

πŸš€ Overview

Orris is an enterprise-grade Document Intelligence Platform that transforms how organizations interact with their knowledge base. Orris provides a sophisticated AI-powered chatbot interface that enables users to query company documents while maintaining strict access controls and data security.

The platform automatically synchronizes with Google Drive, processes documents using advanced NLP techniques, and provides contextual responses through a secure RAG (Retrieval-Augmented Generation) system.

🎯 Purpose

Orris serves as an intelligent knowledge management system that:

  • Democratizes Information Access: Enables natural language queries across company documentation
  • Enforces Data Governance: Implements role-based access control with PI (Personal Information) protection
  • Automates Document Processing: Real-time synchronization and intelligent content extraction
  • Enhances Productivity: Instant answers to complex queries with source attribution

✨ Key Features

πŸ” Advanced Authentication & Authorization

  • Multi-provider Authentication: OAuth2 with Google, traditional email/password
  • Role-Based Access Control (RBAC): Granular permissions for document access
  • JWT Token Management: Secure access and refresh token handling
  • Session Monitoring: Real-time user session tracking and security

πŸ“„ Intelligent Document Processing

  • Real-time Synchronization: Google Drive webhook integration for instant updates
  • Multi-format Support: PDF, DOCX, TXT and IMAGES
  • Smart Chunking: Advanced text segmentation for optimal retrieval
  • Vision AI Integration: Image and diagram understanding with Gemini 2.5

🧠 RAG (Retrieval-Augmented Generation) System

  • Vector Search: Qdrant-powered semantic search with BGE-M3 embeddings
  • Contextual Responses: Gemini 2.5/Fin-Llama powered responses with source attribution
  • Access Control: Document-level permissions enforcement
  • Audit Trail: Comprehensive query and response logging

πŸ›‘οΈ Enterprise Security

  • Data Classification: Folder-based PI/Non-PI content identification
  • Secure Processing: Secure temporary storage and processing
  • Compliance Ready: Audit logs and access controls for regulatory compliance
  • Basic Protection: CORS and request validation

πŸ”„ Real-time Operations

  • Webhook Management: Automated Google Drive change notifications
  • Background Processing: Asynchronous document ingestion and updates
  • Health Monitoring: Comprehensive system health checks
  • Auto-recovery: Intelligent error handling and retry mechanisms

πŸ—οΈ Architecture

View the system flow diagram:

System Flow Diagram

System Components

Frontend (Next.js 14)

  • Modern React Interface: TypeScript, Tailwind CSS, Radix UI
  • Real-time Chat: WebSocket-like streaming responses
  • Responsive Design: Mobile-first, accessible UI
  • State Management: Zustand for efficient state handling

Backend (FastAPI)

  • High-Performance API: Async/await, type hints, automatic docs
  • Microservice Architecture: Modular design with clear separation
  • Database Layer: SQLAlchemy with async PostgreSQL
  • Background Tasks: Celery-like task processing

AI/ML Pipeline

  • Embedding Generation: BGE-M3 for multilingual support
  • Vector Storage: Qdrant for high-performance similarity search
  • LLM Integration: Gemini 2.5/Fin-Llama for response generation
  • Document Processing: Unstructured.io for content extraction

πŸ› οΈ Technology Stack

Backend Technologies

Component Technology Version / Notes Purpose
API Framework FastAPI 0.116.x (requirements.txt) High-performance async API
Language Python 3.11+ Core backend development
Database PostgreSQL 14+ Primary data storage
Vector DB Qdrant 1.15.x Semantic search and embeddings
ORM SQLAlchemy 2.x Database abstraction layer
Authentication OAuth2 + JWT Built-in + Google OAuth helpers Secure user authentication
Task Queue FastAPI BackgroundTasks Built-in FastAPI background processing Async processing for webhooks/docs
Web Server Uvicorn 0.35.x ASGI server

Frontend Technologies

Component Technology Version / Notes Purpose
Framework Next.js 15.2.4 (package.json) React-based frontend framework
Language TypeScript 5.x Type-safe development
Styling Tailwind CSS 3.x Utility-first CSS framework
UI Components shadcn / Radix UI Radix primitives + shadcn patterns Accessible component library (components in components/ui)
State Management Zustand ^5 Lightweight state management
Icons / UI helpers Lucide, CVA, clsx - Iconography & variant styling

AI/ML Technologies

Component Technology Purpose
LLM Gemini 2.5/Fin-Llama Response generation
Embeddings BGE-M3 Multilingual text embeddings
Vision AI Gemini 2.5 Image and diagram processing
Text Processing LangChain & unstructured Document processing pipeline
Vector Search Qdrant Semantic similarity search

Infrastructure & DevOps

Component Technology Purpose
Containerization Docker Application containerization
Deployment Frontend: Vercel; Backend: Render Hosting platforms (see Deployment section)
CI/CD GitHub Actions Automated deployment
Monitoring Custom logging / external services Application monitoring and alerting
Security TLS / OAuth / JWT Transport security and auth

πŸ”’ Security & Compliance

Authentication & Authorization

  • Multi-Factor Authentication: OAuth2 with Google + JWT tokens
  • Role-Based Access Control: Three-tier access system (Signed Up, Non-PI, PI Access)
  • Session Management: Secure refresh token rotation
  • API Security: Rate limiting, CORS protection, request validation

Data Protection

  • Classification System: Automatic PI/Non-PI content identification
  • Access Controls: Document-level permissions enforcement
  • Audit Trail: Comprehensive logging of all user interactions
  • Data Encryption: TLS 1.3 for data in transit, encrypted storage

Compliance Features

  • Audit Logging: Complete query and response tracking
  • Access Monitoring: Real-time user activity monitoring
  • Data Governance: Automated compliance reporting
  • Privacy Controls: GDPR-compliant data handling

πŸ“¦ Installation

Prerequisites

  • Python 3.11+
  • Node.js 18+
  • PostgreSQL 14+
  • Docker & Docker Compose (recommended)
  • Google Cloud Project with Drive API enabled

Quick Start with Docker

# Clone the repository
git clone https://github.com/mynkgupta22/orris.git
cd orris

# Set up environment variables
cp .env.example .env
# Edit .env with your configuration

# Start the entire stack
docker-compose up -d

# Access the application
# Frontend: http://localhost:3000
# Backend API: http://localhost:8000
# API Documentation: http://localhost:8000/docs

Manual Installation

Backend Setup

cd backend

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Set up database
alembic upgrade head

# Start the server
uvicorn main:app --reload --host 0.0.0.0 --port 8000

Frontend Setup

cd frontend

# Install dependencies
npm install
# or
pnpm install

# Start development server
npm run dev
# or
pnpm dev

βš™οΈ Configuration

Environment Variables

Backend Configuration (.env)

# Application Settings
APP_NAME="Orris Authentication API"
DEBUG=false
ENVIRONMENT=production

# Database
DATABASE_URL=postgresql://user:password@localhost:5432/orris

# JWT Configuration
JWT_SECRET_KEY=your-super-secure-secret-key
JWT_REFRESH_SECRET_KEY=your-refresh-secret-key
JWT_ALGORITHM=HS512
ACCESS_TOKEN_EXPIRE_MINUTES=15
REFRESH_TOKEN_EXPIRE_DAYS=7

# Google OAuth2
GOOGLE_CLIENT_ID=your-google-client-id
GOOGLE_CLIENT_SECRET=your-google-client-secret
GOOGLE_REDIRECT_URI=http://localhost:3000/auth/callback

# Google Drive Integration
GOOGLE_DRIVE_FOLDER_ID=your-drive-folder-id
GOOGLE_SERVICE_ACCOUNT_PATH=/path/to/service-account.json
EVIDEV_DATA_FOLDER_ID=your-evidev-folder-id

# Google AI/Gemini
GOOGLE_AI_API_KEY=your-google-ai-api-key
GEMINI_MODEL=gemini-2.5-flash
FIN_LLAMA_ENDPOINT=your-fin-llama-endpoint

# Vector Database
QDRANT_HOST=localhost
QDRANT_PORT=6333
QDRANT_COLLECTION_NAME=orris_rag

# Embeddings
EMBEDDING_MODEL_NAME=BAAI/bge-m3
EMBED_BATCH_SIZE=8

# Document Processing
CHUNK_SIZE=800
CHUNK_OVERLAP=50
TEMP_DIR=/tmp

# Webhook Configuration
WEBHOOK_BASE_URL=https://your-domain.com
GOOGLE_WEBHOOK_TOKEN=your-webhook-token
GDRIVE_ROOT_ID=your-root-folder-id

# CORS
ALLOWED_ORIGINS=http://localhost:3000,https://your-domain.com

# Rate Limiting
RATE_LIMIT_PER_MINUTE=60

Frontend Configuration (.env.local)

# API Configuration
NEXT_PUBLIC_API_BASE_URL=http://localhost:8000
NEXT_PUBLIC_GOOGLE_CLIENT_ID=your-google-client-id

# Environment
NODE_ENV=production

Google Drive API Setup

  1. Create Google Cloud Project

    # Visit Google Cloud Console
    # Create new project or select existing
    # Enable Google Drive API
  2. Set up Service Account

    # Create service account
    # Download JSON credentials
    # Share target Drive folder with service account email
  3. Configure OAuth2

    # Set up OAuth2 credentials
    # Add authorized redirect URIs
    # Copy client ID and secret

Database Setup

# Create database
createdb orris

# Run migrations
cd backend
alembic upgrade head

# Create initial admin user (optional)
python scripts/create_admin.py

Vector Database Setup

# Start Qdrant with Docker
docker run -p 6333:6333 qdrant/qdrant

# Or install locally
# Follow Qdrant installation guide

πŸš€ Deployment

Production Deployment

Backend (Render)

Render is used to host the backend service. You can deploy the backend/ either as a Docker service or directly from the repository.

Quick steps (Docker-based render service):

  1. Create a new Web Service in Render and connect your GitHub/GitLab repository.
  2. Set the root or Dockerfile to ./backend/Dockerfile and choose the Docker plan.
  3. Add environment variables in the Render dashboard (DATABASE_URL, JWT secrets, GOOGLE_* keys, QDRANT config, etc.).
  4. Set the start command to:
uvicorn main:app --host 0.0.0.0 --port $PORT
  1. Run database migrations (one-off job or from Render Shell):
alembic upgrade head

Notes:

  • Use Render Environment settings to securely store secrets.
  • For scheduled jobs (webhook renewal, index optimization), create Render Cron Jobs or one-off Jobs as needed.

Frontend (Vercel - Recommended)

# Install Vercel CLI
npm i -g vercel

# Deploy
cd frontend
vercel

# Configure environment variables in Vercel dashboard

Full Stack with Docker Compose

# docker-compose.prod.yml
version: '3.8'
services:
  backend:
    build: ./backend
    environment:
      - DATABASE_URL=postgresql://user:pass@db:5432/orris
    depends_on:
      - db
      - qdrant

  frontend:
    build: ./frontend
    environment:
      - NEXT_PUBLIC_API_BASE_URL=https://api.yourdomain.com
    ports:
      - "3000:3000"

  db:
    image: postgres:14
    environment:
      POSTGRES_DB: orris
      POSTGRES_USER: user
      POSTGRES_PASSWORD: password
    volumes:
      - postgres_data:/var/lib/postgresql/data

  qdrant:
    image: qdrant/qdrant
    ports:
      - "6333:6333"
    volumes:
      - qdrant_data:/qdrant/storage

volumes:
  postgres_data:
  qdrant_data:

Monitoring & Health Checks

# Health check endpoints
GET /health              # Basic health check
GET /health/detailed     # Detailed system status
GET /webhooks/status     # Webhook system status

πŸ“š API Documentation

Authentication Endpoints

POST /auth/signup       # User registration
POST /auth/login        # User login
POST /auth/google       # Google OAuth login
POST /auth/refresh      # Token refresh
POST /auth/logout       # User logout

User Management

GET  /users/me          # Get current user profile
PUT  /users/me          # Update user profile
GET  /users/me/sessions # Get user sessions

RAG System

POST /rag/query         # Submit query to RAG system
GET  /rag/sessions      # Get chat sessions
POST /rag/sessions      # Create new chat session
GET  /rag/sessions/{id} # Get specific session

Webhook Management

POST /webhooks/google-drive  # Google Drive webhook endpoint
GET  /webhooks/status        # Webhook system status

Interactive API Documentation

  • Swagger UI: http://localhost:8000/docs
  • ReDoc: http://localhost:8000/redoc

πŸ”§ Development

Development Setup

# Backend development
cd backend
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
uvicorn main:app --reload

# Frontend development
cd frontend
npm install
npm run dev

# Run tests
cd backend && pytest
cd frontend && npm test

Code Quality

# Backend linting
flake8 app/
black app/
isort app/

# Frontend linting
npm run lint
npm run type-check

Database Migrations

# Create migration
alembic revision --autogenerate -m "Description"

# Apply migrations
alembic upgrade head

# Rollback
alembic downgrade -1

Testing

# Backend tests
pytest app/tests/

# Frontend tests
npm run test

# Integration tests
npm run test:e2e

πŸ“Š Monitoring & Logging

Application Monitoring

  • Health Checks: Automated system health monitoring
  • Performance Metrics: Response time and throughput tracking
  • Error Tracking: Comprehensive error logging and alerting
  • Usage Analytics: User interaction and query analytics

Logging Structure

# Log levels and categories
INFO  - Normal operations
WARN  - Potential issues
ERROR - System errors
DEBUG - Development debugging

# Log categories
auth.*     - Authentication events
rag.*      - RAG system operations
webhook.*  - Webhook processing
sync.*     - Document synchronization

Key Metrics

  • Query Response Time: Average response time for RAG queries
  • Document Sync Rate: Documents processed per hour
  • User Engagement: Active users and session duration
  • System Health: Database connections, memory usage, CPU

πŸ“ž Support

Getting Help

  • Documentation: Comprehensive guides and API reference
  • Issues: Report bugs and request features on GitHub
  • Discussions: Community discussions and Q&A

Contact Information

  • Contributor 1: Aditya Sebastian | Email
  • Contributor 2: Mayank Gupta | Email

Built with ❀️ for EVIDEV LLP

Transforming how organizations interact with their knowledge

About

Orris is an enterprise-grade Document Intelligence Platform for EVIDEV LLP, enabling secure, AI-powered chatbot queries on company documents with Google Drive sync, advanced NLP, and strict access controls.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •