valehasadli/doc-pipeline

Document Processing Pipeline

🚀 A document processing pipeline built with TypeScript, Express, and Domain-Driven Design (DDD). It provides file upload, simulated OCR, validation, and persistence, with queue-based background processing.

✨ Features

  • 📄 Document Upload & Processing - Multi-format file upload with automatic processing
  • 🔍 OCR Simulation - Text extraction simulation with configurable delays
  • ✅ Document Validation - Comprehensive validation pipeline
  • 💾 File Storage - Abstracted storage layer (Local/S3) with retry mechanisms
  • 🔄 Queue Processing - Background job processing with BullMQ
  • 📊 Real-time Status - Track document processing status in real-time
  • 🚫 Cancellation Support - Cancel processing jobs at any stage
  • 🔁 Retry Logic - Automatic retry for failed operations with exponential backoff
  • 🏥 Health Monitoring - Comprehensive health checks and monitoring
  • 🧪 100% Test Coverage - 180 passing tests with zero lint errors
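
The retry feature listed above relies on exponential backoff. As a rough illustration of the idea only, here is a minimal TypeScript sketch that is independent of BullMQ and of this project's code. The names `backoffDelayMs` and `retryWithBackoff`, the 1-second base delay, the 30-second cap, and the 3-attempt default are all assumptions made for the example.

```typescript
// Hypothetical helpers: names and default values are assumptions,
// not taken from this project's source.

// Delay before retry number `attempt` (1 = first retry).
// The delay doubles on each attempt and is capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 1000, maxMs = 30000): number {
  if (attempt < 1) {
    throw new RangeError("attempt must be >= 1");
  }
  return Math.min(maxMs, baseMs * 2 ** (attempt - 1));
}

// Runs `operation` up to maxAttempts times, sleeping between failures.
// Rethrows the last error if every attempt fails.
async function retryWithBackoff<T>(
  operation: () => Promise<T>,
  maxAttempts = 3,
  baseMs = 1000,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt === maxAttempts) break; // no sleep after the final failure
      const delay = backoffDelayMs(attempt, baseMs);
      await new Promise<void>((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastError;
}
```

With these defaults the waits between attempts are 1 s, then 2 s, then 4 s, and so on up to the cap. The delays actually used by the pipeline may differ.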

🚀 Quick Start

Prerequisites

  • Node.js 18+ and npm 8+
  • Docker and Docker Compose

1. Install Dependencies

git clone <repository-url>
cd <your_project_name>
npm install

2. Start Services

npm run docker:services  # Start MongoDB & Redis

3. Start Application

npm run start:dev  # Start with hot reload

4. Test the API

# Upload a document
curl -X POST http://localhost:3000/api/documents/upload \
  -F "file=@<path-to-document>"  # form field name "file" is assumed

# Check status (replace <documentId> with the document's ID)
curl "http://localhost:3000/api/documents/<documentId>"

# List documents
curl http://localhost:3000/api/documents

📖 API Endpoints

Upload Document

POST /api/documents/upload
Content-Type: multipart/form-data

Upload PDF, DOC, DOCX, PNG, JPG, or TXT files (max 10 MB)
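
A client can check the file type and size limits above before sending a request. The following is a minimal sketch, not the server's validation logic: `checkUpload` and its constants are hypothetical names, the check looks only at the file extension, and "10MB" is assumed to mean 10 * 1024 * 1024 bytes.

```typescript
// Assumption: "10MB" is read as 10 * 1024 * 1024 bytes; the server may count differently.
const MAX_UPLOAD_BYTES = 10 * 1024 * 1024;
// Extensions taken from the list of accepted formats above.
const ALLOWED_EXTENSIONS = ["pdf", "doc", "docx", "png", "jpg", "txt"];

type UploadCheck = { ok: true } | { ok: false; reason: string };

// Hypothetical client-side pre-check based on file name and size only.
function checkUpload(fileName: string, sizeBytes: number): UploadCheck {
  const dot = fileName.lastIndexOf(".");
  const extension = dot === -1 ? "" : fileName.slice(dot + 1).toLowerCase();
  if (!ALLOWED_EXTENSIONS.includes(extension)) {
    return { ok: false, reason: `unsupported file type: "${extension}"` };
  }
  if (sizeBytes > MAX_UPLOAD_BYTES) {
    return { ok: false, reason: `file exceeds ${MAX_UPLOAD_BYTES} bytes` };
  }
  return { ok: true };
}
```

A pre-check like this only saves a round trip. The server's own validation remains authoritative.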

Get Document Status

GET /api/documents/{documentId}

Get processing status and results

List Documents

GET /api/documents?status=completed&limit=10

List documents with optional filtering
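
The query string for the list endpoint can be assembled from optional filters. This sketch assumes only the two parameters shown above, `status` and `limit`. The `buildListPath` helper and the `ListFilter` type are hypothetical names, and other status values or parameters the API may accept are not covered.

```typescript
// Hypothetical filter type: only the parameters shown in this README.
interface ListFilter {
  status?: string;
  limit?: number;
}

// Builds the request path for GET /api/documents, omitting unset filters.
function buildListPath(filter: ListFilter = {}): string {
  const params: string[] = [];
  if (filter.status !== undefined) {
    params.push(`status=${encodeURIComponent(filter.status)}`);
  }
  if (filter.limit !== undefined) {
    params.push(`limit=${encodeURIComponent(String(filter.limit))}`);
  }
  return params.length > 0
    ? `/api/documents?${params.join("&")}`
    : "/api/documents";
}
```

For example, `buildListPath({ status: "completed", limit: 10 })` yields the path shown above, and calling it with no arguments yields the unfiltered list path.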

Cancel Processing

POST /api/documents/{documentId}/cancel

Cancel document processing

Retry Failed Document

POST /api/documents/{documentId}/retry

Retry failed document processing
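
The cancel and retry endpoints share the same path shape, so a client can build both with one small helper. This is a sketch under that assumption only. `documentActionPath` is a hypothetical name, and nothing here reflects how the server decides whether a document can be cancelled or retried.

```typescript
// The two actions documented in this README.
type DocumentAction = "cancel" | "retry";

// Hypothetical helper: builds the POST path for a document action,
// URL-encoding the ID so unusual characters cannot alter the path.
function documentActionPath(documentId: string, action: DocumentAction): string {
  if (documentId.trim() === "") {
    throw new Error("documentId must not be empty");
  }
  return `/api/documents/${encodeURIComponent(documentId)}/${action}`;
}
```

The resulting path can be appended to the server's base URL (http://localhost:3000 in the Quick Start) and sent as a POST request.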

Health Check

GET /health

System health and service status

🧪 Development

Run Tests

npm test                 # All 180 tests
npm run test:watch      # Watch mode

Code Quality

npm run lint            # Check linting
npm run type-check      # TypeScript validation

Docker Commands

npm run docker:services # Start MongoDB & Redis only
npm run docker:up       # Full environment
npm run docker:down     # Stop all services
