Chunkr

Chunkr is a self-hostable API for converting PDF, PPTX, DOCX, and Excel files into RAG/LLM-ready data.
11 semantic tags for layout analysis | OCR + bounding boxes | Structured HTML and Markdown

Try it out! · Report Bug · Contact

Watch our 1-minute demo video

Docs

https://docs.chunkr.ai

(Super) Quick Start

  1. Go to chunkr.ai
  2. Make an account and copy your API key
  3. Create a task:
    curl -X POST https://api.chunkr.ai/api/v1/task \
       -H "Content-Type: multipart/form-data" \
       -H "Authorization: ${YOUR_API_KEY}" \
       -F "file=@/path/to/your/file" \
       -F "model=HighQuality" \
       -F "target_chunk_length=512" \
       -F "ocr_strategy=Auto"
  4. Poll your created task:
    curl -X GET https://api.chunkr.ai/api/v1/task/${TASK_ID} \
      -H "Authorization: ${YOUR_API_KEY}"
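The create-then-poll flow above can be scripted. The sketch below is a hedged example, not part of Chunkr's own tooling: it assumes the create response carries a `task_id` field and the task JSON carries a `status` field whose terminal values are `Succeeded` and `Failed`; check https://docs.chunkr.ai for the actual response shape. It also assumes `jq` is installed, and `API_URL`, `create_task`, `get_status`, and `wait_for_task` are hypothetical names. The explicit `Content-Type` header is left out because curl sets `multipart/form-data` (with its boundary) on its own whenever `-F` is used.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the Quick Start create/poll flow.
# ASSUMPTIONS (not confirmed by this README): the create response has a
# "task_id" field, the task JSON has a "status" field, and "Succeeded" /
# "Failed" are the terminal statuses. Requires curl and jq.

API_URL="${API_URL:-https://api.chunkr.ai/api/v1}"

# Upload a file and print the new task's ID (assumed "task_id" field).
create_task() {
  curl -s -X POST "${API_URL}/task" \
    -H "Authorization: ${YOUR_API_KEY}" \
    -F "file=@$1" \
    -F "model=HighQuality" \
    -F "target_chunk_length=512" \
    -F "ocr_strategy=Auto" | jq -r '.task_id'
}

# Print the task's current status (assumed "status" field).
get_status() {
  curl -s -X GET "${API_URL}/task/$1" \
    -H "Authorization: ${YOUR_API_KEY}" | jq -r '.status'
}

# Poll every INTERVAL seconds, at most MAX_TRIES times.
# Prints the terminal status and returns 0, or returns 1 on timeout.
wait_for_task() {
  local task_id="$1" interval="${2:-5}" max_tries="${3:-120}" status i=1
  while [ "$i" -le "$max_tries" ]; do
    status="$(get_status "$task_id")"
    case "$status" in
      Succeeded|Failed) echo "$status"; return 0 ;;
    esac
    sleep "$interval"
    i=$((i + 1))
  done
  echo "timed out waiting for task $task_id" >&2
  return 1
}
```

For example, after sourcing the helpers from a file of your choosing, run `TASK_ID="$(create_task /path/to/your/file)"` and then `wait_for_task "$TASK_ID"`. The retry cap keeps the loop from spinning forever if the API returns an unexpected or empty status.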

Self-Hosted Deployment Options

Quick Start with Docker Compose

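  1. Prerequisites:

    • Docker with Docker Compose
    • Git
    • An NVIDIA CUDA GPU, with the NVIDIA Container Toolkit installed so containers can access it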

  2. Clone the repo:

    git clone https://github.com/lumina-ai-inc/chunkr
    cd chunkr
  3. Copy the example env file:

    cp .env.example .env
  4. Start the services:

    docker compose up -d
  5. Access the services:

    • Web UI: http://localhost:5173
    • API: http://localhost:8000

Note: Chunkr requires an NVIDIA CUDA GPU to run.

  6. Stop the services:

    docker compose down
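The Docker Compose steps above can be wrapped with a couple of checks. This is a hedged sketch, not a script shipped with Chunkr: `check_prereqs` and `wait_for_url` are hypothetical names, the prerequisite check only looks for the `git`, `docker`, and `nvidia-smi` commands on `PATH`, and readiness is judged by the API at http://localhost:8000 returning any HTTP response at all, since no health endpoint is documented here.

```bash
#!/usr/bin/env bash
# Hypothetical helpers around `docker compose up -d`.
# ASSUMPTION: any HTTP response from the API port means the service is up;
# no dedicated health endpoint is documented in this README.

# Return 1 (and name the culprit) if a required command is missing.
check_prereqs() {
  local missing=0 cmd
  for cmd in git docker nvidia-smi; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Retry until URL answers, up to MAX_TRIES attempts INTERVAL seconds apart.
# Without -f, curl exits 0 on any HTTP response and non-zero if it cannot connect.
wait_for_url() {
  local url="$1" max_tries="${2:-30}" interval="${3:-2}" i=1
  while [ "$i" -le "$max_tries" ]; do
    if curl -s -o /dev/null "$url"; then
      return 0
    fi
    sleep "$interval"
    i=$((i + 1))
  done
  return 1
}
```

A typical invocation would be `check_prereqs && docker compose up -d && wait_for_url http://localhost:8000 && echo "API is up"`. The model-loading containers can take a while on first start, so raising the retry count may be necessary.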

Production Deployment with Kubernetes

For production environments, we provide a Helm chart and detailed deployment instructions:

  1. See our detailed guide at kube/README.md
  2. The guide includes configurations for high availability and scaling

For enterprise support and deployment assistance, contact us.

Licensing

This project is dual-licensed:

  1. GNU Affero General Public License v3.0 (AGPL-3.0)
  2. Commercial License

To use Chunkr without complying with the AGPL-3.0 license terms, you can contact us or visit our website.

Connect With Us