Chunkr is a self-hostable API for converting PDF, PPTX, DOCX, and Excel files into RAG/LLM-ready data.
11 semantic tags for layout analysis | OCR + bounding boxes | Structured HTML and Markdown
Try it out! · Report Bug · Contact
- Go to chunkr.ai
- Make an account and copy your API key
- Create a task:
  curl -X POST https://api.chunkr.ai/api/v1/task \
    -H "Content-Type: multipart/form-data" \
    -H "Authorization: ${YOUR_API_KEY}" \
    -F "file=@/path/to/your/file" \
    -F "model=HighQuality" \
    -F "target_chunk_length=512" \
    -F "ocr_strategy=Auto"
- Poll your created task:
  curl -X GET https://api.chunkr.ai/api/v1/task/${TASK_ID} \
    -H "Authorization: ${YOUR_API_KEY}"
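A polling loop for the step above might look like the following sketch. It assumes the task endpoint returns a JSON body with a string `status` field whose terminal values are `Succeeded` and `Failed`. Those names, along with the helpers `extract_status` and `poll_task`, are illustrative assumptions rather than documented API, so adjust them to match the actual response.

```shell
#!/bin/sh
# Hypothetical helper: print the first "status" string found in a JSON body on stdin.
# ASSUMPTION: the response carries a string field named "status".
extract_status() {
  grep -o '"status"[[:space:]]*:[[:space:]]*"[^"]*"' \
    | head -n 1 \
    | sed 's/.*"\([^"]*\)"$/\1/'
}

# Hypothetical helper: poll_task TASK_ID [INTERVAL_SECONDS] [MAX_ATTEMPTS]
# Returns 0 and prints the body on "Succeeded", 1 on "Failed",
# 2 when attempts run out, 3 if curl itself fails.
# ASSUMPTION: "Succeeded" and "Failed" are the terminal status values.
poll_task() {
  task_id=$1
  interval=${2:-2}
  max_attempts=${3:-30}
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    body=$(curl -sS -X GET "https://api.chunkr.ai/api/v1/task/${task_id}" \
      -H "Authorization: ${YOUR_API_KEY}") || return 3
    status=$(printf '%s' "$body" | extract_status)
    case $status in
      Succeeded) printf '%s\n' "$body"; return 0 ;;
      Failed)    printf '%s\n' "$body" >&2; return 1 ;;
    esac
    attempt=$((attempt + 1))
    sleep "$interval"
  done
  echo "timed out waiting for task ${task_id}" >&2
  return 2
}
```

After sourcing these functions, `poll_task "$TASK_ID" 2 30` would check every 2 seconds for up to 30 attempts. The `grep`/`sed` extraction keeps the sketch dependency-free; a JSON-aware tool is a more robust choice if one is available.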
- Prerequisites:
  - Docker and Docker Compose
  - NVIDIA Container Toolkit (for GPU support)
- Clone the repo:
  git clone https://github.com/lumina-ai-inc/chunkr
  cd chunkr
- Copy the example env file:
  cp .env.example .env
- Start the services:
  docker compose up -d
- Access the services:
  - Web UI: http://localhost:5173
  - API: http://localhost:8000
  Note: Requires an NVIDIA CUDA GPU to run.
- Stop the services:
  docker compose down
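Because `docker compose up -d` returns before the containers finish starting, a small wait loop can help confirm that the ports listed above are accepting connections. The sketch below is one possible approach: the helper name `wait_for_url` is hypothetical, and since no health-check endpoint is documented here, it treats any HTTP response (even an error status) as "reachable".

```shell
#!/bin/sh
# Hypothetical helper: wait_for_url URL [INTERVAL_SECONDS] [MAX_ATTEMPTS]
# Returns 0 as soon as curl gets any HTTP response from URL, 1 otherwise.
# ASSUMPTION: no dedicated health endpoint is used; a response of any
# status code counts as the service being up.
wait_for_url() {
  url=$1
  interval=${2:-2}
  max_attempts=${3:-30}
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # -s silences progress output; --max-time bounds each attempt.
    if curl -s --max-time 5 -o /dev/null "$url"; then
      echo "reachable: $url"
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$interval"
  done
  echo "not reachable: $url" >&2
  return 1
}
```

For example, `wait_for_url http://localhost:8000 && wait_for_url http://localhost:5173` would block until both the API and the Web UI respond, or fail after the attempt limit.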
For production environments, we provide a Helm chart and detailed deployment instructions:
- See our detailed guide at kube/README.md
- Includes configurations for high availability and scaling
For enterprise support and deployment assistance, contact us.
This project is dual-licensed:
- GNU Affero General Public License v3.0 (AGPL-3.0)
- Commercial License
To use Chunkr without complying with the AGPL-3.0 license terms, contact us or visit our website.
- 📧 Email: [email protected]
- 📅 Schedule a call: Book a 30-minute meeting
- 🌐 Visit our website: chunkr.ai