📄 ocrbase is a lightweight, model-agnostic API that standardizes document parsing across visual language models (VLMs).
🪶 Lightweight: Tiny Bun + Elysia service, single container, minimal footprint.
🔌 Model-Agnostic: Point at any supported VLM — GLM-OCR, PaddleOCR-VL — via env vars.
📊 State of the Art: Backed by models scoring ≥94.5 on OmniDocBench v1.5.
💎 Easy to Deploy: One command away from a working OCR API.
- `/v1/parse` — turn a document into text
- `/v1/parse/async` — enqueue a parse job
- `/v1/extract` — extract structured JSON from a document
- `/v1/extract/async` — enqueue an extract job
- `/v1/job/:jobId` — inspect parse or extract job status
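As a sketch of how a client might call the synchronous endpoint — note that the `file` form field and `model` parameter names below are illustrative assumptions, not confirmed by this README:

```typescript
// Build a multipart request against /v1/parse.
// NOTE: the "file" and "model" field names are assumptions for
// illustration; check the actual API schema before relying on them.
const form = new FormData();
form.append("model", "paddleocr");
form.append(
  "file",
  new Blob([new Uint8Array([0x25, 0x50, 0x44, 0x46])]), // "%PDF" magic bytes
  "doc.pdf",
);

const request = new Request("http://localhost:3000/v1/parse", {
  method: "POST",
  body: form,
});

// Sending it would look like:
//   const response = await fetch(request);
//   const parsed = await response.json();
console.log(request.method, new URL(request.url).pathname);
```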
Both models are state of the art:
- paddleocr — 94.5 on OmniDocBench v1.5
- glmocr — 94.6 on OmniDocBench v1.5
> [!IMPORTANT]
> ocrbase does not ship the models — point it at a running inference server:
> - paddleocr — set up PaddleOCR-VL
> - glmocr — self-host GLM-OCR with vLLM
```sh
docker run -d -p 3000:3000 \
  -e PADDLEOCR_URL=http://localhost:8190 \
  -e GLM_OCR_URL=http://localhost:5002 \
  --name ocrbase ghcr.io/ocrbase-hq/ocrbase
```

To run from source:

```sh
bun install
bun dev
```

If `S3_ACCESS_KEY_ID`, `S3_SECRET_ACCESS_KEY`, `S3_BUCKET`, and `S3_ENDPOINT` are set, `/v1/parse` will:
- upload incoming `File` inputs to S3
- fetch remote document URLs and upload the contents to S3
- upload base64 or data URL payloads to S3
- pass a presigned `GET` URL into the selected document model
If those env vars are not set, ocrbase keeps the current direct behavior and sends the original input to the model.
If `REDIS_URL` and the S3 env vars above are set, queue mode is enabled:
- `POST /v1/parse` uploads or normalizes the input to S3, enqueues a parse job, waits for completion, and returns the normal parse response
- `POST /v1/parse/async` returns `202 { jobId }`
- `GET /v1/job/:jobId` returns the job state plus `result` or `error`
If Redis is missing, or Redis is present but S3 is not fully configured, `POST /v1/parse` keeps the existing direct behavior and the async/status endpoints return `503`.
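The gating rule above can be sketched as a pair of helpers — the names and the `Env` shape are assumptions for illustration, not ocrbase's real internals:

```typescript
// Two operating modes implied by the README:
//  - "queue":  Redis + full S3 config → async endpoints are live
//  - "direct": anything else → /v1/parse works directly,
//              async/status endpoints return 503
type ServiceMode = "queue" | "direct";

interface Env {
  REDIS_URL?: string;
  S3_ACCESS_KEY_ID?: string;
  S3_SECRET_ACCESS_KEY?: string;
  S3_BUCKET?: string;
  S3_ENDPOINT?: string;
}

function serviceMode(env: Env): ServiceMode {
  const s3Ready = Boolean(
    env.S3_ACCESS_KEY_ID &&
      env.S3_SECRET_ACCESS_KEY &&
      env.S3_BUCKET &&
      env.S3_ENDPOINT,
  );
  return env.REDIS_URL && s3Ready ? "queue" : "direct";
}

// HTTP status the async endpoints would answer with in each mode.
function asyncEndpointStatus(mode: ServiceMode): number {
  return mode === "queue" ? 202 : 503;
}
```

Note that Redis alone is not enough: without the full S3 configuration the service still falls back to direct mode.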
When queue mode is enabled, Bull Board is also available at `/v1/admin/queues`.