Status: Initial placeholder for OCR endpoints and integration. This page will expand with full request/response schemas and examples.
The OCR module integrates with media ingestion to extract text from scanned PDFs or images when native text is unavailable. The public API currently exposes lightweight endpoints for backend discovery and preloading, while full OCR execution is driven via the media ingestion APIs.
- Single-user: `X-API-KEY: <key>`
- Multi-user: `Authorization: Bearer <JWT>`
- Standard limits apply; OCR preloading is low-cost, while end-to-end OCR via media ingestion follows media service limits.
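The two authentication modes above differ only in which header is sent. A minimal client-side sketch, assuming a hypothetical helper named `build_auth_headers` (it is not part of the server code, and the rule that exactly one credential must be supplied is an illustrative choice):

```python
from typing import Dict, Optional


def build_auth_headers(api_key: Optional[str] = None,
                       jwt: Optional[str] = None) -> Dict[str, str]:
    """Return request headers for one of the two documented auth modes.

    Hypothetical helper for illustration only.
    """
    if api_key and jwt:
        raise ValueError("pass either api_key (single-user) or jwt (multi-user), not both")
    if api_key:
        # Single-user mode: static API key header.
        return {"X-API-KEY": api_key}
    if jwt:
        # Multi-user mode: bearer token in the Authorization header.
        return {"Authorization": f"Bearer {jwt}"}
    raise ValueError("one of api_key or jwt is required")
```

The returned dictionary can be passed as the `headers` argument of whichever HTTP client you use.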
- GET `/api/v1/ocr/backends`
  - Lists available OCR backends with basic health info.
  - Returns a map keyed by backend name (e.g., `mineru`, `points`, `dots`, `llamacpp`, `chatllm`) including lightweight backend-specific configuration details.
  - Field shape varies by backend. Today `llamacpp` exposes `mode`, `configured_mode`, `configured`, `supports_structured_output`, `supports_json`, `model`, `configured_flags`, auto-eligibility flags, `backend_concurrency_cap`, and mode-specific flags such as `url_configured`, `managed_configured`, `managed_running`, `allow_managed_start`, and `cli_configured`; `chatllm` exposes a similar capability set with its own mode-specific details, but not every field is identical.
  - Code: `tldw_Server_API/app/api/v1/endpoints/ocr.py:router.get("/backends")`
- POST `/api/v1/ocr/points/preload`
  - Attempts to preload the POINTS Transformers model to surface errors early.
  - Returns `{ "status": "ok" | "error", ... }`.
  - Code: `tldw_Server_API/app/api/v1/endpoints/ocr.py:router.post("/points/preload")`
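Because the discovery map varies by backend, client code should read it defensively. The sketch below uses only the Python standard library. `BASE_URL`, `fetch_json`, `available_backends`, and `preload_ok` are hypothetical names, and the `available` key is assumed from the MinerU discovery excerpt on this page; other backends may report health differently.

```python
import json
import urllib.request
from typing import Any, Dict, List

BASE_URL = "http://localhost:8000"  # assumption: same host as the curl examples


def fetch_json(method: str, path: str, headers: Dict[str, str]) -> Dict[str, Any]:
    """Send a body-less request and decode the JSON response."""
    req = urllib.request.Request(BASE_URL + path, method=method, headers=headers)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def available_backends(backends: Dict[str, Dict[str, Any]]) -> List[str]:
    """Names of backends whose entry reports available == True.

    Field shape varies by backend, so a missing "available" key is
    treated as "not known to be available" rather than as an error.
    """
    return sorted(name for name, info in backends.items()
                  if info.get("available") is True)


def preload_ok(result: Dict[str, Any]) -> bool:
    """True when a preload response reports status "ok"."""
    return result.get("status") == "ok"


# Example calls against a running server (not executed here):
#   backends = fetch_json("GET", "/api/v1/ocr/backends", {"X-API-KEY": "..."})
#   result = fetch_json("POST", "/api/v1/ocr/points/preload", {"X-API-KEY": "..."})
```

Keeping the parsing functions separate from the network call makes them easy to exercise against saved responses.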
OCR is typically enabled via the media ingestion request options. Key fields (see code for authoritative definitions):
- `enable_ocr` (bool) - enable OCR for scanned/low-text PDFs
- `ocr_backend` (str | null) - backend name (e.g., `tesseract`, `auto`, or module-specific)
- `ocr_lang` (str) - language (e.g., `eng`)
- `ocr_dpi` (int) - DPI for page rendering prior to OCR
- `ocr_mode` (enum) - `always` or `fallback`
- `ocr_min_page_text_chars` (int) - threshold to treat a page as "no text" for fallback OCR
- `ocr_output_format` (str | null) - `text` | `markdown` | `json` (controls structured OCR output)
- `ocr_prompt_preset` (str | null) - `general` | `doc` | `table` | `spotting` | `json` (backend-specific presets)
- The PDF pipeline stores structured OCR data under `analysis_details.ocr.structured` when the backend returns it.
- Per-page OCR concurrency is capped by the smaller of `OCR_PAGE_CONCURRENCY` and the backend profile's `max_page_concurrency`.
Reference (code): tldw_Server_API/app/api/v1/schemas/media_request_models.py.
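The interaction between `ocr_mode`, `ocr_min_page_text_chars`, and the two concurrency caps can be sketched as follows. This is an illustration of the documented behavior, not the pipeline's code: the function names are hypothetical, and the whitespace stripping, the strict `<` comparison, and the floor of 1 on concurrency are assumptions. Check the schema file referenced above for the authoritative rules.

```python
def page_needs_ocr(native_text: str, ocr_mode: str, min_page_text_chars: int) -> bool:
    """Decide whether a page should be sent to OCR."""
    if ocr_mode == "always":
        return True
    if ocr_mode == "fallback":
        # Assumption: surrounding whitespace is ignored and the comparison
        # is strict; the real pipeline may count characters differently.
        return len(native_text.strip()) < min_page_text_chars
    raise ValueError(f"unknown ocr_mode: {ocr_mode!r}")


def effective_page_concurrency(global_cap: int, backend_cap: int) -> int:
    """Smaller of the global cap and the backend profile's cap.

    global_cap stands in for OCR_PAGE_CONCURRENCY and backend_cap for
    max_page_concurrency. The floor of 1 is an assumption that keeps
    the result usable when a cap is misconfigured.
    """
    return max(1, min(global_cap, backend_cap))
```

For example, with a hypothetical threshold of 40 characters, a blank scanned page goes to OCR in `fallback` mode while a page with a full paragraph of native text does not.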
- `ocr_backend=mineru` is supported only for PDF ingestion and OCR evaluation in v1.
- MinerU is document-level, not per-page image OCR. The PDF pipeline runs it once for the whole PDF and stores the normalized result under `analysis_details.ocr.structured`.
- MinerU appears in `GET /api/v1/ocr/backends` with capability flags such as `pdf_only`, `document_level`, and `opt_in_only`.
- MinerU is excluded from `auto`, `auto_high_quality`, and `OCR.backend_priority` in v1.
- `ocr_lang` and `ocr_dpi` are advisory for MinerU and are currently recorded in metadata but not used to drive the CLI invocation.
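Since structured output is only present when the backend returns it, a client should treat `analysis_details.ocr.structured` as optional. The sketch below assumes the response envelope shown in the response excerpt on this page (`results[]` with `analysis_details.ocr.structured.pages[].text`); `structured_ocr` and `page_texts` are hypothetical helper names, and real responses may carry additional or different fields.

```python
from typing import Any, Dict, List, Optional


def structured_ocr(result: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return analysis_details.ocr.structured for one result, or None."""
    ocr = (result.get("analysis_details") or {}).get("ocr") or {}
    structured = ocr.get("structured")
    return structured if isinstance(structured, dict) else None


def page_texts(response: Dict[str, Any]) -> List[List[str]]:
    """Collect per-page text for every result in a process-pdfs response.

    Results without structured OCR data yield an empty list.
    """
    collected = []
    for result in response.get("results", []):
        structured = structured_ocr(result)
        pages = structured.get("pages", []) if structured else []
        collected.append([page.get("text", "") for page in pages])
    return collected
```

Returning an empty list for results without structured data lets callers iterate uniformly without special-casing backends that only produce plain text.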
- `ocr_backend=llamacpp` and `ocr_backend=chatllm` are server-owned OCR profiles that can run in `remote`, `managed`, `cli`, or `auto` mode.
- Both backends use explicit auto-eligibility flags. They only participate in `auto` / `auto_high_quality` when the flag is enabled and the backend is locally available.
- Managed mode is single-process only in v1. For multi-worker deployments, use `remote` or `cli`.
- The PDF pipeline records the effective page concurrency it actually used, not just the global cap.
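The two rules above (flag plus local availability for auto selection, and managed mode only for single-process deployments) can be expressed as simple predicates. This page does not name the auto-eligibility fields, so the keys `auto_eligible` and `available` below are hypothetical placeholders, as are the function names; substitute the fields your server actually reports.

```python
from typing import Any, Dict


def participates_in_auto(info: Dict[str, Any]) -> bool:
    """True only when the backend is flagged for auto AND locally available.

    "auto_eligible" and "available" are hypothetical key names.
    """
    return bool(info.get("auto_eligible")) and bool(info.get("available"))


def managed_mode_allowed(worker_count: int) -> bool:
    """Managed mode is single-process only in v1."""
    return worker_count == 1
```

A deployment check might call `managed_mode_allowed` with the configured worker count and fall back to `remote` or `cli` when it returns `False`.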
List OCR backends

curl -s http://localhost:8000/api/v1/ocr/backends | jq

Preload POINTS Transformers

curl -s -X POST http://localhost:8000/api/v1/ocr/points/preload | jq

Enable OCR in media ingestion (illustrative JSON fragment)

{
"enable_ocr": true,
"ocr_backend": "auto",
"ocr_lang": "eng",
"ocr_mode": "fallback",
"ocr_dpi": 300
}

Structured OCR example (process PDF + inspect `analysis_details`)

curl -s -X POST http://localhost:8000/api/v1/media/process-pdfs \
-H "X-API-KEY: $TLDW_API_KEY" \
-F "enable_ocr=true" \
-F "ocr_backend=hunyuan" \
-F "ocr_output_format=json" \
-F "ocr_prompt_preset=json" \
-F "files=@/path/to/sample.pdf"

Example response excerpt (truncated)

{
"results": [
{
"analysis_details": {
"ocr": {
"backend": "hunyuan",
"output_format": "json",
"prompt_preset": "json",
"structured": {
"format": "json",
"text": "...",
"pages": [
{ "text": "...", "raw": { "blocks": [ { "text": "..." } ] } }
]
}
}
}
}
]
}

MinerU PDF OCR example

curl -s -X POST http://localhost:8000/api/v1/media/process-pdfs \
-H "X-API-KEY: $TLDW_API_KEY" \
-F "enable_ocr=true" \
-F "ocr_backend=mineru" \
-F "ocr_mode=fallback" \
-F "ocr_output_format=markdown" \
-F "files=@/path/to/scanned-table.pdf"

Example MinerU discovery response excerpt (truncated)

{
"mineru": {
"available": true,
"pdf_only": true,
"document_level": true,
"opt_in_only": true,
"mode": "cli",
"timeout_sec": 120,
"max_concurrency": 1
}
}

- MinerU: document-level PDF OCR with bounded structured artifacts (`pages`, `tables`, artifact excerpts)
- POINTS Reader: documentation coming soon
- Llama.cpp OCR: see `Docs/OCR/LlamaCpp-OCR.md`
- ChatLLM OCR: see `Docs/OCR/ChatLLM-OCR.md`
- OCR Providers overview: see `Docs/OCR/OCR_Providers.md`

- Expand docs with full request/response schemas
- Add examples for common ingestion flows with OCR
- Add troubleshooting and performance tips
If you need additional OCR endpoints or deeper docs, please open an issue with your use case.