Support bytes/Path inputs, presigned URLs, and multipart uploads#165

Closed
majcheradam wants to merge 4 commits into zai-org:main from ocrbase-hq:feat/bytes-url-multipart-upload

Conversation

@majcheradam

Summary

  • Extend GlmOcr.parse() to accept bytes, pathlib.Path, and presigned URLs in addition to plain str paths
  • Add HTTP(S) URL download support in PageLoader and image_utils (including presigned S3/GCS URLs)
  • Add multipart/form-data support to the Flask /glmocr/parse endpoint so clients can upload files directly alongside URL strings

Test plan

  • Verify parse() with local file path (str), Path object, raw bytes, and presigned URL
  • Verify Flask endpoint with application/json body (existing behavior)
  • Verify Flask endpoint with multipart/form-data (file uploads + URL fields)
  • Verify streaming mode with mixed input types
  • Verify temp files are cleaned up after processing
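
For the multipart case in the test plan, a stdlib-only sketch of building a `multipart/form-data` body that mixes file uploads with URL fields. The field names (`url`, `file`) are assumptions for illustration, not the endpoint's documented contract.

```python
import io
import uuid


def build_multipart(fields: dict, files: dict) -> tuple:
    """Build a multipart/form-data body (boundary, bytes).

    `fields` maps name -> str value (e.g. a presigned URL);
    `files` maps name -> (filename, bytes). Illustrative sketch only.
    """
    boundary = uuid.uuid4().hex
    buf = io.BytesIO()
    for name, value in fields.items():
        buf.write(
            f'--{boundary}\r\nContent-Disposition: form-data; '
            f'name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    for name, (filename, data) in files.items():
        buf.write(
            f'--{boundary}\r\nContent-Disposition: form-data; '
            f'name="{name}"; filename="{filename}"\r\n'
            f'Content-Type: application/octet-stream\r\n\r\n'.encode()
        )
        buf.write(data)
        buf.write(b"\r\n")
    buf.write(f"--{boundary}--\r\n".encode())
    return boundary, buf.getvalue()
```

The resulting body would be POSTed to `/glmocr/parse` with a `Content-Type: multipart/form-data; boundary=...` header.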

🤖 Generated with Claude Code

majcheradam and others added 4 commits March 25, 2026 15:21
…loads

Extend parse() to accept bytes and pathlib.Path in addition to str.
Add HTTP(S) URL support (including presigned URLs) in PageLoader and
image_utils. Add multipart/form-data endpoint to the Flask server so
clients can upload files directly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds information extraction support using GLM-OCR's native extraction
mode. Accepts schemas as empty-value templates, JSON Schema (Zod-compatible
via zodToJsonSchema), or Pydantic model classes.

- GlmOcr.extract() method and glmocr.extract() convenience function in api.py
- POST /glmocr/extract endpoint in server.py (JSON + multipart/form-data)
- PDF extraction uses a two-phase approach: a full parse, then a single VLM call
- Image extraction sends image + prompt directly to VLM

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When no schema is provided, the extract endpoint now parses the document
first to get markdown, then sends it to the VLM to produce structured JSON
automatically instead of returning a 400 error.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
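
The schema-less fallback in this commit can be sketched as a two-phase pipeline. The callables are injected here purely for illustration; the real code presumably wires these steps through `GlmOcr` internals rather than passing functions around.

```python
def extract_without_schema(parse_fn, vlm_fn, source):
    """Two-phase fallback sketched from the commit message:
    parse the document to markdown first, then ask the VLM for
    structured JSON instead of returning a 400 error.
    """
    markdown = parse_fn(source)  # phase 1: full document parse
    prompt = (
        "Extract the key fields from this document and return them as JSON:\n\n"
        + markdown
    )
    return vlm_fn(prompt)        # phase 2: single VLM call
```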
Let vLLM use its full context window (--max-model-len) instead of
artificially capping output tokens with a heuristic estimator.
Also add gunicorn to server extras.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@ocrbase-hq ocrbase-hq closed this by deleting the head repository Mar 30, 2026
