Support bytes/Path inputs, presigned URLs, and multipart uploads#165

Closed
majcheradam wants to merge 4 commits into zai-org:main from ocrbase-hq:feat/bytes-url-multipart-upload

Conversation

@majcheradam

Summary

  • Extend GlmOcr.parse() to accept bytes, pathlib.Path, and presigned URLs in addition to plain str paths
  • Add HTTP(S) URL download support in PageLoader and image_utils (including presigned S3/GCS URLs)
  • Add multipart/form-data support to the Flask /glmocr/parse endpoint so clients can upload files directly alongside URL strings

Test plan

  • Verify parse() with local file path (str), Path object, raw bytes, and presigned URL
  • Verify Flask endpoint with application/json body (existing behavior)
  • Verify Flask endpoint with multipart/form-data (file uploads + URL fields)
  • Verify streaming mode with mixed input types
  • Verify temp files are cleaned up after processing
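
For the multipart case in the test plan, a stdlib-only sketch of building a `multipart/form-data` body that mixes file uploads with URL fields. The field names (`url`, `file`) are assumptions for illustration, not the endpoint's documented contract.

```python
import io
import uuid


def build_multipart(fields: dict, files: dict) -> tuple:
    """Build a multipart/form-data body (boundary, bytes).

    `fields` maps name -> str value (e.g. a presigned URL);
    `files` maps name -> (filename, bytes). Illustrative sketch only.
    """
    boundary = uuid.uuid4().hex
    buf = io.BytesIO()
    for name, value in fields.items():
        buf.write(
            f'--{boundary}\r\nContent-Disposition: form-data; '
            f'name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    for name, (filename, data) in files.items():
        buf.write(
            f'--{boundary}\r\nContent-Disposition: form-data; '
            f'name="{name}"; filename="{filename}"\r\n'
            f'Content-Type: application/octet-stream\r\n\r\n'.encode()
        )
        buf.write(data)
        buf.write(b"\r\n")
    buf.write(f"--{boundary}--\r\n".encode())
    return boundary, buf.getvalue()
```

The resulting body would be POSTed to `/glmocr/parse` with a `Content-Type: multipart/form-data; boundary=...` header.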

🤖 Generated with Claude Code

majcheradam and others added 4 commits March 25, 2026 15:21
…loads

Extend parse() to accept bytes and pathlib.Path in addition to str.
Add HTTP(S) URL support (including presigned URLs) in PageLoader and
image_utils. Add multipart/form-data endpoint to the Flask server so
clients can upload files directly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds information extraction support using GLM-OCR's native extraction
mode. Accepts schemas as empty-value templates, JSON Schema (Zod-compatible
via zodToJsonSchema), or Pydantic model classes.

- GlmOcr.extract() method and glmocr.extract() convenience function in api.py
- POST /glmocr/extract endpoint in server.py (JSON + multipart/form-data)
- PDF extraction uses a two-phase approach: a full parse, then a single VLM call
- Image extraction sends image + prompt directly to VLM

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When no schema is provided, the extract endpoint now parses the document
first to get markdown, then sends it to the VLM to produce structured JSON
automatically instead of returning a 400 error.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
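
The schema-less fallback in this commit can be sketched as a two-phase pipeline. The callables are injected here purely for illustration; the real code presumably wires these steps through `GlmOcr` internals rather than passing functions around.

```python
def extract_without_schema(parse_fn, vlm_fn, source):
    """Two-phase fallback sketched from the commit message:
    parse the document to markdown first, then ask the VLM for
    structured JSON instead of returning a 400 error.
    """
    markdown = parse_fn(source)  # phase 1: full document parse
    prompt = (
        "Extract the key fields from this document and return them as JSON:\n\n"
        + markdown
    )
    return vlm_fn(prompt)        # phase 2: single VLM call
```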
Let vLLM use its full context window (--max-model-len) instead of
artificially capping output tokens with a heuristic estimator.
Also add gunicorn to server extras.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@ocrbase-hq ocrbase-hq closed this by deleting the head repository Mar 30, 2026
