Merged (39 commits)
- `000d8af` Remove support for OCR without layout analysis (Ataraxy33, Feb 28, 2026)
- `796519a` reconstruct pipeline (Ataraxy33, Feb 28, 2026)
- `0489c36` Add async pipeline support for files in directory via CLI (Ataraxy33, Mar 2, 2026)
- `6dc8d8c` Fix a blocking bug; remove redundant PIL format conversions; optimize… (Ataraxy33, Mar 3, 2026)
- `59e6ba6` Added shutdown event handling to safely stop processing and drain queues (Ataraxy33, Mar 3, 2026)
- `56c40a8` support image / PDF bytes input (Ataraxy33, Mar 3, 2026)
- `f55d36a` add image path to json result for cropped images (Ataraxy33, Mar 4, 2026)
- `73ef92d` Fix a layout visualization file naming bug (Ataraxy33, Mar 4, 2026)
- `eb2ebe3` add fallback handling to page loader (Ataraxy33, Mar 4, 2026)
- `c7104f3` change PDF renderer to PyMuPDF; harden error handling (Ataraxy33, Mar 5, 2026)
- `f6b67c4` simplify image region save flow to reduce IO (Ataraxy33, Mar 5, 2026)
- `eebf13a` save raw output json file from recognition model (Ataraxy33, Mar 6, 2026)
- `5dd2778` add an argument to control whether to use polygon property in layout … (Ataraxy33, Mar 6, 2026)
- `ffa8f53` support load image/PDF files from a directory recursively & update de… (Ataraxy33, Mar 10, 2026)
- `1d7a43a` Implement safe extraction of polygon points in layout detector to han… (Ataraxy33, Mar 10, 2026)
- `9ce61f7` Removed temporary directory usage for layout visualizations and updat… (Ataraxy33, Mar 10, 2026)
- `961abcd` Enhance configuration flexibility by adding CLI `--set` option for ov… (Ataraxy33, Mar 11, 2026)
- `79404cd` Update default output directory from './results' to './output' in sav… (Ataraxy33, Mar 11, 2026)
- `da003dd` Update methods to load images and PDFs from various input types, remo… (Ataraxy33, Mar 11, 2026)
- `83e4c71` Improve recognition result postprocess (Ataraxy33, Mar 12, 2026)
- `28f108b` Add multi-GPU deployment support for GLM-OCR (Ataraxy33, Mar 13, 2026)
- `1239089` Refactor multi-GPU deployment to eliminate tempfile usage and direct … (Ataraxy33, Mar 18, 2026)
- `f19f404` Update error handling in recognition process to log failures and set … (Ataraxy33, Mar 18, 2026)
- `88fc499` Add health monitoring to OCR pipeline with a watchdog thread and sock… (Ataraxy33, Mar 18, 2026)
- `0d51f41` Add engine health checks in multi-GPU coordinator to monitor and hand… (Ataraxy33, Mar 18, 2026)
- `978564a` Refactor content validation in ResultFormatter to skip non-image labe… (Ataraxy33, Mar 19, 2026)
- `507c2d1` Add inline formula normalization in result formatting process (Ataraxy33, Mar 20, 2026)
- `8f134ad` Refactor result yielding in Pipeline to maintain original input order (Ataraxy33, Mar 20, 2026)
- `b3874ad` Add --no-save option to multi-GPU deployment for optional result file… (Ataraxy33, Mar 20, 2026)
- `cee8fa0` Update multi-GPU deployment to pass log directory and engine log leve… (Ataraxy33, Mar 20, 2026)
- `33b78f7` Add engine log level parameter to build_engine_cmd for enhanced loggi… (Ataraxy33, Mar 20, 2026)
- `9b83925` Enhance memory management in PipelineState by adding release_unit_dat… (Ataraxy33, Mar 24, 2026)
- `2ec0809` Add post-processing configuration options to ResultFormatter for merg… (Ataraxy33, Mar 26, 2026)
- `4252a3c` Update default configuration parameters (Ataraxy33, Mar 26, 2026)
- `cf32315` Add preserve_order parameter to GlmOcr and Pipeline classes for consi… (Ataraxy33, Mar 26, 2026)
- `52687c2` Refactor code for improved readability and consistency across multipl… (Ataraxy33, Mar 26, 2026)
- `fdb10fc` Merge remote-tracking branch 'upstream/main' into reconstruct-pipeline (Ataraxy33, Mar 26, 2026)
- `b9cc4cb` Passed the pre-commit code check (Ataraxy33, Mar 27, 2026)
- `77a7430` Add preserve_order argument to stream parsing test (Ataraxy33, Mar 27, 2026)
23 changes: 16 additions & 7 deletions README.md
@@ -197,6 +197,10 @@ glmocr parse examples/source/code.png --layout-device cpu

# Run layout detection on a specific GPU
glmocr parse examples/source/code.png --layout-device cuda:1

# Override any config value via --set (dotted path, repeatable)
glmocr parse examples/source/code.png --set pipeline.ocr_api.api_port 8080
glmocr parse examples/source/ --set pipeline.layout.use_polygon true --set logging.level DEBUG
```

#### Python API
@@ -256,6 +260,14 @@ Semantics:

### Configuration

Configuration priority (highest to lowest):

1. CLI `--set` overrides
2. Python API keyword arguments
3. `GLMOCR_*` environment variables / `.env` file
4. YAML config file
5. Built-in defaults
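
As an illustrative sketch only (the actual glmocr implementation may differ), a dotted-path `--set` override like `pipeline.ocr_api.api_port 8080` can be applied to a nested config dict by walking the path and coercing simple scalars:

```python
from typing import Any

def apply_override(config: dict, dotted_key: str, raw_value: str) -> None:
    """Walk the dotted path, creating intermediate dicts, and set the leaf."""
    keys = dotted_key.split(".")
    node = config
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    # Minimal scalar coercion: bool, then int, then float, else keep the string.
    value: Any = raw_value
    if raw_value.lower() in ("true", "false"):
        value = raw_value.lower() == "true"
    else:
        try:
            value = int(raw_value)
        except ValueError:
            try:
                value = float(raw_value)
            except ValueError:
                pass
    node[keys[-1]] = value

cfg = {"pipeline": {"ocr_api": {"api_port": 5002}}}
apply_override(cfg, "pipeline.ocr_api.api_port", "8080")   # -> int 8080
apply_override(cfg, "pipeline.layout.use_polygon", "true") # -> bool True
```

Because `setdefault` creates intermediate dicts on demand, an override can target a section that the base config does not yet define.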

Full configuration in `glmocr/config.yaml`:

```yaml
@@ -276,13 +288,13 @@ pipeline:
api_host: localhost
api_port: 8080
api_key: null # or set API_KEY env var
connect_timeout: 300
request_timeout: 300
connect_timeout: 30
request_timeout: 120

# Page loader settings
page_loader:
max_tokens: 16384
temperature: 0.01
max_tokens: 8192
temperature: 0.0
image_format: JPEG
min_pixels: 12544
max_pixels: 71372800
@@ -291,9 +303,6 @@ pipeline:
result_formatter:
output_format: both # json, markdown, or both

# Layout detection (optional)
enable_layout: false

# Layout model device placement
layout:
# device: null # null=auto, "cpu", "cuda", or "cuda:N"
23 changes: 16 additions & 7 deletions README_zh.md
@@ -190,6 +190,10 @@ glmocr parse examples/source/code.png --config my_config.yaml

# Enable debug logging (includes profiling)
glmocr parse examples/source/code.png --log-level DEBUG

# Override any config value via --set (dotted path, repeatable)
glmocr parse examples/source/code.png --set pipeline.ocr_api.api_port 8080
glmocr parse examples/source/ --set pipeline.layout.use_polygon true --set logging.level DEBUG
```

#### Python API
@@ -241,6 +245,14 @@ curl -X POST http://localhost:5002/glmocr/parse \

### Configuration

Configuration loading priority (highest to lowest):

1. CLI `--set` overrides
2. Python API keyword arguments
3. `GLMOCR_*` environment variables / `.env` file
4. YAML config file
5. Built-in defaults

Full configuration in `glmocr/config.yaml`:

```yaml
@@ -261,23 +273,20 @@ pipeline:
api_host: localhost
api_port: 8080
api_key: null # or set API_KEY env var
connect_timeout: 300
request_timeout: 300
connect_timeout: 30
request_timeout: 120

# Page loader settings
page_loader:
max_tokens: 16384
temperature: 0.01
max_tokens: 8192
temperature: 0.0
image_format: JPEG
min_pixels: 12544
max_pixels: 71372800

# Result formatting
result_formatter:
output_format: both # json, markdown, or both

# Layout detection (optional)
enable_layout: false
```

For more options, see [config.yaml](glmocr/config.yaml).
2 changes: 0 additions & 2 deletions agent.md
@@ -70,7 +70,6 @@ or in a `.env` file anywhere in the working-directory ancestry.
| `GLMOCR_OCR_API_HOST` | `pipeline.ocr_api.api_host` | `localhost` |
| `GLMOCR_OCR_API_PORT` | `pipeline.ocr_api.api_port` | `5002` |
| `GLMOCR_OCR_MODEL` | `pipeline.ocr_api.model` | `glm-ocr-model` |
| `GLMOCR_ENABLE_LAYOUT` | `pipeline.enable_layout` | `true` / `false` |
| `GLMOCR_LOG_LEVEL` | `logging.level` | `DEBUG`, `INFO`, `WARNING`, `ERROR` |
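
A hypothetical sketch of how the `GLMOCR_*` variables in the table above could be collected into dotted-path overrides; the variable names come from the table, but the helper itself is illustrative, not glmocr's actual loader:

```python
import os

# Mapping from environment variable to dotted config path (from the table above).
ENV_MAP = {
    "GLMOCR_OCR_API_HOST": "pipeline.ocr_api.api_host",
    "GLMOCR_OCR_API_PORT": "pipeline.ocr_api.api_port",
    "GLMOCR_OCR_MODEL": "pipeline.ocr_api.model",
    "GLMOCR_LOG_LEVEL": "logging.level",
}

def env_overrides(environ=os.environ) -> dict:
    """Return {dotted_path: value} for every GLMOCR_* variable that is set."""
    return {path: environ[name] for name, path in ENV_MAP.items() if name in environ}
```

Passing a plain dict instead of `os.environ` makes the lookup easy to test in isolation.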

### `.env` File Auto-Loading
@@ -102,7 +101,6 @@ with **higher priority**.
| `model` | `str` | Model name. |
| `mode` | `str` | `"maas"` or `"selfhosted"`. |
| `timeout` | `int` | Request timeout in seconds. |
| `enable_layout` | `bool` | Enable layout detection. |
| `log_level` | `str` | Logging level. |

---
120 changes: 120 additions & 0 deletions examples/multi-gpu-deploy/README.md
@@ -0,0 +1,120 @@
# Multi-GPU Deployment for GLM-OCR

Automatically launch sglang/vLLM inference services across multiple GPUs, distribute image files evenly, and run the GLM-OCR pipeline in parallel for maximum throughput.

Each GPU hosts both an inference server (sglang or vLLM) and a layout detection model, forming a self-contained processing unit with zero cross-GPU communication.

## Features

- **Auto GPU detection** — discovers all available GPUs and filters by free VRAM
- **Dynamic port allocation** — automatically skips occupied ports
- **Fault tolerance** — failed GPUs are skipped, files are redistributed to healthy GPUs
- **Global progress bar** — real-time `tqdm` progress across all GPUs
- **Graceful shutdown** — `Ctrl+C` cleanly terminates all subprocesses; double `Ctrl+C` force-kills
- **Centralized logging** — all engine/worker logs saved under `logs/<timestamp>/`
- **Speculative decoding** — MTP enabled by default for both sglang and vLLM
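
The even distribution of files across GPUs can be sketched as a round-robin shard, so shard sizes never differ by more than one file (a minimal illustration; the actual logic in `gpu_utils.py` may differ):

```python
def shard_files(files: list, num_gpus: int) -> list:
    """Split `files` into `num_gpus` near-equal shards via round-robin."""
    shards = [[] for _ in range(num_gpus)]
    for i, f in enumerate(files):
        shards[i % num_gpus].append(f)
    return shards

# 10 files over 3 GPUs -> shard sizes 4, 3, 3
shards = shard_files([f"img_{i}.png" for i in range(10)], 3)
```

Round-robin keeps assignment deterministic, which helps when re-running after a partial failure.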

## Quick Start

```bash
# Use all available GPUs with sglang (default)
python examples/multi-gpu-deploy/launch.py -i ./images -o ./output -m /path/to/GLM-OCR

# Specify GPUs and use vLLM
python examples/multi-gpu-deploy/launch.py -i ./images -o ./output --engine vllm --gpus 0,1,2,3

# Custom model path and VRAM threshold
python examples/multi-gpu-deploy/launch.py -i ./images -o ./output -m /path/to/GLM-OCR --min-free-mb 20000
```

## Parameters

| Parameter | Default | Description |
|---|---|---|
| `-i`, `--input` | *required* | Input image file or directory (recursive) |
| `-o`, `--output` | `./output` | Output directory for results |
| `-m`, `--model` | `zai-org/GLM-OCR` | Model name or local path |
| `--engine` | `sglang` | Inference engine: `sglang` or `vllm` |
| `--gpus` | `auto` | GPU IDs (comma-separated) or `auto` for all available |
| `--base-port` | `8080` | Base port for engine services |
| `--min-free-mb` | `16000` | Minimum free GPU memory in MB to use a GPU |
| `--timeout` | `600` | Engine startup timeout in seconds |
| `--engine-args` | *none* | Extra arguments passed to the engine |
| `-c`, `--config` | *none* | Path to a custom glmocr config YAML |
| `--log-level` | `WARNING` | Log level for worker processes |


## Examples

### Basic usage

```bash
python examples/multi-gpu-deploy/launch.py -i /data/documents -o /data/results
```

### Use vLLM with specific GPUs

```bash
python examples/multi-gpu-deploy/launch.py \
-i /data/documents \
-o /data/results \
--engine vllm \
--gpus 0,2,4,6
```

### Custom engine arguments

```bash
# sglang with custom memory fraction
python examples/multi-gpu-deploy/launch.py \
-i /data/documents \
-o /data/results \
--engine-args "--mem-fraction-static 0.85"
```

### Custom config YAML

```bash
python examples/multi-gpu-deploy/launch.py \
-i /data/documents \
-o /data/results \
--config my_config.yaml
```

## Logs

All logs are saved under `logs/<timestamp>/`:

| File | Content |
|---|---|
| `main.log` | Coordinator stdout/stderr |
| `engine_gpu<N>_port<P>.log` | Engine service output for each GPU |
| `worker_gpu<N>.log` | Worker process output for each GPU |
| `failed_files.json` | Aggregated list of failed files (if any) |

## Troubleshooting

**Q: Some ports are occupied, will it still work?**

Yes. The launcher automatically scans for available ports starting from `--base-port` and skips any that are in use.
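
The port-scan idea can be sketched as follows; this is illustrative, and the launcher's actual availability check may differ:

```python
import socket

def find_free_ports(base_port: int, count: int, host: str = "127.0.0.1") -> list:
    """Probe ports upward from base_port; return the first `count` free ones."""
    ports, candidate = [], base_port
    while len(ports) < count:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            try:
                s.bind((host, candidate))  # succeeds only if no active listener
                ports.append(candidate)
            except OSError:
                pass  # occupied: skip and try the next port
        candidate += 1
    return ports

ports = find_free_ports(8080, 4)
```

Note the bind-probe is inherently racy (another process can grab the port between the check and the engine launch), which is why a launcher typically also verifies the engine actually came up.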

**Q: A GPU runs out of memory mid-processing. What happens?**

The worker on that GPU will fail, but other GPUs continue processing. Failed files are logged in `failed_files.json` for later re-processing.

**Q: How do I re-run only the failed files?**

Copy the failed files to a directory and run the launcher again pointing to that directory.
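
A hypothetical helper for that workflow, assuming `failed_files.json` is a flat JSON list of file paths (the real format may differ):

```python
import json
import shutil
from pathlib import Path

def stage_failed_files(failed_json: str, retry_dir: str) -> int:
    """Copy every file listed in failed_json into retry_dir; return the count."""
    retry = Path(retry_dir)
    retry.mkdir(parents=True, exist_ok=True)
    failed = json.loads(Path(failed_json).read_text())
    staged = 0
    for path in failed:
        src = Path(path)
        if src.exists():  # skip entries whose source file has since moved
            shutil.copy2(src, retry / src.name)
            staged += 1
    return staged
```

Then re-run the launcher with `-i` pointing at the retry directory.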

## File Structure

```
examples/multi-gpu-deploy/
├── launch.py # Entry point and CLI argument parser
├── coordinator.py # Orchestration: GPU detection, engine/worker lifecycle
├── engine.py # Engine service management and progress tracking
├── worker.py # Worker process: GLM-OCR pipeline execution
├── gpu_utils.py # GPU detection, port checking, file sharding
├── README.md # This file (English)
└── README_zh.md # Chinese documentation
```
120 changes: 120 additions & 0 deletions examples/multi-gpu-deploy/README_zh.md
@@ -0,0 +1,120 @@
# Multi-GPU Parallel Deployment for GLM-OCR

Automatically launch sglang/vLLM inference services across multiple GPUs, distribute image files evenly, and run the GLM-OCR pipeline in parallel for maximum throughput.

Each GPU hosts both an inference service (sglang or vLLM) and a layout detection model, forming a self-contained processing unit with zero cross-GPU communication.

## Features

- **Auto GPU detection** — discovers all available GPUs and filters them by free VRAM
- **Dynamic port allocation** — automatically skips occupied ports
- **Fault tolerance** — failed GPUs are skipped and their files redistributed to healthy GPUs
- **Global progress bar** — real-time `tqdm` progress aggregated across all GPUs
- **Graceful shutdown** — `Ctrl+C` cleans up all subprocesses; a second `Ctrl+C` force-kills
- **Centralized logging** — all engine/worker logs saved under `logs/<timestamp>/`
- **Speculative decoding** — MTP (Multi-Token Prediction) enabled by default for both sglang and vLLM

## Quick Start

```bash
# Use all available GPUs with the default sglang engine
python examples/multi-gpu-deploy/launch.py -i ./images -o ./output -m /path/to/GLM-OCR

# Specify GPUs and use vLLM
python examples/multi-gpu-deploy/launch.py -i ./images -o ./output --engine vllm --gpus 0,1,2,3

# Custom model path and VRAM threshold
python examples/multi-gpu-deploy/launch.py -i ./images -o ./output -m /path/to/GLM-OCR --min-free-mb 20000
```

## Parameters

| Parameter | Default | Description |
|---|---|---|
| `-i`, `--input` | *required* | Input image file or directory (scanned recursively) |
| `-o`, `--output` | `./output` | Output directory for results |
| `-m`, `--model` | `zai-org/GLM-OCR` | Model name or local path |
| `--engine` | `sglang` | Inference engine: `sglang` or `vllm` |
| `--gpus` | `auto` | GPU IDs (comma-separated) or `auto` to detect automatically |
| `--base-port` | `8080` | Starting port for engine services |
| `--min-free-mb` | `16000` | Minimum free GPU memory (MB) required to use a GPU |
| `--timeout` | `600` | Engine startup timeout in seconds |
| `--engine-args` | *none* | Extra arguments passed to the inference engine |
| `-c`, `--config` | *none* | Path to a custom glmocr config YAML |
| `--log-level` | `WARNING` | Log level for worker processes |


## Examples

### Basic usage

```bash
python examples/multi-gpu-deploy/launch.py -i /data/documents -o /data/results
```

### Use vLLM with specific GPUs

```bash
python examples/multi-gpu-deploy/launch.py \
-i /data/documents \
-o /data/results \
--engine vllm \
--gpus 0,2,4,6
```

### Custom engine arguments

```bash
# sglang with a custom memory fraction
python examples/multi-gpu-deploy/launch.py \
-i /data/documents \
-o /data/results \
--engine-args "--mem-fraction-static 0.85"
```

### Custom config YAML

```bash
python examples/multi-gpu-deploy/launch.py \
-i /data/documents \
-o /data/results \
--config my_config.yaml
```

## Logs

All logs are saved under `logs/<timestamp>/`:

| File | Content |
|---|---|
| `main.log` | Coordinator stdout/stderr |
| `engine_gpu<N>_port<P>.log` | Inference engine output for each GPU |
| `worker_gpu<N>.log` | Worker process output for each GPU |
| `failed_files.json` | Aggregated list of failed files (if any) |

## FAQ

**Q: Some ports are occupied. Will it still work?**

Yes. The launcher scans for available ports starting from `--base-port` and skips any that are in use.

**Q: What happens if a GPU runs out of memory mid-processing?**

The worker on that GPU fails, but the other GPUs keep processing. Failed files are recorded in `failed_files.json` for later re-processing.

**Q: How do I re-run only the failed files?**

Copy the failed files into a directory, then run the launcher again pointing at that directory.

## File Structure

```
examples/multi-gpu-deploy/
├── launch.py # Entry point and CLI argument parsing
├── coordinator.py # Orchestration: GPU detection, engine/worker lifecycle
├── engine.py # Inference engine management and progress tracking
├── worker.py # Worker process: GLM-OCR pipeline execution
├── gpu_utils.py # GPU detection, port checking, file sharding
├── README.md # English documentation
└── README_zh.md # This file (Chinese documentation)
```