GLM-OCR much slower than PP-OCR-VL on T4 (16s vs 2–3s per page), possible misconfiguration? #175

@Tsan1024

Description

Hi, I ran a simple comparison against PP-OCR-VL-1.5 and found that GLM-OCR is much slower at parsing PDFs.

For example:

  • PP-OCR-VL: ~2–3s per PDF page
  • GLM-OCR: ~16s per PDF page

This is a roughly 6x gap, much larger than the reported performance difference between the two models would suggest.
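For context, each per-page number is the wall-clock time of a single-image request against the server's OpenAI-compatible endpoint. A minimal timing harness along these lines reproduces the measurement (the prompt text and image path are placeholders; the endpoint path and payload shape are the standard OpenAI chat API that vLLM serves):

```python
import base64
import json
import statistics
import time
import urllib.request


def ocr_one_page(image_path, model, url="http://localhost:8080/v1/chat/completions"):
    """Send one page image to the vLLM endpoint and return wall-clock
    latency in seconds. Model name matches the served model, e.g.
    "PaddleOCR-VL-1.5-0.9B" or "zai-org/GLM-OCR"."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    payload = {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
                {"type": "text", "text": "OCR this page."},  # placeholder prompt
            ],
        }],
    }
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    start = time.perf_counter()
    with urllib.request.urlopen(req) as resp:
        resp.read()
    return time.perf_counter() - start


def summarize(latencies):
    """Reduce a list of per-page latencies to (mean, median) seconds."""
    return statistics.mean(latencies), statistics.median(latencies)


# The figures quoted above, fed through the same summary:
glm_mean, _ = summarize([16.0])
pp_mean, _ = summarize([2.0, 3.0])
print(f"GLM-OCR ~{glm_mean:.1f}s/page, PP-OCR-VL ~{pp_mean:.1f}s/page, "
      f"ratio ~{glm_mean / pp_mean:.1f}x")
```

Running this over the same set of page images against each server is how the per-page averages were obtained.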

Environment

  • Python: 3.12.13
  • CUDA: 13.0 (Driver 580.126.09)
  • GPU: NVIDIA T4

vLLM launch scripts

PP-OCR-VL

vllm serve PaddlePaddle/PaddleOCR-VL \
    --host 0.0.0.0 \
    --port 8080 \
    --served-model-name PaddleOCR-VL-1.5-0.9B \
    --trust-remote-code \
    --max-num-batched-tokens 16384 \
    --no-enable-prefix-caching \
    --mm-processor-cache-gb 0 \
    --max-model-len 8192 \
    --gpu-memory-utilization 0.85 \
    --max-num-seqs 16 \
    --dtype float16

GLM-OCR

vllm serve zai-org/GLM-OCR \
    --host 0.0.0.0 \
    --port 8080 \
    --trust-remote-code \
    --allowed-local-media-path / \
    --max-num-batched-tokens 16384 \
    --no-enable-prefix-caching \
    --mm-processor-cache-gb 0 \
    --max-model-len 8192 \
    --gpu-memory-utilization 0.85 \
    --max-num-seqs 16 \
    --dtype float16

Note: I tried disabling/enabling prefix-caching, but it had little effect on GLM-OCR speed.

Could you please check whether there are any misconfigurations or suboptimal parameters in my GLM-OCR launch command?

Are there any recommended settings for T4 GPUs to achieve normal inference speed?

Any advice would be appreciated.
