Hi, I ran a simple comparison between PP-OCR-VL-1.5 and GLM-OCR and found that GLM-OCR is much slower at parsing PDFs.
For example:
- PP-OCR-VL: ~2–3s per PDF page
- GLM-OCR: ~16s per PDF page
This gap is far larger than the difference implied by the two models' reported benchmark numbers.
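For concreteness, a quick calculation of the per-page slowdown implied by the timings above (the variable names are just illustrative):

```python
# Per-page slowdown of GLM-OCR relative to PP-OCR-VL,
# using the rough measurements reported above.
pp_ocr_vl = (2.0, 3.0)  # seconds per page, observed range
glm_ocr = 16.0          # seconds per page

slowdown = tuple(round(glm_ocr / t, 1) for t in pp_ocr_vl)
print(slowdown)  # roughly a 5-8x slowdown
```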
Environment
- Python: 3.12.13
- CUDA: 13.0 (Driver 580.126.09)
- GPU: NVIDIA T4
vLLM launch scripts
PP-OCR-VL
```bash
vllm serve PaddlePaddle/PaddleOCR-VL \
  --host 0.0.0.0 \
  --port 8080 \
  --served-model-name PaddleOCR-VL-1.5-0.9B \
  --trust-remote-code \
  --max-num-batched-tokens 16384 \
  --no-enable-prefix-caching \
  --mm-processor-cache-gb 0 \
  --max-model-len 8192 \
  --gpu-memory-utilization 0.85 \
  --max-num-seqs 16 \
  --dtype float16
```
GLM-OCR
```bash
vllm serve zai-org/GLM-OCR \
  --host 0.0.0.0 \
  --port 8080 \
  --trust-remote-code \
  --allowed-local-media-path / \
  --max-num-batched-tokens 16384 \
  --no-enable-prefix-caching \
  --mm-processor-cache-gb 0 \
  --max-model-len 8192 \
  --gpu-memory-utilization 0.85 \
  --max-num-seqs 16 \
  --dtype float16
```
Note: I tried both enabling and disabling prefix caching, but it had little effect on GLM-OCR's speed.
Could you please check whether any parameters in my GLM-OCR launch command are misconfigured or suboptimal?
Are there any recommended settings for T4 GPUs to achieve normal inference speed?
Any advice would be appreciated.
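For reference, this is roughly how I measure per-page latency: send each rendered page image to the vLLM OpenAI-compatible endpoint and time the request. The endpoint URL, model name, and prompt below are illustrative assumptions, not part of either model's official API; adjust them to your deployment.

```python
# Hypothetical benchmark sketch against a vLLM OpenAI-compatible server.
# URL, model name, and prompt are assumptions -- adapt to your setup.
import base64
import json
import statistics
import time
import urllib.request


def ocr_page(image_path: str,
             url: str = "http://localhost:8080/v1/chat/completions",
             model: str = "GLM-OCR") -> float:
    """Send one page image to the server and return elapsed wall-clock seconds."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    payload = {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
                {"type": "text", "text": "OCR this page."},
            ],
        }],
    }
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    start = time.perf_counter()
    urllib.request.urlopen(req).read()
    return time.perf_counter() - start


def summarize(latencies: list[float]) -> tuple[float, float]:
    """Return (median, mean) per-page latency in seconds."""
    return statistics.median(latencies), statistics.mean(latencies)
```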