
Conversation

@SageStack

This PR optimizes the benchmarking process for the marker project, improving runtime and reducing memory footprint when running on both MPS and CPU backends. It includes tests with different levels of parallelism (-P 8, -P 6, -P 1) to find optimal configurations for both devices.

Changes
• Added and tested TORCH_DEVICE=mps and TORCH_DEVICE=cpu runs (device resolution sketched below)
• Benchmarked with varying parallelism to identify optimal speed/memory trade-offs
• Collected /usr/bin/time -l metrics (wall time and peak RSS on macOS) for accurate performance profiling
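
For context, a minimal sketch of how a TORCH_DEVICE override of this kind is typically honored in PyTorch code; the function name and fallback order here are illustrative, not marker's actual implementation:

```python
import os
import torch

def resolve_device() -> torch.device:
    # Honor an explicit TORCH_DEVICE=mps / TORCH_DEVICE=cpu override first.
    forced = os.environ.get("TORCH_DEVICE")
    if forced:
        return torch.device(forced)
    # Otherwise fall back to the best available backend.
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")
```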

Benchmark Results (summary)

| Device | Parallelism | Total time | Peak memory | Observation |
| --- | --- | --- | --- | --- |
| CPU | -P 8 | ~30.25 s | ~1806 MB | Fastest configuration; moderate memory usage |
| MPS | -P 6 | ~60.04 s | ~3154 MB | Slowest despite using the GPU; higher memory use |
| MPS | -P 1 | ~31.57 s | ~3864 MB | Roughly on par with CPU, but more memory-hungry |
| CPU | -P 1 | ~30.77 s | ~6778 MB | Very high memory usage at low parallelism |

Notes
• MPS was markedly slower at high parallelism but roughly on par with CPU at -P 1.
• CPU backend remains most efficient at high parallelism (-P 8).
• Significant memory usage spikes at lower parallelism on CPU may indicate inefficient resource reuse.

Next Steps
• Investigate memory usage spike at low parallelism on CPU.
• Explore mixed CPU+MPS execution for hybrid speed gains.
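
Purely as a sketch of the mixed-execution idea above (nothing here exists in marker yet): route MPS-supported stages to the GPU and keep CPU-only stages, such as text detection, on the CPU. Stage names are hypothetical.

```python
import torch

# Hypothetical stage-to-device routing; stage names are illustrative.
CPU_ONLY_STAGES = {"text_detection"}  # remains CPU-only under MPS today

def device_for_stage(stage: str) -> torch.device:
    # CPU-only stages, or machines without MPS, stay on the CPU.
    if stage in CPU_ONLY_STAGES or not torch.backends.mps.is_available():
        return torch.device("cpu")
    return torch.device("mps")
```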

@github-actions

github-actions bot commented Aug 15, 2025

CLA Assistant Lite bot: All contributors have signed the CLA ✍️ ✅

@SageStack

I have read the CLA Document and I hereby sign the CLA

@SageStack

recheck

github-actions bot added a commit that referenced this pull request on Aug 15, 2025.
@SageStack force-pushed the improve/mps-performance branch from f286400 to 6f03371 on August 15, 2025 at 22:09:
…ple Silicon

- Ensure device detection is applied correctly across batch-size logic.
- Add USING_CUDA/USING_MPS helpers for clearer branching.
- MODEL_DTYPE: bfloat16 (CUDA), float16 (MPS), float32 (CPU).
- Increase MPS batch sizes for layout, OCR error, recognition, equations,
  and table recognition; modest bump for detection (CPU fallback under MPS).
- Normalize/remove duplicate getter definitions.
- Fix gpu.using_cuda() equality check; add gpu.using_mps().

Benchmarks on M1 Pro (5 PDFs):
CPU P=1: 30.77s total (~0.162 files/s)
MPS P=1: 31.57s total (~0.158 files/s)
CPU P=8: 30.25s total (~0.165 files/s)
MPS P=6: 60.04s total (~0.083 files/s)

Note: text detection remains CPU-only on MPS, so CPU is faster end-to-end today; this patch still improves correctness and MPS throughput where supported.
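
A minimal sketch of the device/dtype helpers described in the commit message above, assuming PyTorch; the names mirror the commit's USING_CUDA/USING_MPS and MODEL_DTYPE, but marker's actual settings code may differ:

```python
import torch

# Helpers mirroring the commit's USING_CUDA / USING_MPS for clearer branching.
USING_CUDA = torch.cuda.is_available()
USING_MPS = (not USING_CUDA) and torch.backends.mps.is_available()

# Per-backend dtype choice from the commit message:
# bfloat16 on CUDA, float16 on MPS, float32 on CPU.
if USING_CUDA:
    MODEL_DTYPE = torch.bfloat16
elif USING_MPS:
    MODEL_DTYPE = torch.float16
else:
    MODEL_DTYPE = torch.float32
```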
@SageStack force-pushed the improve/mps-performance branch from 6f03371 to 77627a8 on August 15, 2025 at 22:12.