Improve benchmark performance and memory usage on MPS and CPU backends #837
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This PR optimizes the benchmarking process for the marker project, improving runtime and reducing memory footprint when running on both MPS and CPU backends. It includes tests with different levels of parallelism (-P 8, -P 6, -P 1) to find optimal configurations for both devices.
Changes
• Added and tested TORCH_DEVICE=mps and TORCH_DEVICE=cpu runs
• Benchmarked with varying parallelism to identify optimal speed/memory trade-offs
• Collected /usr/bin/time -l metrics for accurate performance profiling
Benchmark Results (summary)
• CPU, -P 8: ~30.25s total, ~1806 MB memory — fastest on CPU with moderate memory usage.
• MPS, -P 6: ~60.04s total, ~3154 MB memory — slower than CPU but leverages GPU; higher memory use.
• MPS, -P 1: ~31.57s total, ~3864 MB memory — comparable to CPU at low parallelism but more memory-hungry.
• CPU, -P 1: ~30.77s total, ~6778 MB memory — very high memory usage at low parallelism.
Notes
• MPS showed slower times at high parallelism but similar or better performance at -P 1 compared to CPU.
• CPU backend remains most efficient at high parallelism (-P 8).
• Significant memory usage spikes at lower parallelism on CPU may indicate inefficient resource reuse.
Next Steps
• Investigate memory usage spike at low parallelism on CPU.
• Explore mixed CPU+MPS execution for hybrid speed gains.