Copilot AI commented Aug 30, 2025

This PR adds pytest support for the 01_store/store_bench.py benchmark by following the established pattern from test_load_bench.py.

Changes Made

Added bench_store() function

  • Extracted the core benchmarking logic from run_experiment() into a reusable bench_store() function
  • Maintains the same signature pattern as bench_load() for consistency:
    def bench_store(shmem, source_rank, destination_rank, buffer, BLOCK_SIZE, dtype, 
                    verbose=False, validate=False, num_experiments=1, num_warmup=0)

Refactored run_experiment()

  • Updated to use the new bench_store() function internally
  • Maintains full backward compatibility with existing CLI usage
  • Clean separation between argument parsing and benchmarking logic
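The split between argument parsing and benchmarking logic can be sketched roughly as below. This is only an illustration, not the PR's code: `bench_store_sketch`, the `work` callable, and the CLI flags are hypothetical stand-ins, and timing uses `time.perf_counter` in place of `iris.do_bench`.

```python
import argparse
import time


def bench_store_sketch(work, num_experiments=1, num_warmup=0, verbose=False):
    """Time `work` and return the mean seconds per call (hypothetical helper)."""
    for _ in range(num_warmup):
        work()  # warmup runs are not timed
    start = time.perf_counter()
    for _ in range(num_experiments):
        work()
    elapsed = (time.perf_counter() - start) / num_experiments
    if verbose:
        print(f"mean time per call: {elapsed:.6f} s")
    return elapsed


def parse_args(argv=None):
    """Argument parsing lives on its own so tests never have to touch it."""
    parser = argparse.ArgumentParser(description="store benchmark sketch")
    parser.add_argument("--num_experiments", type=int, default=1)
    parser.add_argument("--num_warmup", type=int, default=0)
    parser.add_argument("--verbose", action="store_true")
    return parser.parse_args(argv)


def run_experiment(argv=None):
    """CLI entry point: parse arguments, then delegate to the reusable function."""
    args = parse_args(argv)
    return bench_store_sketch(
        lambda: sum(range(1000)),  # placeholder workload, not a real store kernel
        num_experiments=args.num_experiments,
        num_warmup=args.num_warmup,
        verbose=args.verbose,
    )
```

With this shape, a test can call the benchmark function directly with explicit arguments, while the CLI path keeps working through `run_experiment()`.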

Created test_store_bench.py

  • Follows the exact pattern established by test_load_bench.py
  • Parametrized tests covering different data types (int8, float16, bfloat16, float32)
  • Tests a buffer size of 1 << 32 with a heap size of 1 << 33, and block sizes of 512 and 1024
  • Properly imports the benchmark module and calls bench_store()
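The PR description does not show how the test imports the benchmark module. Because the directory name `01_store` starts with a digit, it cannot appear in a regular `import` statement, so one common approach is to load the file by path with `importlib`. The sketch below assumes that approach; `load_module_from_path` is a hypothetical helper, and the temporary file is a stand-in for the real `store_bench.py`.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_module_from_path(module_name, file_path):
    """Load a Python source file as a module without a package import."""
    spec = importlib.util.spec_from_file_location(module_name, file_path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {file_path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


# Self-contained demo: create a stand-in "01_store/store_bench.py" in a temp dir.
with tempfile.TemporaryDirectory() as tmp:
    bench_dir = Path(tmp) / "01_store"
    bench_dir.mkdir()
    bench_file = bench_dir / "store_bench.py"
    bench_file.write_text("def bench_store(*args, **kwargs):\n    return 'called'\n")

    module = load_module_from_path("store_bench", bench_file)
    result = module.bench_store()
```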

Fixed barrier synchronization issue

  • Removed the explicit warmup call and barrier that were causing deadlocks in the test environment
  • The iris.do_bench function handles warmup and barriers internally
  • Now matches the synchronization pattern used in the bench_load function
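The PR does not explain why the extra barrier deadlocked. One general way this happens is a mismatch in how many times each rank reaches a barrier: a barrier is collective, so a rank that waits one more time than its peers never gets released. The sketch below illustrates only that general failure mode, using `threading.Barrier` as a stand-in for `iris.barrier` and a timeout so that the demo terminates instead of hanging; `run_rank` and `simulate` are hypothetical names.

```python
import threading


def run_rank(barrier, extra_barrier, results, rank):
    """Simulate one rank: an optional extra barrier, then the benchmark's own."""
    try:
        if extra_barrier:
            barrier.wait(timeout=0.5)  # explicit barrier before the benchmark
        barrier.wait(timeout=0.5)      # barrier performed inside the benchmark helper
        results[rank] = "ok"
    except threading.BrokenBarrierError:
        # A real distributed barrier has no timeout here and would simply hang.
        results[rank] = "stuck"


def simulate(extra_barrier_on_rank0):
    """Run two simulated ranks and report which ones got past their barriers."""
    barrier = threading.Barrier(2)
    results = {}
    threads = [
        threading.Thread(target=run_rank, args=(barrier, extra_barrier_on_rank0, results, 0)),
        threading.Thread(target=run_rank, args=(barrier, False, results, 1)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

When both ranks wait the same number of times, both finish. When rank 0 waits once more than rank 1, rank 0 is left waiting for a peer that has already moved on.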

Testing Structure

The test follows the established pattern:

import pytest
import torch

@pytest.mark.parametrize("dtype", [torch.int8, torch.float16, torch.bfloat16, torch.float32])
@pytest.mark.parametrize("buffer_size, heap_size", [((1 << 32), (1 << 33))])
@pytest.mark.parametrize("block_size", [512, 1024])
def test_store_bench(dtype, buffer_size, heap_size, block_size):
    # Test implementation
    ...
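Stacked `parametrize` decorators produce the cross product of their argument lists, so this test expands to 4 × 1 × 2 = 8 cases. The sketch below reproduces that expansion in plain Python, with strings standing in for the torch dtypes. The id format is an assumption modelled on the id that appears in the CI log for this PR, and the order of the list is illustrative.

```python
from itertools import product

dtypes = ["int8", "float16", "bfloat16", "float32"]  # stand-ins for torch dtypes
sizes = [(1 << 32, 1 << 33)]                          # (buffer_size, heap_size)
block_sizes = [512, 1024]

# Consistent with the id in the CI log, the decorator closest to the function
# (block_size) contributes the first part of the id, and dtype the last.
test_ids = [
    f"{block}-{buf}-{heap}-dtype{i}"
    for block, (buf, heap), (i, _name) in product(block_sizes, sizes, enumerate(dtypes))
]

num_cases = len(test_ids)
first_id = test_ids[0]
```

Here `first_id` corresponds to the combination of block size 512 with `torch.int8`, the first entry in the dtype list.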

This implementation provides a clean, testable interface while making minimal changes to the existing codebase and maintaining full backward compatibility.

Fixes #56.



Copilot AI changed the title from "[WIP] Implement pytest for 01_store/store_bench.py" to "Implement pytest for 01_store/store_bench.py" Aug 30, 2025
Copilot finished work on behalf of mawad-amd August 30, 2025 22:07
Copilot AI requested a review from mawad-amd August 30, 2025 22:07
@mawad-amd

@copilot

tests/examples/test_store_bench.py::test_store_bench[512-4294967296-8589934592-dtype0] Fatal Python error: Aborted

Thread 0x00007ffb8433b640 (most recent call first):
  File "/opt/conda/envs/py_3.10/lib/python3.10/socket.py", line 293 in accept
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pytest_rerunfailures.py", line 433 in run_server
  File "/opt/conda/envs/py_3.10/lib/python3.10/threading.py", line 953 in run
  File "/opt/conda/envs/py_3.10/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/opt/conda/envs/py_3.10/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x00007ffb865bf740 (most recent call first):
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py", line 896 in synchronize
  File "/home/runner/_work/iris/iris/iris/iris.py", line 351 in barrier
  File "/home/runner/_work/iris/iris/examples/01_store/store_bench.py", line 130 in bench_store
  File "/home/runner/_work/iris/iris/tests/examples/test_store_bench.py", line 57 in test_store_bench
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/_pytest/python.py", line 194 in pytest_pyfunc_call
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/_pytest/python.py", line 1799 in runtest
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/_pytest/runner.py", line 169 in pytest_runtest_call
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/_pytest/runner.py", line 262 in <lambda>
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/_pytest/runner.py", line 341 in from_call
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/_pytest/runner.py", line 261 in call_runtest_hook
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/_pytest/runner.py", line 222 in call_and_report
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/_pytest/runner.py", line 133 in runtestprotocol
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/_pytest/runner.py", line 114 in pytest_runtest_protocol
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/_pytest/main.py", line 348 in pytest_runtestloop
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/_pytest/main.py", line 323 in _main
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/_pytest/main.py", line 269 in wrap_session
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/_pytest/main.py", line 316 in pytest_cmdline_main
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/_pytest/config/__init__.py", line 166 in main
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/_pytest/config/__init__.py", line 189 in console_main
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pytest/__main__.py", line 5 in <module>
  File "/opt/conda/envs/py_3.10/lib/python3.10/runpy.py", line 86 in _run_code
  File "/opt/conda/envs/py_3.10/lib/python3.10/runpy.py", line 196 in _run_module_as_main

Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg.lapack_lite, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, torch._C, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, mpi4py.MPI, hip_utils, __triton_launcher (total: 24)
--------------------------------------------------------------------------
prterun noticed that process rank 0 with PID 802853 on node iris-githubrunner-mi3008x-5866b8ddb9-c5fh8 exited on
signal 6 (Aborted).
--------------------------------------------------------------------------
Error: Process completed with exit code 134.
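The exit code 134 and "signal 6 (Aborted)" in the log describe the same event. By shell convention on POSIX systems, a process killed by a signal is reported with exit code 128 plus the signal number. A small sketch of that arithmetic, assuming a Linux or other POSIX platform:

```python
import signal

exit_code = 134                     # from "Process completed with exit code 134"
signal_number = exit_code - 128     # shell convention: 128 + signal number
signal_name = signal.Signals(signal_number).name

print(f"exit code {exit_code} -> signal {signal_number} ({signal_name})")
```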


Copilot AI commented Aug 30, 2025

@copilot

tests/examples/test_store_bench.py::test_store_bench[512-4294967296-8589934592-dtype0] Fatal Python error: Aborted

...

Fixed the barrier deadlock. bench_store made an explicit warmup call and barrier before iris.do_bench, and that extra synchronization was causing the deadlock. Because iris.do_bench handles warmup and barriers internally, the explicit calls were redundant. Removed them to match the pattern used in bench_load. (e753e36)

Copilot finished work on behalf of mawad-amd August 30, 2025 22:39