adds lprec memory estimation and a test #348

Open
wants to merge 1 commit into main
Conversation

@dkazanc (Collaborator) commented May 28, 2024:

It also fixes the issue with the FBP memoryhook tests failing.

@yousefmoazzam (Collaborator) left a comment:
We know that the memory hook tests for methods using FFTs can be tricky to get passing on all GPUs that are likely to be used with httomo.

The GPU in ws448 is a Quadro RTX 5000, a model closer to a V100 than to a P100 in terms of how new it is (and thus how it may handle FFTs).

So this is just a note that the failures on the Quadro RTX 5000 may mean the tests would also fail on a V100, which could in turn mean slightly inaccurate memory estimation when running httomo on V100s.

Comment on lines +91 to +97
3 * in_slice_size
+ filtered_in_data
+ freq_slice
+ fftplan_size
+ 3.5 * astra_out_size
+ astra_out_size
)
return (tot_memory_bytes, filter_size)
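
For context, the excerpted estimate sums the sizes of the main intermediate buffers (input slices, filtered data, the frequency-domain slice, the cuFFT plan work area, and the ASTRA output with a 3.5x safety multiplier). Below is a minimal sketch of how such an estimator could be put together as a complete function; the function name and the per-buffer formulas are illustrative assumptions, not httomolibgpu's actual implementation.

import numpy as np

def _estimate_FBP_memory_bytes(non_slice_dims_shape, dtype, recon_size):
    # Hedged sketch only: the buffer formulas below are assumptions for illustration.
    angles, det_x = non_slice_dims_shape
    in_slice_size = angles * det_x * np.dtype(dtype).itemsize
    filtered_in_data = angles * det_x * np.float32().itemsize
    # rFFT along the detector axis keeps det_x // 2 + 1 complex64 values
    freq_slice = angles * (det_x // 2 + 1) * np.complex64().itemsize
    fftplan_size = freq_slice  # assume the cuFFT work area is comparable to the data
    astra_out_size = recon_size * recon_size * np.float32().itemsize
    filter_size = (det_x // 2 + 1) * np.float32().itemsize
    tot_memory_bytes = int(
        3 * in_slice_size
        + filtered_in_data
        + freq_slice
        + fftplan_size
        + 3.5 * astra_out_size  # speculative safety multiplier discussed below
        + astra_out_size
    )
    return (tot_memory_bytes, filter_size)
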
Collaborator:

On ws448, on the fourier branch, two FBP memory hook tests now fail for me:

(/dls/science/users/twi18192/conda-envs/httomo) [twi18192@ws448 httomo (fourier)]$ python -m pytest tests/test_backends/test_httomolibgpu.py::test_recon_FBP_memoryhook
========================================== test session starts ===========================================
platform linux -- Python 3.10.10, pytest-7.1.2, pluggy-1.3.0 -- /dls/science/users/twi18192/conda-envs/httomo/bin/python
cachedir: .pytest_cache
benchmark: 4.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /dls/science/users/twi18192/httomo, configfile: pyproject.toml
plugins: mpi-0.6, cov-4.1.0, typeguard-3.0.2, benchmark-4.0.0, mock-3.12.0
collected 6 items

tests/test_backends/test_httomolibgpu.py::test_recon_FBP_memoryhook[1200-3] PASSED                 [ 16%]
tests/test_backends/test_httomolibgpu.py::test_recon_FBP_memoryhook[1200-5] PASSED                 [ 33%]
tests/test_backends/test_httomolibgpu.py::test_recon_FBP_memoryhook[1200-8] PASSED                 [ 50%]
tests/test_backends/test_httomolibgpu.py::test_recon_FBP_memoryhook[2560-3] FAILED                 [ 66%]
tests/test_backends/test_httomolibgpu.py::test_recon_FBP_memoryhook[2560-5] FAILED                 [ 83%]
tests/test_backends/test_httomolibgpu.py::test_recon_FBP_memoryhook[2560-8] PASSED                 [100%]

================================================ FAILURES ================================================
___________________________________ test_recon_FBP_memoryhook[2560-3] ____________________________________
Traceback (most recent call last):
  File "/dls/science/users/twi18192/httomo/tests/test_backends/test_httomolibgpu.py", line 449, in test_recon_FBP_memoryhook
    assert estimated_memory_mb >= max_mem_mb
AssertionError: assert 444.47 >= 491.7
___________________________________ test_recon_FBP_memoryhook[2560-5] ____________________________________
Traceback (most recent call last):
  File "/dls/science/users/twi18192/httomo/tests/test_backends/test_httomolibgpu.py", line 449, in test_recon_FBP_memoryhook
    assert estimated_memory_mb >= max_mem_mb
AssertionError: assert 740.78 >= 752.81
======================================== short test summary info =========================================
FAILED tests/test_backends/test_httomolibgpu.py::test_recon_FBP_memoryhook[2560-3] - assert 444.47 >= 4...
FAILED tests/test_backends/test_httomolibgpu.py::test_recon_FBP_memoryhook[2560-5] - assert 740.78 >= 7...
====================================== 2 failed, 4 passed in 5.11s =======================================

Whereas on main, all FBP memory hook tests were passing:

(/dls/science/users/twi18192/conda-envs/httomo) [twi18192@ws448 httomo (main)]$ python -m pytest tests/test_backends/test_httomolibgpu.py::test_recon_FBP_memoryhook
========================================== test session starts ===========================================
platform linux -- Python 3.10.10, pytest-7.1.2, pluggy-1.3.0 -- /dls/science/users/twi18192/conda-envs/httomo/bin/python
cachedir: .pytest_cache
benchmark: 4.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /dls/science/users/twi18192/httomo, configfile: pyproject.toml
plugins: mpi-0.6, cov-4.1.0, typeguard-3.0.2, benchmark-4.0.0, mock-3.12.0
collected 6 items

tests/test_backends/test_httomolibgpu.py::test_recon_FBP_memoryhook[1200-3] PASSED                 [ 16%]
tests/test_backends/test_httomolibgpu.py::test_recon_FBP_memoryhook[1200-5] PASSED                 [ 33%]
tests/test_backends/test_httomolibgpu.py::test_recon_FBP_memoryhook[1200-8] PASSED                 [ 50%]
tests/test_backends/test_httomolibgpu.py::test_recon_FBP_memoryhook[2560-3] PASSED                 [ 66%]
tests/test_backends/test_httomolibgpu.py::test_recon_FBP_memoryhook[2560-5] PASSED                 [ 83%]
tests/test_backends/test_httomolibgpu.py::test_recon_FBP_memoryhook[2560-8] PASSED                 [100%]

=========================================== 6 passed in 4.03s ============================================
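
For readers unfamiliar with these tests: the assertion estimated_memory_mb >= max_mem_mb compares the estimator's prediction against the peak memory recorded by a hook while the method runs. A minimal sketch of such a peak-tracking hook built on CuPy's MemoryHook is below; it illustrates the idea and is not necessarily the exact hook used in httomo's test suite.

import cupy as cp

class MaxMemoryHook(cp.cuda.MemoryHook):
    # Sketch: track the peak number of bytes taken from CuPy's memory pool.
    name = "MaxMemoryHook"

    def __init__(self):
        self.current_bytes = 0
        self.max_bytes = 0

    def malloc_postprocess(self, **kwargs):
        self.current_bytes += kwargs["mem_size"]
        self.max_bytes = max(self.max_bytes, self.current_bytes)

    def free_postprocess(self, **kwargs):
        self.current_bytes -= kwargs["mem_size"]

# Typical usage in a memoryhook test (simplified; run_the_gpu_method is a
# hypothetical stand-in for the actual method call):
#     hook = MaxMemoryHook()
#     with hook:
#         run_the_gpu_method()
#     max_mem_mb = hook.max_bytes / 1024**2
#     assert estimated_memory_mb >= max_mem_mb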

@dkazanc (author) replied:

yes, as we discussed, this multiplier is speculative as I had to increase it for larger data during benchmarks. Let us start with the memory estimator for it in a separate branch.
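
For scale, a hedged back-of-the-envelope on what the 3.5x multiplier buys, assuming astra_out_size is a single float32 reconstruction slice of recon_size x recon_size (an assumption for illustration, not something stated in this PR):

recon_size = 2560
astra_out_size = recon_size * recon_size * 4       # one float32 slice: 26,214,400 bytes
print(astra_out_size / 1024**2)                    # 25.0 MB per slice
print(3.5 * astra_out_size / 1024**2)              # 87.5 MB with the 3.5x multiplier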

Collaborator:

Ok, I'm happy for the FBP memory estimation to be dealt with in a separate PR. In that case, shall we leave FBP alone in this PR and keep only the LPRec changes here?

I.e., shall we drop the FBP memory estimation changes from here and move them to the dedicated FBP PR, since they're "incomplete" anyway?

@dkazanc (author) replied:

yep, sounds good. We should remove the FBP part. Sorry, I thought it would be a quick fix, so I put it here.

Collaborator:

No problem, this time I remembered to run the tests before reviewing so it was easy enough to catch, compared to the past times where I forgot to run the tests... 😅


@pytest.mark.cupy
@pytest.mark.parametrize("slices", [4, 7])
def test_recon_LPRec_memoryhook(slices, ensure_clean_memory):
Collaborator:

On ws448, one of the two LPRec memory hook tests seems to fail:

(/dls/science/users/twi18192/conda-envs/httomo) [twi18192@ws448 httomo (fourier)]$ python -m pytest tests/test_backends/test_httomolibgpu.py::test_recon_LPRec_memoryhook
========================================== test session starts ===========================================
platform linux -- Python 3.10.10, pytest-7.1.2, pluggy-1.3.0 -- /dls/science/users/twi18192/conda-envs/httomo/bin/python
cachedir: .pytest_cache
benchmark: 4.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /dls/science/users/twi18192/httomo, configfile: pyproject.toml
plugins: mpi-0.6, cov-4.1.0, typeguard-3.0.2, benchmark-4.0.0, mock-3.12.0
collected 2 items

tests/test_backends/test_httomolibgpu.py::test_recon_LPRec_memoryhook[4] FAILED                    [ 50%]
tests/test_backends/test_httomolibgpu.py::test_recon_LPRec_memoryhook[7] PASSED                    [100%]

================================================ FAILURES ================================================
_____________________________________ test_recon_LPRec_memoryhook[4] _____________________________________
Traceback (most recent call last):
  File "/dls/science/users/twi18192/httomo/tests/test_backends/test_httomolibgpu.py", line 488, in test_recon_LPRec_memoryhook
    assert percents_relative_maxmem <= 20
AssertionError: assert 23 <= 20
======================================== short test summary info =========================================
FAILED tests/test_backends/test_httomolibgpu.py::test_recon_LPRec_memoryhook[4] - assert 23 <= 20
====================================== 1 failed, 1 passed in 3.66s =======================================
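
The failing assertion checks how far the estimate overshoots the measured peak. Below is a sketch of that kind of tolerance check, assuming the relative difference is taken against the measured peak (the exact formula in the test may differ):

def check_memory_estimate(estimated_memory_mb, max_mem_mb, tolerance_pct=20):
    # Sketch of the two-sided check the memoryhook tests appear to apply:
    # never underestimate, and do not overshoot the measured peak by more
    # than tolerance_pct percent.
    percents_relative_maxmem = round(
        100 * (estimated_memory_mb - max_mem_mb) / max_mem_mb
    )
    assert estimated_memory_mb >= max_mem_mb
    assert percents_relative_maxmem <= tolerance_pct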

@dkazanc (author) replied:

Overall, I think the memory estimator is far from ideal, so we can hold off on that PR for now. I just wanted to show its potential during the benchmarks. I'll come back to it again after dealing with FBP. thx
