Large overestimation of GPU memory causing failing memory hook tests on some DLS workstations #184
Comments
Note that on other DLS machines, it appears that other memory hook tests can fail (though there's always the possibility that the CUDA OOM errors are influencing the results of other tests?). For example, on
despite those non-CUDA OOM error memory hook tests passing on |
Here are some concrete numbers for why Case
Case
Notice that there is something of a "jump" of 25 MB in the peak GPU memory allocated between 600 and 601, while the estimate is basically the same going from 600 to 601 (it increases by a very small amount, which is expected). From this it appears that maybe as the value of |
I guess this is not an issue now, but we've got instead |
On the DLS machine ws448, using the gpuloop branch on httomo and the library branch on httomolibgpu, running the memory hook tests in test_httomolibgpu.py shows that one of the cases in the FBP memory hook parametrised test (namely, the case recon_size_it=600) fails:
httomo/tests/test_backends/test_httomolibgpu.py
Lines 270 to 271 in e44782e
A threshold is defined on the size of the difference between
when compared to the peak GPU memory usage. In terms of a simple formula:
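As a rough sketch of this kind of check, assuming the threshold is applied to the difference between the estimated and the measured peak memory, expressed as a percentage of the peak. The function and variable names are hypothetical and not copied from the test, and the numbers are made up purely for illustration:

```python
def overestimate_percent(estimated_mb: float, peak_mb: float) -> float:
    """Difference between the estimate and the measured peak, as a percentage of the peak."""
    if peak_mb <= 0:
        raise ValueError("peak_mb must be positive")
    return 100.0 * abs(estimated_mb - peak_mb) / peak_mb


def within_threshold(
    estimated_mb: float, peak_mb: float, threshold_percent: float = 35.0
) -> bool:
    """True if the estimate is within threshold_percent of the measured peak."""
    return overestimate_percent(estimated_mb, peak_mb) <= threshold_percent


# Illustrative values only (not measurements from ws448).
print(within_threshold(130.0, 100.0))  # 30% difference, inside a 35% threshold
print(within_threshold(140.0, 100.0))  # 40% difference, outside a 35% threshold
```

Because the peak is the denominator, a step change in the measured peak with an almost unchanged estimate can move this percentage across the threshold.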
This is happening in the following part of the test:
httomo/tests/test_backends/test_httomolibgpu.py
Lines 293 to 299 in e44782e
The output of the failing test is the following:
Note that manually changing the 600 case to 601 in the parametrised test produces a passing test on ws448, which potentially indicates that, on some machines and in some cases, the threshold of 35% is fine, and in other cases the threshold of 35% is not fine.