Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve accuracy of 360 stitching memory estimation #363

Merged
merged 5 commits into from
Aug 8, 2024

Conversation

yousefmoazzam
Copy link
Collaborator

@yousefmoazzam yousefmoazzam commented Jun 11, 2024

Fixes #354

The main changes to the memory estimation are:

  • the data shape change is accounted for (in particular, the sino width changes)
  • several copies of a subset of the input array are accounted for

Acceptance criteria checklist:

  • Newly introduced memory hook tests for sino_360_to_180 pass
  • The variable names in the memory estimation code + code comments when connecting to the method code are understandable to other devs

@dkazanc
Copy link
Collaborator

dkazanc commented Jun 13, 2024

Thanks, all seem clear me. Pretty memory-hungry one actually.

@yousefmoazzam
Copy link
Collaborator Author

One pretty simple way to reduce the amount of memory used is to make the weights array be float32 rather than float64; doing that would halve the amount of memory used by those 4 copies mentioned in the code comments.

I'll make an issue in the httomolibgpu repo so then we can address that at some point 🙂

@dkazanc
Copy link
Collaborator

dkazanc commented Jun 13, 2024

One pretty simple way to reduce the amount of memory used is to make the weights array be float32 rather than float64; doing that would halve the amount of memory used by those 4 copies mentioned in the code comments.

I'll make an issue in the httomolibgpu repo so then we can address that at some point 🙂

Thanks, there is no reason to keep it 64bit.

@dkazanc
Copy link
Collaborator

dkazanc commented Jul 9, 2024

FYI, I'm fixing the data type thing on the httomolibgpu side of things + adding a test. Wasn't there apparently. Then we can merge

@yousefmoazzam
Copy link
Collaborator Author

Cool, thanks! When that's done I'll change stuff here to reflect the lower memory usage, and will check the memory hook test is still passing.

@dkazanc
Copy link
Collaborator

dkazanc commented Jul 9, 2024

@yousefmoazzam memory hook test now pass (couldn't run 2160-80 as OOM on this machine), feel free to add/edit anything and merge with this one?

@yousefmoazzam
Copy link
Collaborator Author

Ok that's fine, the 2160-80 passes on a wilson node, which is good.

I'll also run the 360 pipeline involving a recon method (gridrec I think?) which originally brought us to this problem, to verify it all works with that small change in the context of a proper run, and then will merge both PR's if all looks good.

@yousefmoazzam
Copy link
Collaborator Author

Annoyingly, I'm getting a segfault in the reslice for some reason when running the pipeline bench_recon_gridrec_cpu360.yaml with 4 MPI processes, so I can't verify if it works in the context of a 4 process run.

With 2 MPI processes things are all fine (no segfault in reslice, and the sino_360_to_180 max slices estimaiton must be working ok since there's no CUDA OOM error).

I feel like that's sufficient evidence that this is working fine (though further investigation into why segfault in reslice is happening once more for 360 data should be done...), what do you think @dkazanc?

@dkazanc
Copy link
Collaborator

dkazanc commented Jul 10, 2024

yes, certainly. We need to know why it is happening. Does it also happen for the GPU pipeline with 360 to 180 conversion or just the CPU one?

@dkazanc dkazanc changed the base branch from main to dev July 11, 2024 10:50
Base automatically changed from dev to main August 8, 2024 14:13
@dkazanc
Copy link
Collaborator

dkazanc commented Aug 8, 2024

Adding this to be aware of:
When using gridrec the segfault or a "stall" can happen because gridrec relies on FFTs and it is RAM-hungry method. The GPU function sino_360_to_180 will define the number of slices to process and pass them to gridrec. This can result in a memory overflow on P100 nodes. It does work on the nodes where more memory available.
It is preferable to use FBP GPU method where possible.

@dkazanc dkazanc merged commit 25945bb into main Aug 8, 2024
2 of 3 checks passed
@dkazanc dkazanc deleted the 360-stitching-memory-estimation branch August 8, 2024 17:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Overestimation of max slices for sino_360_to_180
2 participants