Overestimation of max slices for sino_360_to_180 #354

Closed
yousefmoazzam opened this issue May 31, 2024 · 3 comments · Fixed by #363

@yousefmoazzam
Collaborator

Setup:

  • hopper node
  • 1 GPU (Nvidia A100)
  • 1 MPI process
  • 360 data is /mnt/gpfs03/scratch/data/imaging/tomography/tmp/testdata/360/112482.nxs

The following pipeline:

- method: standard_tomo
  module_path: httomo.data.hdf.loaders
  parameters:
    name: tomo
    data_path: entry1/tomo_entry/data/data
    image_key_path: entry1/tomo_entry/instrument/detector/image_key
    rotation_angles:
      data_path: /entry1/tomo_entry/data/rotation_angle
- method: find_center_360
  module_path: httomolibgpu.recon.rotation
  parameters:
    ind: mid
    win_width: 10
    side: null
    denoise: true
    norm: false
    use_overlap: false
  id: centering
  side_outputs:
    cor: centre_of_rotation
    overlap: overlap
    side: side
    overlap_position: overlap_position
- method: normalize
  module_path: httomolibgpu.prep.normalize
  parameters:
    cutoff: 10.0
    minus_log: true
    nonnegativity: false
    remove_nans: false
- method: sino_360_to_180
  module_path: httomolibgpu.misc.morph
  parameters:
    overlap: ${{centering.side_outputs.overlap}}
    rotation: right  

produces a CUDA OOM error on the first block being processed by the sino_360_to_180 method:

(/dls/science/users/twi18192/conda-envs/httomo) [twi18192@cs05r-sc-hop01-02 httomo (fourier)]$ mpirun -n 1 python -m httomo run /mnt/gpfs03/scratch/data/imaging/tomography/tmp/testdata/360/112482.nxs /dls/science/users/twi18192/dls_pipelines/pipelines/bench_recons/bench_recon_gridrec_cpu360.yaml /mnt/gpfs03/scratch/data/imaging/tomography/twi18192/out/
2024-05-31 12:07:42.684 | DEBUG    | httomo.utils:<module>:17 - CuPy is installed
Pipeline has been separated into 2 sections
See the full log file at: /mnt/gpfs03/scratch/data/imaging/tomography/twi18192/out/31-05-2024_12_07_43_output/user.log
Running loader (pattern=projection): standard_tomo...
    Finished loader: standard_tomo (httomo) Took 27448.20ms
Section 0 (pattern=projection) with the following methods:
    data_reducer (httomolib)
    find_center_360 (httomolibgpu)
    normalize (httomolibgpu)
     0%|          | 0/4 [00:07<?, ?block/s]
    25%|##5       | 1/4 [00:32<01:27, 29.10s/block]
    50%|#####     | 2/4 [00:52<00:47, 23.90s/block]
    75%|#######5  | 3/4 [01:07<00:20, 20.93s/block]
    --->The center of rotation is (1583.5791015625, 1950.841796875, 1, 614.158203125)
    Finished processing last block
Section 1 (pattern=sinogram) with the following methods:
    sino_360_to_180 (httomolibgpu)
     0%|          | 0/5 [00:00<?, ?block/s]
Traceback (most recent call last):
  File "/dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/dls/science/users/twi18192/httomo/httomo/__main__.py", line 4, in <module>
    main()
  File "/dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/dls/science/users/twi18192/httomo/httomo/cli.py", line 205, in run
    runner.execute()
  File "/dls/science/users/twi18192/httomo/httomo/runner/task_runner.py", line 63, in execute
    self._execute_section(section, i)
  File "/dls/science/users/twi18192/httomo/httomo/runner/task_runner.py", line 131, in _execute_section
    block = self._execute_section_block(section, block)
  File "/dls/science/users/twi18192/httomo/httomo/runner/task_runner.py", line 185, in _execute_section_block
    block = self._execute_method(method, block)
  File "/dls/science/users/twi18192/httomo/httomo/runner/task_runner.py", line 217, in _execute_method
    block = method.execute(block)
  File "/dls/science/users/twi18192/httomo/httomo/method_wrappers/generic.py", line 292, in execute
    block = self._run_method(block, args)
  File "/dls/science/users/twi18192/httomo/httomo/method_wrappers/generic.py", line 302, in _run_method
    ret = self._method(**args)
  File "/dls/science/users/twi18192/httomolibgpu/httomolibgpu/misc/morph.py", line 60, in sino_360_to_180
    return __sino_360_to_180(data, overlap, rotation)
  File "/dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/nvtx/nvtx.py", line 101, in inner
    result = func(*args, **kwargs)
  File "/dls/science/users/twi18192/httomolibgpu/httomolibgpu/misc/morph.py", line 100, in __sino_360_to_180
    + (weights * data[n : 2 * n, :, -overlap:])[:, :, ::-1]
  File "cupy/_core/core.pyx", line 1281, in cupy._core.core._ndarray_base.__mul__
  File "cupy/_core/_kernel.pyx", line 1347, in cupy._core._kernel.ufunc.__call__
  File "cupy/_core/_kernel.pyx", line 645, in cupy._core._kernel._get_out_args_from_optionals
  File "cupy/_core/core.pyx", line 2779, in cupy._core.core._ndarray_init
  File "cupy/_core/core.pyx", line 237, in cupy._core.core._ndarray_base._init_fast
  File "cupy/cuda/memory.pyx", line 740, in cupy.cuda.memory.alloc
  File "cupy/cuda/memory.pyx", line 1426, in cupy.cuda.memory.MemoryPool.malloc
  File "cupy/cuda/memory.pyx", line 1447, in cupy.cuda.memory.MemoryPool.malloc
  File "cupy/cuda/memory.pyx", line 1118, in cupy.cuda.memory.SingleDeviceMemoryPool.malloc
  File "cupy/cuda/memory.pyx", line 1139, in cupy.cuda.memory.SingleDeviceMemoryPool._malloc
  File "cupy/cuda/memory.pyx", line 1384, in cupy.cuda.memory.SingleDeviceMemoryPool._try_malloc
  File "cupy/cuda/memory.pyx", line 1387, in cupy.cuda.memory.SingleDeviceMemoryPool._try_malloc
cupy.cuda.memory.OutOfMemoryError: Out of memory allocating 13,007,707,648 bytes (allocated so far: 40,688,911,872 bytes).
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[55546,1],0]
  Exit code:    1
--------------------------------------------------------------------------

As sino_360_to_180 is the only method in its section, its memory estimator alone determines the max slices for that section. Its memory estimate must therefore be too low, which lets the block contain more slices than fit in GPU memory.
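
To illustrate how an underestimated per-slice cost turns into an oversized block, here is a minimal sketch. The function name `max_slices_for`, the simple linear model, and all the numbers are assumptions made up for illustration; this is not httomo's actual estimator code.

```python
def max_slices_for(available_bytes: int, bytes_per_slice: int) -> int:
    """Largest number of slices whose estimated memory fits in available_bytes."""
    if bytes_per_slice <= 0:
        raise ValueError("bytes_per_slice must be positive")
    return available_bytes // bytes_per_slice


# Hypothetical numbers, chosen only to show the effect.
available = 40 * 1024**3             # pretend 40 GiB are free on the GPU
estimated_per_slice = 50 * 1024**2   # what the estimator believes one slice costs
actual_per_slice = 80 * 1024**2      # what the method really needs per slice

planned = max_slices_for(available, estimated_per_slice)  # block size chosen
safe = max_slices_for(available, actual_per_slice)        # block size that fits

# Memory the planned block really needs; it exceeds what is available.
needed = planned * actual_per_slice
```

With these made-up values the planned block holds more slices than the safe one, so the first block already requests more memory than the device has.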

@yousefmoazzam yousefmoazzam added the bug Something isn't working label May 31, 2024
@yousefmoazzam
Collaborator Author

yousefmoazzam commented May 31, 2024

I found some information in an httomolibgpu issue that could help with this problem: if the shape of the stitched sinogram can be calculated, then the number of bytes in a single sinogram can be calculated from that shape and the data type.

Originally posted in DiamondLightSource/httomolibgpu#107 (comment)

From playing with the test 360 data, it seems that the output shape of the stitching method can be determined in advance from the overlap value that is output by the find_center_360 method and passed to the stitching method, by doing something like the following:

stitched_sino_width = original_sino_width * 2 - math.ceil(overlap)

where original_sino_width is the width of the original 360 sinogram, and overlap is the overlap value produced by the find_center_360 method.

So in principle, I think the correct shape of the output of the stitching method could be returned by its memory estimator.
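
As a rough sketch of the calculation described above, the width formula can be wrapped in a function and combined with an item size to get the bytes in one stitched sinogram. The function names, the `n_rows` and `itemsize` parameters, and the example width and row count are hypothetical; only the formula and the overlap value (taken from the log in the first comment) come from this thread.

```python
import math


def stitched_sino_width(original_sino_width: int, overlap: float) -> int:
    """Width of the stitched sinogram, using the formula quoted above."""
    return original_sino_width * 2 - math.ceil(overlap)


def stitched_sino_bytes(
    n_rows: int, original_sino_width: int, overlap: float, itemsize: int = 4
) -> int:
    """Bytes in one stitched sinogram of shape (n_rows, stitched width).

    itemsize is the size of one element in bytes (4 for float32).
    """
    width = stitched_sino_width(original_sino_width, overlap)
    return n_rows * width * itemsize


# overlap is the value reported by find_center_360 in the log above;
# the width of 2560 and the 1800 rows are made-up example values.
example_width = stitched_sino_width(2560, 1950.841796875)
example_bytes = stitched_sino_bytes(1800, 2560, 1950.841796875, itemsize=4)
```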

@yousefmoazzam
Collaborator Author

yousefmoazzam commented May 31, 2024

One possible way to achieve the above would be:

  • change output_dims_change: False to output_dims_change: True
  • change - multipliers: [2.2] to - multipliers: [None]
  • change - methods: [direct] to - methods: [module]
  • provide a memory estimator function for the method
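
The memory estimator function mentioned in the last bullet might look roughly like the sketch below. The function name, its signature, and the `(bytes per slice, fixed overhead)` return convention are assumptions for illustration, not the interface httomo actually requires. The halving of the angle count is also an assumption, inferred from the `data[n : 2 * n, ...]` slicing visible in the traceback.

```python
import math
from typing import Tuple


def calc_memory_bytes_sino_360_to_180(
    non_slice_dims_shape: Tuple[int, int],
    itemsize: int,
    overlap: float,
) -> Tuple[int, int]:
    """Return (bytes needed per sinogram slice, fixed overhead in bytes).

    Hypothetical signature: non_slice_dims_shape is (n_angles, detector_width)
    and itemsize is the element size in bytes (4 for float32).
    """
    n_angles, det_width = non_slice_dims_shape
    input_bytes = n_angles * det_width * itemsize

    # Assumption: the two 180-degree halves are stitched side by side, so the
    # output has n_angles // 2 rows. The width follows the formula given in
    # the earlier comment.
    stitched_width = det_width * 2 - math.ceil(overlap)
    output_bytes = (n_angles // 2) * stitched_width * itemsize

    # Temporary arrays created while blending the overlap region are NOT
    # counted here; a real estimator would need to account for them too.
    return input_bytes + output_bytes, 0


# Made-up input shape; overlap is the value from the log in the first comment.
per_slice, fixed = calc_memory_bytes_sino_360_to_180((3600, 2560), 4, 1950.841796875)
```

Counting only the input and output arrays gives a lower bound. The traceback shows the failure inside the weighted blend of the overlap region, so the intermediate arrays there would likely have to be included as well.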

For reference, the current methods database entry for sino_360_to_180 is the following:

sino_360_to_180:
  pattern: sinogram
  output_dims_change: False
  implementation: gpu_cupy
  save_result_default: False
  memory_gpu:
    - datasets: [tomo]
    - multipliers: [2.2]
    - methods: [direct]

@yousefmoazzam yousefmoazzam self-assigned this May 31, 2024
@dkazanc
Collaborator

dkazanc commented Jun 4, 2024

thx, looks like the dedicated memory estimator for that function is the way to go.
