Overestimation of max slices for `sino_360_to_180` #354

yousefmoazzam · 2024-05-31T11:12:13Z

Setup:

hopper node
1 GPU (Nvidia A100)
1 MPI process
360 data is /mnt/gpfs03/scratch/data/imaging/tomography/tmp/testdata/360/112482.nxs

The following pipeline:

- method: standard_tomo
  module_path: httomo.data.hdf.loaders
  parameters:
    name: tomo
    data_path: entry1/tomo_entry/data/data
    image_key_path: entry1/tomo_entry/instrument/detector/image_key
    rotation_angles:
      data_path: /entry1/tomo_entry/data/rotation_angle
- method: find_center_360
  module_path: httomolibgpu.recon.rotation
  parameters:
    ind: mid
    win_width: 10
    side: null
    denoise: true
    norm: false
    use_overlap: false
  id: centering
  side_outputs:
    cor: centre_of_rotation
    overlap: overlap
    side: side
    overlap_position: overlap_position
- method: normalize
  module_path: httomolibgpu.prep.normalize
  parameters:
    cutoff: 10.0
    minus_log: true
    nonnegativity: false
    remove_nans: false
- method: sino_360_to_180
  module_path: httomolibgpu.misc.morph
  parameters:
    overlap: ${{centering.side_outputs.overlap}}
    rotation: right

produces a CUDA OOM error on the first block being processed by the sino_360_to_180 method:

(/dls/science/users/twi18192/conda-envs/httomo) [twi18192@cs05r-sc-hop01-02 httomo (fourier)]$ mpirun -n 1 python -m httomo run /mnt/gpfs03/scratch/data/imaging/tomography/tmp/testdata/360/112482.nxs /dls/science/users/twi18192/dls_pipelines/pipelines/bench_recons/bench_recon_gridrec_cpu360.yaml /mnt/gpfs03/scratch/data/imaging/tomography/twi18192/out/
2024-05-31 12:07:42.684 | DEBUG    | httomo.utils:<module>:17 - CuPy is installed
Pipeline has been separated into 2 sections
See the full log file at: /mnt/gpfs03/scratch/data/imaging/tomography/twi18192/out/31-05-2024_12_07_43_output/user.log
Running loader (pattern=projection): standard_tomo...
    Finished loader: standard_tomo (httomo) Took 27448.20ms
Section 0 (pattern=projection) with the following methods:
    data_reducer (httomolib)
    find_center_360 (httomolibgpu)
    normalize (httomolibgpu)
     0%|          | 0/4 [00:07<?, ?block/s]
    25%|##5       | 1/4 [00:32<01:27, 29.10s/block]
    50%|#####     | 2/4 [00:52<00:47, 23.90s/block]
    75%|#######5  | 3/4 [01:07<00:20, 20.93s/block]
    --->The center of rotation is (1583.5791015625, 1950.841796875, 1, 614.158203125)
    Finished processing last block
Section 1 (pattern=sinogram) with the following methods:
    sino_360_to_180 (httomolibgpu)
     0%|          | 0/5 [00:00<?, ?block/s]
Traceback (most recent call last):
  File "/dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/dls/science/users/twi18192/httomo/httomo/__main__.py", line 4, in <module>
    main()
  File "/dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/dls/science/users/twi18192/httomo/httomo/cli.py", line 205, in run
    runner.execute()
  File "/dls/science/users/twi18192/httomo/httomo/runner/task_runner.py", line 63, in execute
    self._execute_section(section, i)
  File "/dls/science/users/twi18192/httomo/httomo/runner/task_runner.py", line 131, in _execute_section
    block = self._execute_section_block(section, block)
  File "/dls/science/users/twi18192/httomo/httomo/runner/task_runner.py", line 185, in _execute_section_block
    block = self._execute_method(method, block)
  File "/dls/science/users/twi18192/httomo/httomo/runner/task_runner.py", line 217, in _execute_method
    block = method.execute(block)
  File "/dls/science/users/twi18192/httomo/httomo/method_wrappers/generic.py", line 292, in execute
    block = self._run_method(block, args)
  File "/dls/science/users/twi18192/httomo/httomo/method_wrappers/generic.py", line 302, in _run_method
    ret = self._method(**args)
  File "/dls/science/users/twi18192/httomolibgpu/httomolibgpu/misc/morph.py", line 60, in sino_360_to_180
    return __sino_360_to_180(data, overlap, rotation)
  File "/dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/nvtx/nvtx.py", line 101, in inner
    result = func(*args, **kwargs)
  File "/dls/science/users/twi18192/httomolibgpu/httomolibgpu/misc/morph.py", line 100, in __sino_360_to_180
    + (weights * data[n : 2 * n, :, -overlap:])[:, :, ::-1]
  File "cupy/_core/core.pyx", line 1281, in cupy._core.core._ndarray_base.__mul__
  File "cupy/_core/_kernel.pyx", line 1347, in cupy._core._kernel.ufunc.__call__
  File "cupy/_core/_kernel.pyx", line 645, in cupy._core._kernel._get_out_args_from_optionals
  File "cupy/_core/core.pyx", line 2779, in cupy._core.core._ndarray_init
  File "cupy/_core/core.pyx", line 237, in cupy._core.core._ndarray_base._init_fast
  File "cupy/cuda/memory.pyx", line 740, in cupy.cuda.memory.alloc
  File "cupy/cuda/memory.pyx", line 1426, in cupy.cuda.memory.MemoryPool.malloc
  File "cupy/cuda/memory.pyx", line 1447, in cupy.cuda.memory.MemoryPool.malloc
  File "cupy/cuda/memory.pyx", line 1118, in cupy.cuda.memory.SingleDeviceMemoryPool.malloc
  File "cupy/cuda/memory.pyx", line 1139, in cupy.cuda.memory.SingleDeviceMemoryPool._malloc
  File "cupy/cuda/memory.pyx", line 1384, in cupy.cuda.memory.SingleDeviceMemoryPool._try_malloc
  File "cupy/cuda/memory.pyx", line 1387, in cupy.cuda.memory.SingleDeviceMemoryPool._try_malloc
cupy.cuda.memory.OutOfMemoryError: Out of memory allocating 13,007,707,648 bytes (allocated so far: 40,688,911,872 bytes).
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[55546,1],0]
  Exit code:    1
--------------------------------------------------------------------------

Since sino_360_to_180 is the only method in its section, its memory estimator alone determines the max slices for that section. The OOM error therefore implies that its memory estimate is too low, which leads to max slices being overestimated.


yousefmoazzam · 2024-05-31T11:21:13Z

I found some info in an httomolibgpu issue that could help address this problem: if the shape of the stitched sinogram can be calculated in advance, then the number of bytes in a single sinogram follows from the stitched sinogram shape and the data type.

Originally posted in DiamondLightSource/httomolibgpu#107 (comment)

From playing with the test 360 data, it seems that the output shape of the stitching method can be determined in advance from the overlap value that is output by the find_center_360 method and passed to the stitching method, using something like the following:
stitched_sino_width = original_sino_width * 2 - math.ceil(overlap)
where original_sino_width is the width of the original 360 sinogram, and overlap is the overlap value produced by the find_center_360 method.

So in principle, I think the correct shape of the output of the stitching method could be returned by its memory estimator.
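
To make the width formula concrete, here is a minimal Python sketch. The function names, the example sizes, and the assumption that the stitched sinogram keeps half of the 360 angles are illustrative only and are not taken from httomo or httomolibgpu.

```python
import math


def stitched_sino_width(original_sino_width: int, overlap: float) -> int:
    """Width of the stitched sinogram, using the formula quoted above."""
    return original_sino_width * 2 - math.ceil(overlap)


def stitched_sino_nbytes(
    n_angles_360: int, original_sino_width: int, overlap: float, itemsize: int = 4
) -> int:
    """Bytes in one stitched sinogram.

    Assumption: the stitched sinogram keeps half of the 360-degree angles,
    and each element is `itemsize` bytes (4 for float32).
    """
    n_angles_180 = n_angles_360 // 2
    return n_angles_180 * stitched_sino_width(original_sino_width, overlap) * itemsize


# Hypothetical sizes chosen for illustration only.
width = stitched_sino_width(2560, 1950.84)
nbytes = stitched_sino_nbytes(3600, 2560, 1950.84)
print(width, nbytes)
```

With these made-up inputs, the stitched width is narrower than twice the original width by the rounded-up overlap, and the byte count follows directly from the shape and element size.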

yousefmoazzam · 2024-05-31T11:41:05Z

One possible way to achieve the above would be:

change output_dims_change: False to output_dims_change: True
change - multipliers: [2.2] to - multipliers: [None]
change - methods: [direct] to - methods: [module]
provide a memory estimator function for the method
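
As a rough illustration of that last step, a per-sinogram estimator might look something like the sketch below. The function names, signature, and the (angles, width) layout are hypothetical and not httomo's actual hook interface. The sketch counts only the input slice and the stitched output slice, so temporaries created while blending the overlap region would need extra headroom in a real estimator.

```python
import math
from typing import Tuple


def estimate_sino_360_to_180_bytes(
    non_slice_dims: Tuple[int, int], itemsize: int, overlap: float
) -> int:
    """Rough GPU bytes per sinogram: input slice plus stitched output slice.

    Hypothetical helper. `non_slice_dims` is assumed to be
    (n_angles_360, detector_width). Blending temporaries are not counted.
    """
    n_angles_360, width = non_slice_dims
    stitched_width = width * 2 - math.ceil(overlap)
    input_bytes = n_angles_360 * width * itemsize
    # Assumption: the stitched output keeps half of the 360-degree angles.
    output_bytes = (n_angles_360 // 2) * stitched_width * itemsize
    return input_bytes + output_bytes


def max_slices(available_bytes: int, bytes_per_slice: int) -> int:
    """Largest number of sinograms whose estimated footprint fits in memory."""
    return available_bytes // bytes_per_slice


# Hypothetical shape, overlap, and memory budget, for illustration only.
available = 40 * 1024**3
per_slice = estimate_sino_360_to_180_bytes((3600, 2560), 4, 1950.84)
slices = max_slices(available, per_slice)
print(per_slice, slices)
```

Because the estimate depends on the overlap, it can only be evaluated once find_center_360 has produced its side output, which is one reason a fixed multiplier cannot capture it.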

For reference, the current methods database entry for sino_360_to_180 is the following:

httomo/httomo/methods_database/packages/external/httomolibgpu/httomolibgpu.yaml

Lines 22 to 30 in 4429711

    
    sino_360_to_180:
      pattern: sinogram
      output_dims_change: False
      implementation: gpu_cupy
      save_result_default: False
      memory_gpu:
        - datasets: [tomo]
        - multipliers: [2.2]
        - methods: [direct]

dkazanc · 2024-06-04T13:11:49Z

Thanks, it looks like a dedicated memory estimator for that function is the way to go.

yousefmoazzam added the bug Something isn't working label May 31, 2024

yousefmoazzam self-assigned this May 31, 2024

yousefmoazzam mentioned this issue Jun 11, 2024

Improve accuracy of 360 stitching memory estimation #363

Merged

2 tasks

dkazanc closed this as completed in #363 Aug 8, 2024
