Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add parameter sweep functionality #380

Merged
merged 33 commits into from
Jul 18, 2024
Merged

Add parameter sweep functionality #380

merged 33 commits into from
Jul 18, 2024

Conversation

yousefmoazzam
Copy link
Collaborator

@yousefmoazzam yousefmoazzam commented Jul 5, 2024

Attempt to fix #362

Main changes:

  • add ParamSweepWriter + ParamSweepReader for writing + reading parameter sweep results using blocks
  • add Stages to group methods that are before the sweep, in the sweep, and after the sweep
  • add ParamSweepRunner to orchestrate the parameter sweep run
  • add SideOutputManager to separate side output logic from runner
  • refactor DataSetBlock and introduce three interfaces that describe the distinct sets of behaviour that a block needs to be processable by an implementor of MethodWrapper (see commit message of 4123373 for more info)
  • add ParamSweepBlock to hold data during a parameter sweep run
  • modify run command to choose between executing high-throughput or param sweep run

Things left to do:

  • provide a way to do parameter sweep runs from CLI
  • add back in the modified loader that can read the !Sweep and !SweepRange YAML "tags"
  • add back in the YAML checker functionality to validate YAML pipeline files with the !Sweep and !SweepRange tags
  • configure logger for parameter sweep run

Acceptance criteria checklist

  • all tests pass
  • adequate documentation provided on how to use the parameter sweep functionality, what output it produces (ie, middle slices of the various sweep results) , etc
  • performance of parameter sweep run is acceptable

@yousefmoazzam yousefmoazzam force-pushed the param-sweep-runner branch 4 times, most recently from 3195c7f to ee5d7af Compare July 8, 2024 15:45
@yousefmoazzam
Copy link
Collaborator Author

yousefmoazzam commented Jul 8, 2024

For an example of how the logging looks so far (I've only added the basics), with the following pipeline (sweeps over 10 center values in the recon):

- method: standard_tomo
  module_path: httomo.data.hdf.loaders
  parameters:
    name: tomo
    data_path: entry1/tomo_entry/data/data
    image_key_path: entry1/tomo_entry/instrument/detector/image_key
    rotation_angles:
      data_path: /entry1/tomo_entry/data/rotation_angle
    dimension: 1
    pad: 0
    preview:
      detector_y:
        start: 50
        stop: 57
- method: normalize
  module_path: httomolibgpu.prep.normalize
  parameters:
    cutoff: 10.0
    minus_log: true
    nonnegativity: false
    remove_nans: false
- method: paganin_filter_tomopy
  module_path: httomolibgpu.prep.phase
  parameters:
    pixel_size: 0.0001
    dist: 50.0
    energy: 53.0
    alpha: 0.001
- method: remove_all_stripe
  module_path: httomolibgpu.prep.stripe
  parameters:
    snr: 3.0
    la_size: 61
    sm_size: 21
    dim: 1
- method: FBP
  module_path: httomolibgpu.recon.algorithm
  save_result: False
  parameters:
    center: !SweepRange
      start: 10
      stop: 20
      step: 1
    filter_freq_cutoff: 0.6
    recon_size: null
    recon_mask_radius: null
- method: save_to_images
  module_path: httomolib.misc.images
  parameters:
    subfolder_name: images
    axis: 1
    file_format: tif
    bits: 8
    perc_range_min: 0.0
    perc_range_max: 100.0
    jpeg_quality: 95
    offset: 0
    asynchronous: true

the following terminal output is produced:

See the full log file at: /dls/tmp/twi18192/sweep-tests/08-07-2024_16_35_08_output/user.log
Loading data with shape (1801, 7, 2560)
Running data_reducer (httomolib)
Running normalize (httomolibgpu)
Running paganin_filter_tomopy (httomolibgpu)
Running remove_all_stripe (httomolibgpu)
Running FBP (httomolibgpu)
    Parameter sweep over 10 values of parameter: center
      0%|          | 0/10 [00:00<?, ?value/s, center=10]
     10%|#         | 1/10 [00:02<00:21,  2.36s/value, center=11]
     20%|##        | 2/10 [00:03<00:12,  1.51s/value, center=12]
     30%|###       | 3/10 [00:04<00:08,  1.25s/value, center=13]
     40%|####      | 4/10 [00:05<00:06,  1.12s/value, center=14]
     50%|#####     | 5/10 [00:06<00:05,  1.04s/value, center=15]
     60%|######    | 6/10 [00:06<00:03,  1.00value/s, center=16]
     70%|#######   | 7/10 [00:07<00:02,  1.03value/s, center=17]
     80%|########  | 8/10 [00:08<00:01,  1.05value/s, center=18]
     90%|######### | 9/10 [00:09<00:00,  1.06value/s, center=19]
    Finished parameter sweep
Running save_to_images (httomolib)
Pipeline finished. Took 32.329s

Feel free to make any suggestions on tweaks/additions for improvement 🙂

@yousefmoazzam yousefmoazzam force-pushed the param-sweep-runner branch 2 times, most recently from 8c6f82a to e4cf022 Compare July 9, 2024 09:11
@yousefmoazzam yousefmoazzam marked this pull request as ready for review July 9, 2024 15:33
Copy link
Contributor

@team-gpu team-gpu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea re factoring out the various block interfaces into smaller protocols, etc.

Is there any chance to re-use more of TaskRunner in ParameterSweepRunner somehow? Although it's not too much, there is a good bit of overlap. What do you think?

httomo/cli.py Outdated Show resolved Hide resolved
Copy link
Contributor

@team-gpu team-gpu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to go for me.

httomo/block_interfaces.py Outdated Show resolved Hide resolved
Implements the simple property getters.
The motivation behind this change is that it would be useful to be able
to use method wrapper objects (implementors of `MethodWrapper`) to
execute methods in both:
- the high throughput runner `TaskRunner`
- the parameter sweep runner `ParamSweepRunner`

The `DataSetBlock` class contains code that is mostly compatible with a
parameter sweep run. However, one incompatibility is the
`DataSetBlock.data.setter()` method, which places a constraint on the
data involving the slicing dimension of the data. In parameter sweep
runs there is only a single process being run, and so the concept of
slicing dimension isn't relevant.

Given that `DataSetBlock` contains a fair amount of logic that *is*
usable in a parameter sweep run, after various iterations of testing out
what could work well, the logic in `DataSetBlock` that is usable across
both high throughput runs and parameter sweep runs has been extracted
into a separate `BaseBlock` class.

A high-level overview of the organisation of block-related functionality
now is:
- the various functionalites needed for a block type to be processable
  by implementors of `MethodWrapper` have been organised into three
  separate protocols: `BlockData`, `BlockTransfer`, and `BlockIndexing`
- `BaseBlock` provides "typical implementations" for `BlockData` and
  `BlockTransfer`, which are reused by `DataSetBlock` via inheritance
- thus, `BaseBlock` contains the code for "typical" implementations of
  methods that are common to blocks across both high throughput runs and
  parmeter sweep runs
- `DataSetBlock` contains only the code specific to processing blocks in
  high throughput runs
- in the future, the "typical implementations" contained in `BaseBlock`
  can be reused by a new block type that will be used in parameter sweep
  runs
… + no of sweep

Before, the array to hold all sweep results was inferred from the
`single_shape` value that was given when the writer object was first
created. This had the implicit assumption that the shape of a single
sweep result would be known at the time the writer object is created.

This assumption holds when the method being executed in the sweep
doesn't change the shape of the data during processing (most methods
adhere to this). However, this assumption doesn't hold when the method
executed in the sweep does change the shape of the data during
processing (for example, reconstruction methods).

Therefore, the writer now infers the shape of a single sweep result when
the first block to be written has been given. This can then can be used
to infer the shape of the array to hold all sweep results for both cases
(when the method being executed in the sweep does and doesn't change the
shape of the data).
Instead of creating multiple copies of the same wrapper with different
values for the parameter to sweep over, use a single wrapper and update
the parameters before each execution to reflect the new sweep value to
be used.
Instead of requiring that the caller of the sweep runner creates the
`Stages` object, have the sweep runner generate that itself from the
pipeline object.
By default, the YAML loader used by the UI layer to load YAML pipeline
files is now the modified version of `yaml.SafeLoader` that additionally
handles the `!Sweep` and `!SweepRange` tags.
@yousefmoazzam yousefmoazzam merged commit 2ec4f58 into main Jul 18, 2024
2 of 4 checks passed
@yousefmoazzam yousefmoazzam deleted the param-sweep-runner branch July 18, 2024 10:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Sweep functionality
2 participants