Skip to content

Conversation

@matham
Copy link
Contributor

@matham matham commented Jun 7, 2025

Description

What is this PR

  • Bug fix
  • Addition of a new feature
  • Other

Why is this PR needed?

Right now we pass the same parameters from napari and to various functions. But these parameters have inconsistent or no docs.

What does this PR do?

  1. Docs
    1. Adds similarly worded docs for parameters to all main functions, napari, and setup_filters.
    2. Makes the docs a bit more complete and readable.
    3. Re-ordered the napari detection parameters order so it flows in the order of how filtering happens. I.e. 2d filtering -> 3d filtering -> cluster splitting.
  2. A few parameters (voxel_sizes, n_sds_above_mean_thresh in napari) was set as ints, but they should be floats.
  3. Split parameters
    1. Exposed the split_ parameters in all the main functions, but not in napari.
    2. The normal parameters are all in microns, but the split_ parameters were in pixels. I changed it so it's also in microns by multiplying by the default voxel size. But the default values for the parameters are unchanged.
    3. Exposed n_splitting_iter with same default to the main functions. Previously it was hard coded in setup_filters.
    4. Removed split_soma_diameter because it was redundant with soma_spread_factor and not used. AFAICT, soma_diameter is only used during 2d filtering, not during 3d filtering. Splitting only uses 3d filtering. The only place during splitting split_soma_diameter would be used is to calculate the volume of a cell to decide whether we need to split it. But, we already have soma_spread_factor that multiplies soma_diameter not split_soma_diameter.
  4. Cleaned up batch_size into detection_batch_size and classification_batch_size. The former defaults to 1 in napari and None in main. Which means 1 for GPU and 4 for CPU. The latter default is unchanged. E.g. on my machine with very large planes, a batch of just 1 detection planes can fit in the 10GB GPU, but 128+ classification cubes can fit.

References

#532 and #464 (comment).

How has this PR been tested?

Locally and with data.

Is this a breaking change?

Only the batch_size parameter name is renamed. Everything else is effectively the same.

Does this PR require an update to the documentation?

No.

Checklist:

  • The code has been tested locally
  • Tests have been added to cover all new functionality (unit & integration)
  • The documentation has been updated to reflect any changes
  • The code has been formatted with pre-commit

@matham matham changed the title Add consisten parameter docs to the main functions and napari Add consistent parameter docs to the main functions and napari Jun 7, 2025
This was referenced Jun 7, 2025
@matham
Copy link
Contributor Author

matham commented Jun 7, 2025

I'm not sure why the codecov check is failing?

Copy link
Member

@alessandrofelder alessandrofelder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a big fan of the consistency introduced here - thanks a lot @matham

I have followed the changes and think they make sense. I will run a sanity check on local data overnight, but assuming that passes and you respond to my minor comments/suggestions, this is ready to be merged

PS

I'm not sure why the codecov check is failing?

Maybe because we're adding extra lines through formatting etc (and maybe docstrings too?)... too small a change to warrant deeper investigation, I think, especially since patch coverage is 100%

Maximum size of a cluster in physical units.
The expected in-plane (xy) soma diameter (microns).
max_cluster_size : int
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
max_cluster_size : int
max_cluster_size : float

I think, for consistency with typehint at least (don't think it matters)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed the sig to int, because it's int everywhere else and it's number of voxels so should be int.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's number of voxels

So that means the docstring should say number of voxels instead of cubic um right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahhh it is um so I changed it to float everywhere..

Size of the sigma for the log filter.
Gaussian filter width (as a fraction of soma diameter) used during
2d in-plane filtering.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
2d in-plane filtering.
2d in-plane Laplacian of Gaussian filtering.

Nitpicky, but may help explain the variable name

Comment on lines 112 to 113
for all the filters. Tune to maximize memory usage without running
out. Check your GPU/CPU memory to verify it's not full.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for all the filters. Tune to maximize memory usage without running
out. Check your GPU/CPU memory to verify it's not full.
for all the filters. For performance-critical applications, tune to maximize memory usage without running
out. Check your GPU/CPU memory to verify it's not full.

This suggestion to make clear that tuning is not a requirement.

Also, should this default to 1, as suggested by the old docstring (but not the code 😅 )?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added it everywhere. For GPU it does default to 1 in the code!?

-------
List[Cell]
List of detected cells.
List of detected potential cells and artifacts.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
List of detected potential cells and artifacts.
List of detected cell candidates.

I think this is the wording we use in the paper and elsewhere.

Comment on lines 118 to 123
split_ball_xy_size: int
Similar to `ball_xy_size`, except the value to use for the 3d
filter during cluster splitting.
split_ball_z_size: int
Similar to `ball_z_size`, except the value to use for the 3d filter
during cluster splitting.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
split_ball_xy_size: int
Similar to `ball_xy_size`, except the value to use for the 3d
filter during cluster splitting.
split_ball_z_size: int
Similar to `ball_z_size`, except the value to use for the 3d filter
during cluster splitting.
split_ball_xy_size: float
Similar to `ball_xy_size`, except the value (in microns) to use for the 3d
filter during cluster splitting.
split_ball_z_size: float
Similar to `ball_z_size`, except the value (in microns) to use for the 3d filter
during cluster splitting.

Should now also be floats, as we use them as micron quantities now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed everywhere.

Comment on lines 107 to 112
split_ball_xy_size: int
Similar to `ball_xy_size`, except the value to use for the 3d
filter during cluster splitting.
split_ball_z_size: int
Similar to `ball_z_size`, except the value to use for the 3d filter
during cluster splitting.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make these have float typehints too?

Comment on lines +84 to +87
ball_xy_size : float
3d filter's in-plane (xy) filter ball size (microns).
ball_z_size : float
3d filter's axial (z) filter ball size (microns).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make typehints in function signature match these typehints?

The expected in-plane (xy) soma diameter (microns)
log_sigma_size : float
Gaussian filter width (as a fraction of soma diameter) used during
2d in-plane filtering
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
2d in-plane filtering
2d in-plane Laplacian of Gaussian filtering

batch_size=dict(value=cls.defaults()["batch_size"]),
classification_batch_size=dict(
value=cls.defaults()["classification_batch_size"],
label="Batch size",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
label="Batch size",
label="Batch size (classification)",

Helpful to be explicit?

max_workers: int
The number of sub-processes to use for data loading / processing.
Defaults to 8.
pin_memory: bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

presumably this will come in a follow up PR shortly? (it's not currently part of the function signature)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Too many PRs at once... It leaked over :D

@matham
Copy link
Contributor Author

matham commented Jun 9, 2025

I think I addressed all of the comments, including in all the other places they would occur.

@alessandrofelder alessandrofelder self-requested a review June 10, 2025 14:07
Copy link
Member

@alessandrofelder alessandrofelder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @matham - I got delayed in doing my local tests on separate data (sorry!), just as a sanity check, but I am now happy with the code and will merge when I have done the local tests (assuming they go well, which they should)

@matham
Copy link
Contributor Author

matham commented Jun 17, 2025

I moved the split parameters added to the main main function to the end, so as to make it less likely to conflict if someone called it using positional args only.

I'm wondering if we should make all of these main args, except the first few, keyword args only. So that we can change their order or add new ones in the future without compatibility issues. It's already risky calling them using positional input only, like is evidenced by brainglobe/brainglobe-workflows#150 where I converted them to keyword args. So maybe we should prevent it?

@alessandrofelder
Copy link
Member

alessandrofelder commented Jun 23, 2025

Thanks, @matham

All looks good to me now.

For our future selves, I did some one-off manual regression testing:

I confirm that I have run this code on our internal reference whole-brain dataset MS_cx_left and got 99.6% overlap of cell candidates centres with current main. Overlap was as closer than 1 pixel euclidean distance between centres, using brainglobe_utils.cells.match_cells. (~270 non-overlapping cells out of >71'000)

I also confirm that I was able to run brainmapper to successfully find the same cell candidates, so we haven't broken the API with this PR.

I'm wondering if we should make all of these main args, except the first few, keyword args only. So that we can change their order or add new ones in the future without compatibility issues. It's already risky calling them using positional input only, like is evidenced by brainglobe/brainglobe-workflows#150 where I converted them to keyword args. So maybe we should prevent it?

I've opened #550 so others can chime in, and we don't continue the discussion on a merged PR.

@alessandrofelder alessandrofelder merged commit 4224d72 into brainglobe:main Jun 23, 2025
16 checks passed
@matham matham deleted the docs branch June 23, 2025 19:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants