Add missing split params to napari and clean up docstrings #532
Conversation
I think previously the assumption was that if entire cells were labelled, then the cells' dendrites would be what connects them to each other in the image, and those could be more easily eroded.
IIRC, the remaining cells are recorded at each iteration, so even if they get eroded to nothing, they still "count".
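As an aside, the iterative erosion idea described here can be sketched roughly as follows (an illustrative numpy/scipy toy, not cellfinder's actual implementation): erode the binary mask step by step, recording the number of connected components at each iteration, so blobs that separate still "count" even if a later iteration erodes them to nothing.

```python
import numpy as np
from scipy import ndimage

def erosion_split_counts(mask, max_iter=10):
    # Record the number of connected components after each erosion
    # step; blobs that separate "count" even if a later iteration
    # erodes them away entirely.
    counts = []
    eroded = mask.astype(bool)
    for _ in range(max_iter):
        n_components = ndimage.label(eroded)[1]
        counts.append(n_components)
        if n_components == 0:
            break
        eroded = ndimage.binary_erosion(eroded)
    return counts

# Two 3x3 squares joined by a one-pixel bridge: one component at
# first, two once a single erosion step removes the bridge.
mask = np.zeros((7, 9), dtype=bool)
mask[2:5, 1:4] = True   # left square
mask[2:5, 5:8] = True   # right square
mask[3, 4] = True       # thin bridge connecting them
counts = erosion_split_counts(mask)  # [1, 2, 0]
```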
Yep, I think that matches the original cellfinder use cases much more closely. Generally, a lot of assumptions are no longer valid now that cellfinder is being used for other types of data. Nuclear cfos LSFM data looks very different to whole-cell rabies tracing serial 2p data (the original application). We're not wedded to any of the current implementation though, so if there's a better way, that's great.
So what do you think about the global/local thresholding approach? I can convert this PR into that and keep the splitting parameters out of napari. Or, if splitting is useful in some applications and you'd like to expose it in napari, I can keep this PR as is and make a new PR with the threshold stuff. As I described above, the global threshold would be used to find the darkest cells, and could potentially even do what the bright-tiles algorithm is meant to do, since it can eliminate any voxels below that threshold, which regions outside the brain are. The local tiled threshold would then help further define cells in bright regions. You'd need one, potentially two, new parameters: the tile size and a second threshold value (I'm not sure the latter is needed, but being able to tune the tiled threshold might be helpful). To keep the old behavior you'd just use the global threshold by setting the tile size to zero. I'm not suggesting we drop the bright-tiles algorithm that currently determines in/out-of-brain areas, but this could be an alternative way of doing the same thing: excluding cell-looking things outside the brain.
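The global/local scheme described above might look roughly like this (an illustrative sketch with made-up names, not a proposed cellfinder API): a global cutoff removes everything darker than the threshold (e.g. voxels outside the brain), and an optional per-tile cutoff further refines bright regions, with tile size zero reproducing the old global-only behavior.

```python
import numpy as np

def tiled_threshold(plane, global_thresh, tile_size=0, local_k=1.0):
    # Global pass: discard anything darker than the global threshold
    # (e.g. voxels outside the brain).
    keep = plane > global_thresh
    if tile_size <= 0:
        return keep  # tile_size == 0 reproduces global-only behavior
    # Local pass: within each tile, additionally keep only pixels
    # brighter than that tile's mean + local_k * std, refining
    # detection inside bright regions.
    h, w = plane.shape
    for y in range(0, h, tile_size):
        for x in range(0, w, tile_size):
            tile = plane[y:y + tile_size, x:x + tile_size]
            local_cutoff = tile.mean() + local_k * tile.std()
            keep[y:y + tile_size, x:x + tile_size] &= tile > local_cutoff
    return keep

rng = np.random.default_rng(0)
plane = rng.normal(100, 10, (64, 64))
g = tiled_threshold(plane, 90)                # global only
t = tiled_threshold(plane, 90, tile_size=16)  # global + local refinement
```

Since the local pass only removes pixels the global pass kept, the tiled result is always a subset of the global one.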
alessandrofelder left a comment
Thanks @matham - excited about these improvements.
I am finding it hard to wrap my head around all these at once, so I've opened #535 to help me structure my thoughts.
I've made some initial tiny comments while reading the code for the first time in more depth.
So what do you think about the global/local thresholding approach?
Here I was wondering whether the global/local thresholding could be done as a preprocessing step?
Generally, we are happy with changes as long as we're sure cellfinder still works well on original application data.
n_free_cpus : int
    How many CPU cores to leave free.
voxel_sizes : 3-tuple of floats
    Size of your voxels in the z, y, and x dimensions.
in microns? or in pixels?
    Size of your voxels in the z, y, and x dimensions.
network_voxel_sizes : 3-tuple of floats
    Size of the pre-trained network's voxels in the z, y, and x dimensions.
batch_size : int
I think it would be good to have "sensible", conservative defaults (they don't need to be perfect), especially for the more computational options (maybe batch_size=1 and pin_memory=False?), because the prospect of having to check GPU/CPU memory and understand what pinning memory means in this context will scare off users.
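For illustration, such conservative defaults might be collected in a config object like this (the class and field names are hypothetical, not cellfinder's API); the values are chosen to run anywhere at the cost of speed, so users need not inspect GPU/CPU memory first.

```python
from dataclasses import dataclass

@dataclass
class ClassificationConfig:
    # Hypothetical config sketch: conservative defaults that should
    # work on any machine; users can raise them for throughput.
    batch_size: int = 1        # fits on any GPU; increase if memory allows
    pin_memory: bool = False   # safe default; True can speed host-to-GPU copies
    max_workers: int = 8       # sub-processes for data loading / processing

cfg = ClassificationConfig()
```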
max_workers : int
    The number of sub-processes to use for data loading / processing.
    Defaults to 8.
pin_memory : bool
should this be made a keyword argument?
This PR as is is admittedly doing too much at once, mainly because I started using the branch to stage all the related changes so I could test them more easily. So I will break it up into 3+ PRs.
Does this seem ok? I'd love to combine 1 and 3, because 3 will conflict with 1 in git since they both touch similar lines of code. But I can also branch 3 off 1, so that once 1 is merged it should be fine with 3 in terms of git.
Sounds like a good plan!
Yep - 1 should be straightforward to review, and I can prioritise it, so we avoid conflicts.












Description
What is this PR
Why is this PR needed?
The main reason is that the default values for the split parameters are not always working and need to be tunable. I was trying to properly tune detection so I can test classification, and to do that I needed to also tune those unexposed parameters.
In particular this is important when there are lots of cells concentrated in a single area: you need to set the standard deviation threshold low enough to get all of them, but then they form clusters which need to be split. And the 3d filtering used when splitting a cluster requires slightly different values than the initial 3d filtering that created the cluster. In particular, being able to run iterations of erosion on the volume can help split these clusters, so it must be tunable.
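To illustrate the trade-off: with a mean + n*std cutoff, a strict (high n) threshold separates two bright cells, while a lenient (low n) threshold also picks up the fainter signal between them and merges them into one cluster that then needs splitting. A deterministic toy example (illustrative only, not cellfinder's filter):

```python
import numpy as np
from scipy import ndimage

def n_detected_clusters(plane, n_std):
    # Threshold at mean + n_std * std, then count connected components.
    mask = plane > plane.mean() + n_std * plane.std()
    return ndimage.label(mask)[1]

# Two bright "cells" with a fainter band of signal between them.
plane = np.zeros((40, 40))
plane[18:22, 10:14] = 6.0   # cell 1
plane[18:22, 16:20] = 6.0   # cell 2
plane[18:22, 14:16] = 3.0   # fainter signal bridging the cells

two = n_detected_clusters(plane, 5)  # strict cutoff: 2 separate cells
one = n_detected_clusters(plane, 2)  # lenient cutoff: 1 merged cluster
```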
What does this PR do?
- Added the `split_` parameters to napari.
- Removed `split_soma_diameter` because it is redundant with `soma_spread_factor`, as explained here.
- Changed the `split_` parameters to use default values in microns instead of voxels, for consistency with the other parameters. This doesn't change the numbers used, only the default values in the function parameters.
- Separated the classification and detection batch sizes, because e.g. on my machine only a batch of 10 detection planes can fit on the GPU, while 128+ classification cubes can fit.
- Added the same / similar docstrings to all the various `main` functions and napari, and re-ordered the docstrings to match the parameter order.
- In napari, re-ordered the detection parameters so they follow the order in which filtering happens, i.e. 2d filtering -> 3d filtering -> cluster splitting.
- Added a file under benchmarks (`filter_debug.py`) that runs through the full filter stack and saves the output generated by each stage to disk, so you can visualize what the stages do and how changing parameters affects them. It's hard to tune all these parameters when you only have the final output (candidate cells), which takes a while to generate, and you have to guess at how the input parameters actually affect the outcome.

There's already something like this for saving the output of the 3d filtering step, but having the 2d step output in particular is helpful. It would be nice to be able to visualize all these debug outputs directly in napari, but that would require a lot more work and is possibly not worth the complexity. Although I do wonder how most users would approach this tuning stage without being able to see how the parameters change the output, and I suspect something like this would be invaluable when tuning for new samples.
This should also be helpful for generating images of the output for docs to show what each of the algorithm steps are doing.
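The stage-saving idea can be sketched like this (a hypothetical `run_with_debug` helper, not the actual `filter_debug.py`): run each filter stage in sequence and dump its intermediate output to disk for inspection, e.g. for loading into napari.

```python
import tempfile
import numpy as np
from pathlib import Path

def run_with_debug(plane, stages, out_dir):
    # Run each (name, function) stage in order, saving every
    # intermediate result so the effect of each stage and its
    # parameters can be inspected afterwards.
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    data = plane
    for i, (name, stage) in enumerate(stages):
        data = stage(data)
        np.save(out_dir / f"{i:02d}_{name}.npy", data)
    return data

# Toy two-stage "filter stack" standing in for the real filters.
stages = [
    ("clip", lambda d: np.clip(d, 0, 1)),
    ("threshold", lambda d: d > 0.5),
]
with tempfile.TemporaryDirectory() as tmp:
    out = run_with_debug(np.array([[0.2, 0.9], [1.5, -0.1]]), stages, tmp)
    n_saved = len(list(Path(tmp).glob("*.npy")))
```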
References
#464 (comment)
How has this PR been tested?
Test suite, plus manually with some samples.
Is this a breaking change?
I did drop the `split_soma_diameter` parameter, as explained. Plus I renamed `batch_size` to `classification_batch_size` and `detection_batch_size`. I'm not sure if the workflows need to be updated? I'm assuming we need to similarly expose these parameters there. Is there a single place where they are added, or multiple places?
Does this PR require an update to the documentation?
These changes should be propagated to the public docs, but this PR doesn't update them.
Here's example output from:
Input
After the laplacian/gaussian filters
After 2d filter
After 3d filter
Colored based on cell, needs to split, and artifact
Same as above except the split cells have a sphere colored in