-
Notifications
You must be signed in to change notification settings - Fork 79
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MVP for morpohology module #866
base: main
Are you sure you want to change the base?
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #866 +/- ##
==========================================
- Coverage 69.99% 69.52% -0.47%
==========================================
Files 39 39
Lines 5532 5667 +135
Branches 1037 1063 +26
==========================================
+ Hits 3872 3940 +68
- Misses 1367 1429 +62
- Partials 293 298 +5
|
for more information, see https://pre-commit.ci
I had to remove the code that called the label function from skimage as it was creating additional labels that we cannot trace back to the original labels. |
Also fix some failing code checks
for more information, see https://pre-commit.ci
…into feature/add_morphology_toolbox
for more information, see https://pre-commit.ci
@@ -3,7 +3,7 @@ | |||
from __future__ import annotations | |||
|
|||
from squidpy.im._container import ImageContainer | |||
from squidpy.im._feature import calculate_image_features | |||
from squidpy.im._feature import calculate_image_features, quantify_morphology |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hi @npeschke @timtreis was just taking a look at this, was wondering why the duplication of the function? It seems that quantify_morphology
does something very similar to calculate_image_features
. Do you plan to discontinue the latter? wouldn't it be better to adapt the latter to use spatialdata + adapt? thank you
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tbh not sure yet what the optimal solution will be. There is clear redundancy and a strong overlap, but when I wrote the original MVP, it didn't clearly fit with the current calculate_image_features
. On the one hand, some of these features don't need an image but only the label, but then also the structure of the function was going to be quite different.
Even if we merge them into the calculate_image_features function, the parameters of that function would have to change. So yeah, not fully sure yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I generally wanted to have sth that takes an arbitrary callable with a region_props like footprint so we could also inject f.e. some HugginfFace featuriser or sth
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right, that's a good point. The calculate_image_features
does support arbitrary functions, see https://squidpy.readthedocs.io/en/stable/api/squidpy.im.calculate_image_features.html and https://squidpy.readthedocs.io/en/stable/api/squidpy.im.calculate_image_features.html and I understand that the current implementation doesn't operate on masks (or not only on mask), but maybe it would be ok to just change the input to "image" to not be just an Image but also a Label?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the parallelization/out of core functionality I think it's also important. The current implementation relies on joblib parallel to iterate over spots, and understand it's not optimal (e.g. how to do it with raster labels and raster image?). But maybe a combination of that and dask, e.g. https://examples.dask.org/applications/image-processing.html could be useful. Basically I think we should strive to implement scalability in time/memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking at this, not maintained but maybe some ideas could be reused
https://github.com/jrussell25/dask-regionprops/blob/main/dask_regionprops/regionprops.py
EDIT: looking more deeply, not sure anymore 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this ties into the larger topic of moving things to the GPU
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for context, the current implementation takes 20 seconds on my machine to calculate all available features of the MIBI-TOF dataset. So parallelization might not need to be that urgent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'll be if we add more, potentially more expensive to compute, features and analyse datasets like Xenium with 100k+ cells ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hi @npeschke , thanks for sharing the time. If the mibi-tof dataset you are referring to is the one from squidpy, that is a toy dataset that does not really recapitulate real data complexity and size. I'd be curious to see the performance on a e.g. xenium dataset.
# if we didn't get any properties, we'll do the bare minimum | ||
props = ["label"] | ||
|
||
np_rgb_image = image_element.values.transpose(1, 2, 0) # (c, y, x) -> (y, x, c) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
like, stuff like this doesn't work with real data. Everything needs to be either lazy, or looped/parallelized
MRE:
Notes
circularity
is a non-skimage method but internal method (we need a library of those, can "steal" from CellProfiler + X)my_func
is an external method that gets fed toskimage.measure.regionprops
. We need to provide doc on how to make such method. They take in a mask of the respective cell as well as a (potentially multi-channel) intensity image (bothnp.ndarrays
).squidpy-notebooks
From discussion on 2024-08-13
my_func
wrapper for model inference, f.e. from some HF-model?