Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: #1629 improve API for use with dask #406

Merged
merged 10 commits into from
Nov 20, 2023
Merged

fix: #1629 improve API for use with dask #406

merged 10 commits into from
Nov 20, 2023

Conversation

raylim
Copy link
Contributor

@raylim raylim commented Oct 31, 2023

This makes the Slide Manifest (from slide_etl) the main data frame of record for a project. Additional columns are added for annotations, and the tiles_url column is modified when tiles are generated, tissue is detected, and tiles are saved in the h5 store. Dask is used at the slide and tile level with batches for parallelizing when possible.

Currently, the tile tissue inference doesn't work with Dask, as we need to use a Dask-ML compatible library, i.e., Skorch.

We also need some additional unit tests for the API functions.

@raylim raylim requested a review from armaank November 3, 2023 20:45
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test should be passing - the property just might have a different name now (properties.tiffslide.AppMag or something like that)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested this locally, and there doesn't appear to be a property name change. This may be a weird CI-related bug.

@@ -274,6 +230,29 @@ def get_slide(self, url, project_name="", comment="") -> Slide:
# def estimate_stain(self, slide):


def __slide_etl(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it standard to named overloaded functions with two underscores? I think it might be _slide_etl instead of __slide_etl

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intention here was to create some private functions that are not intended for use by the end-user. AFAICT, the convention is to use one or two underscores and not include docstrings (just regular comments).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha. I think we use just one underscore elsewhere in the codebase, so I think we should keep it consistent and just use one (_ ) instead of two __)

with TiffSlide(slide_urlpath) as slide:
return [(x.address, get_array_from_tile(x, slide=slide)) for x in iterator]

chunks = db.from_sequence(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice use of dask bags

@armaank
Copy link
Collaborator

armaank commented Nov 14, 2023

Also, do you know if this resolves #407 ?

@raylim
Copy link
Contributor Author

raylim commented Nov 14, 2023

Also, do you know if this resolves #407 ?

I'm not sure. I would have to test it with the same slides that Darin used.

@armaank
Copy link
Collaborator

armaank commented Nov 17, 2023

Great, thanks for making those changes!

@raylim raylim merged commit 106fece into dev Nov 20, 2023
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants