-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: #1629 improve API for use with dask #406
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test should be passing - the property just might have a different name now (properties.tiffslide.AppMag
or something like that)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested this locally, and there doesn't appear to be a property name change. This may be a weird CI-related bug.
@@ -274,6 +230,29 @@ def get_slide(self, url, project_name="", comment="") -> Slide: | |||
# def estimate_stain(self, slide): | |||
|
|||
|
|||
def __slide_etl( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it standard to named overloaded functions with two underscores? I think it might be _slide_etl
instead of __slide_etl
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intention here was to create some private functions that are not intended for use by the end-user. AFAICT, the convention is to use one or two underscores and not include docstrings (just regular comments).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha. I think we use just one underscore elsewhere in the codebase, so I think we should keep it consistent and just use one (_
) instead of two __
)
with TiffSlide(slide_urlpath) as slide: | ||
return [(x.address, get_array_from_tile(x, slide=slide)) for x in iterator] | ||
|
||
chunks = db.from_sequence( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice use of dask bags
Also, do you know if this resolves #407 ? |
I'm not sure. I would have to test it with the same slides that Darin used. |
Great, thanks for making those changes! |
This makes the Slide Manifest (from slide_etl) the main data frame of record for a project. Additional columns are added for annotations, and the tiles_url column is modified when tiles are generated, tissue is detected, and tiles are saved in the h5 store. Dask is used at the slide and tile level with batches for parallelizing when possible.
Currently, the tile tissue inference doesn't work with Dask, as we need to use a Dask-ML compatible library, i.e., Skorch.
We also need some additional unit tests for the API functions.