Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segmentation mask file size validator #62

Draft
wants to merge 15 commits into
base: devel
Choose a base branch
from

Conversation

sunset666
Copy link
Contributor

Validator to check size match between segmentation mask and base image.

Copy link
Contributor

@gesinaphillips gesinaphillips left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't totally understand the structure of the ome.tiffs here but comparing the pixel sizes seems to make sense. It seems like we're expecting just one ome.tiff file per data_path--if so, then that seems fine too.

Just a few comments otherwise!

src/ingest_validation_tests/utils.py Outdated Show resolved Hide resolved
if not data_path.is_absolute():
data_path = Path(self.paths[0]).parent / data_path

for glob_expr in [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe consolidate this list into a class attribute or something unless there's a chance we'd want to have different patterns for mask/base images.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've refactored this a bit, how does it look now?

for file in Path(
GetParentData(
row["parent_dataset_id"], self.globus_token, self.app_context
).get_path()
).glob(glob_expr):
parent_filenames_to_test.append(file)

assert len(filenames_to_test) != 1, "Too many or too few files Mask"
Copy link
Contributor

@gesinaphillips gesinaphillips Sep 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading this again, I'm actually not sure how many files we are expecting--more than 1? I think I misread this as == 1 before, which I thought made sense.

Would it be reasonable instead to do something like assert len(filenames_to_test) == len(parent_filenames_to_test), "Mismatched number of files in dataset and parent_dataset directories." ? I might still be misunderstanding the intent here though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, I thought I had it "==", the idea is that a segmask can only happen if the parent dataset is one image, more than 1 image is not allowed. Updated...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what I read originally so we're both losing it apparently haha. Okay good now I think!

@gesinaphillips gesinaphillips marked this pull request as ready for review September 3, 2024 18:07
@gesinaphillips gesinaphillips marked this pull request as draft September 3, 2024 18:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants