Segmentation mask file size validator #62
Draft
sunset666 wants to merge 15 commits into devel from sunset666/segmentation_mask_imagesize_validation
Changes from 9 commits
Commits (15), all by sunset666:
48c17f8 New plugin that validates image size of a segmentation_mask and compa…
e145815 Update to the reference for parent_id
d785d57 Linting
f266537 Adding pycharm environment files to gitignore
e8612d7 More Linting
39fd223 More Linting
5f8b76b More Linting -> isort
e46b47a Updating to apply suggestions.
e5098a2 Linting
04c573b Bugfix assertion should be equal to not different than
936e3ba Bugfix import path and assay_type naming
5735131 Bugfix token attribute.
38f93d8 Refactoring to pass tests and not conflict with ingest_pipeline.
0502205 Linting
7d8d06a Typo on file name
@@ -130,3 +130,7 @@ dmypy.json

# Pyre type checker
.pyre/

# PyCharm
.idea/

87 changes: 87 additions & 0 deletions
src/ingest_validation_tests/segmentation_mask_imagesize_validation.py
@@ -0,0 +1,87 @@
from pathlib import Path
from typing import List, Optional, Union

import tifffile
import xmlschema
from ingest_validation_tools.plugin_validator import Validator
from utils import GetParentData


def get_ometiff_size(file) -> Union[str, dict]:
    """Return the physical pixel size of an OME-TIFF, or an error message."""
    try:
        # Close the file handle once the OME metadata has been read.
        with tifffile.TiffFile(file) as tf:
            xml_document = xmlschema.XmlDocument(tf.ome_metadata)
        if xml_document.schema and not xml_document.schema.is_valid(xml_document):
            return f"{file} is not a valid OME.TIFF file"
        # Decoded inside the try block so that a missing Image or Pixels
        # element is reported as an error message instead of raising.
        xml_image_data = xml_document.schema.to_dict(xml_document).get("Image")[0].get("Pixels")
        # OME-XML names the unit attributes PhysicalSize{X,Y,Z}Unit (singular).
        return {
            "X": xml_image_data.get("@PhysicalSizeX"),
            "XUnits": xml_image_data.get("@PhysicalSizeXUnit"),
            "Y": xml_image_data.get("@PhysicalSizeY"),
            "YUnits": xml_image_data.get("@PhysicalSizeYUnit"),
            "Z": xml_image_data.get("@PhysicalSizeZ"),
            "ZUnits": xml_image_data.get("@PhysicalSizeZUnit"),
        }
    except Exception as excp:
        return f"{file} is not a valid OME.TIFF file: {excp}"


class ImageSizeValidator(Validator):
    description = "Check dataset and parent image size so they can be matched in the visualization"
    cost = 1.0
    version = "1.0"
    required = "segmentation_mask"
    files_to_find = [
        "**/*.ome.tif",
        "**/*.ome.tiff",
        "**/*.OME.TIFF",
        "**/*.OME.TIF",
    ]

    def collect_errors(self, **kwargs) -> List[Optional[str]]:
        del kwargs
        if self.required not in self.contains and self.assay_type.lower() != self.required:
            return []  # We only test Segmentation Masks
        files_tested = None
        output = []
        filenames_to_test = []
        parent_filenames_to_test = []
        try:
            for row in self.metadata_tsv.rows:
                data_path = Path(row["data_path"])
                if not data_path.is_absolute():
                    data_path = Path(self.paths[0]).parent / data_path

                # Resolve the parent dataset path once per row rather than
                # once per glob pattern; each lookup makes HTTP requests.
                parent_path = Path(
                    GetParentData(
                        row["parent_dataset_id"], self.globus_token, self.app_context
                    ).get_path()
                )

                for glob_expr in self.files_to_find:
                    for file in data_path.glob(glob_expr):
                        filenames_to_test.append(file)

                    for file in parent_path.glob(glob_expr):
                        parent_filenames_to_test.append(file)

            # Exactly one file is expected on each side: a segmentation mask
            # is only valid when the parent dataset holds a single image.
            # The reviewed revision asserted `!= 1` here.
            assert len(filenames_to_test) == 1, "Too many or too few files Mask"
            assert len(parent_filenames_to_test) == 1, "Too many or too few files Base Images"

            segmentation_mask_size = get_ometiff_size(filenames_to_test[0])
            base_image_size = get_ometiff_size(parent_filenames_to_test[0])
            # Record that files were examined so a clean run reaches the
            # [None] branch below instead of the empty list.
            files_tested = True
            assert (
                segmentation_mask_size == base_image_size
            ), "Files and base image size do not match"

        except AssertionError as exep:
            output.append(str(exep))

        if output:
            return output
        elif files_tested:
            return [None]
        else:
            return []
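For readers without `tifffile` or `xmlschema` at hand, the attribute lookup in `get_ometiff_size` can be illustrated with the standard library alone. This is a minimal sketch, not the plugin's method: it assumes the metadata uses the 2016-06 OME schema namespace, skips schema validation entirely, and `physical_size_from_ome_xml` and `SAMPLE` are hypothetical names introduced only for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# Assumption: the OME-XML document uses the 2016-06 schema namespace.
OME_NS = "http://www.openmicroscopy.org/Schemas/OME/2016-06"

# Result key -> attribute name on the Pixels element.
ATTRIBUTES = {
    "X": "PhysicalSizeX",
    "XUnits": "PhysicalSizeXUnit",
    "Y": "PhysicalSizeY",
    "YUnits": "PhysicalSizeYUnit",
    "Z": "PhysicalSizeZ",
    "ZUnits": "PhysicalSizeZUnit",
}


def physical_size_from_ome_xml(ome_xml: str) -> Dict[str, Optional[str]]:
    """Read the physical size attributes of the first Image/Pixels element."""
    root = ET.fromstring(ome_xml)
    pixels = root.find(f"{{{OME_NS}}}Image/{{{OME_NS}}}Pixels")
    if pixels is None:
        raise ValueError("no Image/Pixels element found")
    # Attributes that are absent come back as None, so two files can still
    # be compared key by key.
    return {key: pixels.get(attr) for key, attr in ATTRIBUTES.items()}


# Hypothetical metadata, shaped like the string tifffile exposes as ome_metadata.
SAMPLE = """<OME xmlns="http://www.openmicroscopy.org/Schemas/OME/2016-06">
  <Image ID="Image:0">
    <Pixels ID="Pixels:0" DimensionOrder="XYZCT" Type="uint8"
            SizeX="4" SizeY="4" SizeZ="1" SizeC="1" SizeT="1"
            PhysicalSizeX="0.5" PhysicalSizeXUnit="nm"
            PhysicalSizeY="0.5" PhysicalSizeYUnit="nm"/>
  </Image>
</OME>"""

size = physical_size_from_ome_xml(SAMPLE)
```

The values stay as strings here, which is enough for the equality comparison the validator performs between the mask and the base image.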
@@ -0,0 +1,38 @@
import requests


class GetParentData:
    def __init__(self, hubmap_id, globus_token, app_context):
        self.hubmap_id = hubmap_id
        self.token = globus_token
        self.app_context = app_context

    def __get_uuid(self) -> None:
        url = self.app_context.get("uuid_url") + self.hubmap_id
        headers = self.app_context.get("request_headers", {})
        # A dict is not callable; update() adds the bearer token.
        headers.update({"Authorization": "Bearer " + self.token})
        try:
            response = requests.get(url, headers=headers)
            response.raise_for_status()
            self.uuid = response.json().get("uuid")
        except requests.exceptions.HTTPError as err:
            self.uuid = None
            print(f"Error: {err}")

    def get_path(self) -> str:
        self.__get_uuid()
        if self.uuid is not None:
            url = (
                self.app_context.get("ingest_url")
                + "datasets/"
                + self.uuid
                + "/file-system-abs-path"
            )
            headers = self.app_context.get("request_headers", {})
            try:
                response = requests.get(url, headers=headers)
                response.raise_for_status()
                return response.json().get("path")
            except requests.exceptions.HTTPError as err:
                print(f"Error: {err}")
        # Reached when the UUID lookup or the path request failed.
        return ""
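`__get_uuid` needs the bearer token alongside whatever headers `app_context` already carries. A minimal sketch of one way to merge the two into a fresh dictionary, so the shared `request_headers` mapping is left untouched; `build_auth_headers` is a hypothetical helper and is not part of this PR.

```python
from typing import Dict, Mapping, Optional


def build_auth_headers(
    base_headers: Optional[Mapping[str, str]], token: str
) -> Dict[str, str]:
    """Return a copy of base_headers with an Authorization entry added."""
    # Copy first so the caller's mapping is never modified in place.
    headers = dict(base_headers or {})
    headers["Authorization"] = "Bearer " + token
    return headers


# Hypothetical values for illustration only.
shared = {"X-Example-App": "demo"}
merged = build_auth_headers(shared, "abc123")
```

Whether the copy or an in-place update is preferable depends on whether later requests are meant to reuse the token through the shared mapping; the sketch only shows the non-mutating variant.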
Reading this again, I'm actually not sure how many files we are expecting--more than 1? I think I misread this as `== 1` before, which I thought made sense.
Would it be reasonable instead to do something like `assert len(filenames_to_test) == len(parent_filenames_to_test), "Mismatched number of files in dataset and parent_dataset directories."`? I might still be misunderstanding the intent here though.
You are right, I thought I had it as "==". The idea is that a segmask can only happen if the parent dataset is one image; more than 1 image is not allowed. Updated...
That's what I read originally so we're both losing it apparently haha. Okay good now I think!
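The rule settled on in this thread (exactly one mask file and exactly one parent image) could also be written as an explicit check rather than an `assert`, since Python skips `assert` statements when run with `-O`. A minimal sketch under that assumption; `single_file_error` and `count_errors` are hypothetical names, and the message wording is illustrative.

```python
from typing import List, Optional, Sequence


def single_file_error(files: Sequence[object], label: str) -> Optional[str]:
    """Return an error message unless exactly one file was found."""
    if len(files) == 1:
        return None
    return f"Expected exactly 1 {label} file, found {len(files)}"


def count_errors(mask_files: Sequence[object], parent_files: Sequence[object]) -> List[str]:
    """Collect the file-count errors for the mask and the parent dataset."""
    checks = (
        single_file_error(mask_files, "segmentation mask"),
        single_file_error(parent_files, "base image"),
    )
    return [error for error in checks if error is not None]


# One mask and two parent images: only the parent side is reported.
errors = count_errors(["mask.ome.tiff"], ["a.ome.tiff", "b.ome.tiff"])
```

Reporting the count in the message also distinguishes "too many" from "too few", which the single combined message in the diff does not.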