-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: ImageMaskInput #233
base: development
Are you sure you want to change the base?
WIP: ImageMaskInput #233
Conversation
📝 WalkthroughWalkthroughThis pull request updates the linting configuration by modifying the Changes
✨ Finishing Touches
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
…nd remove unused imports
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## development #233 +/- ##
===============================================
- Coverage 65.12% 64.06% -1.06%
===============================================
Files 55 56 +1
Lines 3710 3857 +147
===============================================
+ Hits 2416 2471 +55
- Misses 1294 1386 +92 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (7)
src/imgtools/ops/input_classes.py (5)
5-10
: Consider unifying your namedtuple usage for consistency.Currently, both
namedtuple
fromcollections
(line 5) andNamedTuple
fromtyping
(line 9) are imported. If you need stricter typing, consider switching fully to thetyping.NamedTuple
approach to enhance maintainability and type checking consistency.
30-53
: Usetime.perf_counter
instead oftime.time
for more precise timing.The current decorator relies on
time.time()
, which can be less precise on some platforms. For higher-resolution timing, consider usingtime.perf_counter()
:- start_time = time.time() ... - end_time = time.time() + start_time = time.perf_counter() ... + end_time = time.perf_counter()
55-63
: Optional: Add docstrings for enum members.While your
ImageMaskModalities
enum is straightforward, brief docstrings or comments explaining each modality combination could help new contributors quickly understand the possible modality pairs.
254-286
: Namedtuple packaging is convenient, but ensure all modality cases are handled.The
match
statement for"RTSTRUCT"
and"SEG"
is great for structural consistency. For more robust code, consider augmenting or extending the logic if other modalities will be added in the future.
520-538
: Demonstration code is fine, but consider wrapping in a test or example module.Inline demonstration within the
__main__
guard is quick and illustrative. For larger codebases, grouping such usage examples into a dedicated example/test module (or a Jupyter notebook) can aid discoverability.tests/legacy_tests/test_modalities.py (1)
30-30
: Remove redundant self-assignment.This line is effectively a no-op:
- dicom = dicom
Removing it avoids confusion for future maintainers.
🧰 Tools
🪛 Ruff (0.8.2)
30-30: Self-assignment of variable
dicom
(PLW0127)
src/imgtools/utils/imageutils.py (1)
172-188
: Vectorization can further optimize large index arrays.Replacing the list comprehension (line 187) with a NumPy-vectorized approach (like your commented-out
vectorized_transform
) may boost performance whenidxs
is large. Current implementation is readable, but for massive data, consider reintroducing a vectorized method.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
docs/usage/Input/ImageMask.ipynb
is excluded by none and included by nonepixi.lock
is excluded by!**/*.lock
and included by none
📒 Files selected for processing (6)
config/ruff.toml
(1 hunks)src/imgtools/coretypes/box.py
(1 hunks)src/imgtools/ops/input_classes.py
(2 hunks)src/imgtools/ops/ops.py
(1 hunks)src/imgtools/utils/imageutils.py
(2 hunks)tests/legacy_tests/test_modalities.py
(2 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
`src/**/*.py`: Review the Python code for compliance with PE...
src/**/*.py
: Review the Python code for compliance with PEP 8 and PEP 257 (docstring conventions). Ensure the following: - Variables and functions follow meaningful naming conventions. - Docstrings are present, accurate, and align with the implementation. - Code is efficient and avoids redundancy while adhering to DRY principles. - Consider suggestions to enhance readability and maintainability. - Highlight any potential performance issues, edge cases, or logical errors. - Ensure all imported libraries are used and necessary.
src/imgtools/utils/imageutils.py
src/imgtools/ops/input_classes.py
src/imgtools/coretypes/box.py
src/imgtools/ops/ops.py
`tests/**/*`: Review the test code written with Pytest. Conf...
tests/**/*
: Review the test code written with Pytest. Confirm: - Tests cover all critical functionality and edge cases. - Test descriptions clearly describe their purpose. - Pytest best practices are followed, such as proper use of fixtures. - Ensure the tests are isolated and do not have external dependencies (e.g., network calls). - Verify meaningful assertions and avoidance of redundant tests. - Test code adheres to PEP 8 style guidelines.
tests/legacy_tests/test_modalities.py
🪛 Ruff (0.8.2)
tests/legacy_tests/test_modalities.py
30-30: Self-assignment of variable dicom
(PLW0127)
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: Unit-Tests (windows-latest, py313)
- GitHub Check: Unit-Tests (windows-latest, py312)
- GitHub Check: Unit-Tests (windows-latest, py311)
- GitHub Check: Unit-Tests (windows-latest, py310)
- GitHub Check: Unit-Tests (macos-13, py313)
- GitHub Check: Unit-Tests (macos-13, py312)
- GitHub Check: Unit-Tests (macos-13, py311)
- GitHub Check: Unit-Tests (macos-13, py310)
- GitHub Check: Unit-Tests (macos-latest, py310)
- GitHub Check: Unit-Tests (ubuntu-latest, py313)
- GitHub Check: Unit-Tests (ubuntu-latest, py312)
- GitHub Check: Unit-Tests (ubuntu-latest, py311)
- GitHub Check: Unit-Tests (ubuntu-latest, py310)
🔇 Additional comments (7)
src/imgtools/ops/input_classes.py (5)
25-28
: Looks good!The introduction of
BaseInput
and the newLoaderFunction
type alias is clear and aligns well with your SITK-based pipeline.
65-93
: Dataclass design is clear and well-documented.The class-level docstring effectively explains basic usage and critical parameters. No immediate concerns with naming or structure here.
113-185
: Validate multidimensional modality assumptions.In
__call__
, the code assumesself.modality_list[1]
exists (line 260). If a new enum (with just one modality) or alternative usage arises, this could potentially raise anIndexError
. Consider adding a safety check or clarifying in the docstring that exactly two modalities (e.g., “image” and “mask”) are expected for this class.
201-221
: Crawler usage and log messages are cohesive.The
_crawl
method properly respects theupdate_crawl
flag, logs relevant details, and warns the user when indexing already exists. Great job on providing meaningful logs.
223-241
: Confirm the validity of modality inputs.The
parse_graph
method performs basic checks (str, list, enum), but it might be helpful to confirm validity of each modality token when the user passes a string or list. Invalid user input could produce unexpected partial matches.tests/legacy_tests/test_modalities.py (1)
21-21
: Direct usage ofread_dicom_auto
is fine.Since you are directly storing the return object in
img
, there's no immediate red flag here. Make sure that you consistently handle the returned object shape across all tests.config/ruff.toml (1)
14-14
: Great addition to linting coverage!Including
input_classes.py
in the linting process aligns with the project's goal of expanding linting coverage. This will help maintain code quality and consistency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (5)
src/imgtools/ops/input_classes.py (5)
5-12
: Consider organizing imports into logical groups.For better readability, consider organizing imports into three groups:
- Standard library imports
- Third-party imports
- Local imports
-from collections import namedtuple -from enum import Enum -import pathlib -import time -from typing import Any, Generator, List, NamedTuple, Optional, Callable -import SimpleITK as sitk -from imgtools.modules import StructureSet, Segmentation, Scan -from dataclasses import dataclass, field -import pandas as pd +# Standard library imports +from collections import namedtuple +from dataclasses import dataclass, field +from enum import Enum +import pathlib +import time +from typing import Any, Generator, List, NamedTuple, Optional, Callable + +# Third-party imports +import pandas as pd +import SimpleITK as sitk + +# Local imports +from imgtools.modules import StructureSet, Segmentation, Scan
27-27
: Add docstring to the LoaderFunction type alias.Adding a docstring would help other developers understand the purpose and expected return types of the loader functions.
-LoaderFunction = Callable[..., sitk.Image | StructureSet | Segmentation] +# Type alias for functions that load medical images or structures +LoaderFunction = Callable[..., sitk.Image | StructureSet | Segmentation] +"""Type hint for functions that load medical images, structure sets, or segmentations."""
55-63
: Add documentation for each modality combination.Consider adding docstrings to explain each modality combination and its typical use case.
class ImageMaskModalities(Enum): + """Enum defining supported combinations of medical image modalities and their masks.""" + CT_RTSTRUCT = ("CT", "RTSTRUCT") + """CT scan with radiotherapy structure set.""" CT_SEG = ("CT", "SEG") + """CT scan with segmentation mask.""" MR_RTSTRUCT = ("MR", "RTSTRUCT") + """MR scan with radiotherapy structure set.""" MR_SEG = ("MR", "SEG") + """MR scan with segmentation mask."""
420-422
: Address TODO comment about utility classes.The TODO comment indicates these classes need work. Consider:
- Documenting specific improvements needed
- Creating separate issues to track the work
- Adding examples of intended usage
Would you like me to help create detailed issues for tracking the improvements needed for these utility classes?
522-538
: Enhance example usage with more comprehensive cases.Consider improving the example section by:
- Using pathlib's resolve() for robust path handling
- Adding comments explaining the expected output
- Including error handling examples
- testdata = pathlib.Path("data") + # Use resolved path to ensure proper path handling + testdata = pathlib.Path("data").resolve() + + # Example: Loading and basic error handling + try: + vs_seg = ImageMaskInput( + dir_path=testdata / datasets_name[1], + modalities=ImageMaskModalities.MR_RTSTRUCT + ) + # Example: Accessing the first case + first_case = vs_seg[0] + print(f"Successfully loaded case with shape: {first_case.scan.shape}") + except Exception as e: + print(f"Error loading dataset: {e}")
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/imgtools/ops/input_classes.py
(2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`src/**/*.py`: Review the Python code for compliance with PE...
src/**/*.py
: Review the Python code for compliance with PEP 8 and PEP 257 (docstring conventions). Ensure the following: - Variables and functions follow meaningful naming conventions. - Docstrings are present, accurate, and align with the implementation. - Code is efficient and avoids redundancy while adhering to DRY principles. - Consider suggestions to enhance readability and maintainability. - Highlight any potential performance issues, edge cases, or logical errors. - Ensure all imported libraries are used and necessary.
src/imgtools/ops/input_classes.py
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: Unit-Tests (windows-latest, py313)
- GitHub Check: Unit-Tests (windows-latest, py312)
- GitHub Check: Unit-Tests (windows-latest, py311)
- GitHub Check: Unit-Tests (windows-latest, py310)
- GitHub Check: Unit-Tests (macos-13, py313)
- GitHub Check: Unit-Tests (macos-13, py312)
- GitHub Check: Unit-Tests (macos-13, py311)
- GitHub Check: Unit-Tests (macos-13, py310)
- GitHub Check: Unit-Tests (macos-latest, py310)
- GitHub Check: Unit-Tests (ubuntu-latest, py313)
- GitHub Check: Unit-Tests (ubuntu-latest, py312)
- GitHub Check: Unit-Tests (ubuntu-latest, py311)
- GitHub Check: Unit-Tests (ubuntu-latest, py310)
🔇 Additional comments (1)
src/imgtools/ops/input_classes.py (1)
30-52
: Well-implemented timer decorator!The implementation is clean, well-documented, and follows best practices. The docstring clearly explains the purpose and usage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (2)
src/imgtools/ops/input_classes.py (2)
131-191
: 🛠️ Refactor suggestionConsider breaking down post_init into smaller methods.
The
__post_init__
method is handling multiple responsibilities. Breaking it down would improve maintainability.
299-301
:⚠️ Potential issueFix f-string formatting in error message.
The error message is not properly formatted using f-strings.
🧹 Nitpick comments (3)
src/imgtools/ops/input_classes.py (3)
30-55
: Consider adding specific return type hints for the wrapped function.The timer decorator is well-implemented, but could benefit from more specific type hints. Instead of
Any
, consider using TypeVar to preserve the original function's return type.-def timer(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]: +from typing import TypeVar, ParamSpec + +T = TypeVar('T') +P = ParamSpec('P') + +def timer(name: str) -> Callable[[Callable[P, T]], Callable[P, T]]:
73-81
: Add docstring to the Enum class.Consider adding a docstring to explain the purpose and usage of the
ImageMaskModalities
enum.class ImageMaskModalities(Enum): + """ + Enum representing valid combinations of image and mask modalities. + + Each enum value is a tuple of (image_modality, mask_modality) where: + - image_modality can be 'CT' or 'MR' + - mask_modality can be 'RTSTRUCT' or 'SEG' + """ CT_RTSTRUCT = ("CT", "RTSTRUCT")
541-544
: Consider using environment variables or configuration files for data paths.Hard-coded data paths in the main block could make the code less portable. Consider using environment variables or configuration files.
+from pathlib import Path +from os import getenv + if __name__ == "__main__": # pragma: no cover - testdata = pathlib.Path("data") + testdata = Path(getenv("IMGTOOLS_DATA_DIR", "data"))
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/imgtools/ops/input_classes.py
(2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`src/**/*.py`: Review the Python code for compliance with PE...
src/**/*.py
: Review the Python code for compliance with PEP 8 and PEP 257 (docstring conventions). Ensure the following: - Variables and functions follow meaningful naming conventions. - Docstrings are present, accurate, and align with the implementation. - Code is efficient and avoids redundancy while adhering to DRY principles. - Consider suggestions to enhance readability and maintainability. - Highlight any potential performance issues, edge cases, or logical errors. - Ensure all imported libraries are used and necessary.
src/imgtools/ops/input_classes.py
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: Unit-Tests (windows-latest, py313)
- GitHub Check: Unit-Tests (windows-latest, py312)
- GitHub Check: Unit-Tests (windows-latest, py311)
- GitHub Check: Unit-Tests (windows-latest, py310)
- GitHub Check: Unit-Tests (macos-13, py313)
- GitHub Check: Unit-Tests (macos-13, py312)
- GitHub Check: Unit-Tests (macos-13, py311)
- GitHub Check: Unit-Tests (macos-13, py310)
- GitHub Check: Unit-Tests (macos-latest, py310)
- GitHub Check: Unit-Tests (ubuntu-latest, py313)
- GitHub Check: Unit-Tests (ubuntu-latest, py312)
- GitHub Check: Unit-Tests (ubuntu-latest, py311)
- GitHub Check: Unit-Tests (ubuntu-latest, py310)
🔇 Additional comments (2)
src/imgtools/ops/input_classes.py (2)
5-28
: Well-organized imports and type definitions!The imports are logically grouped, and the
LoaderFunction
type alias enhances code readability by providing a clear type hint for loader functions.
57-71
: Clean and well-documented NamedTuple implementation!The
ImageMask
NamedTuple is well-documented with clear type hints and purpose.
…and exclude deprecated files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
src/imgtools/ops/input_classes.py (1)
272-274
:⚠️ Potential issueFix f-string formatting in error message.
The error message is missing an f-string for the length value.
- errmsg = ( - f"Index {key} out of range Dataset has {len(self)} cases." - ) + errmsg = ( + f"Index {key} out of range. Dataset has {len(self)} cases." + )
🧹 Nitpick comments (4)
src/imgtools/ops/input_classes.py (4)
46-54
: Add docstring to ImageMaskModalities enum.While the implementation is good, adding a docstring would help users understand the available modality combinations and their use cases.
class ImageMaskModalities(Enum): + """Enum for valid combinations of image and mask modalities. + + Available combinations: + - CT_RTSTRUCT: CT scan with radiotherapy structure set + - CT_SEG: CT scan with segmentation + - MR_RTSTRUCT: MR scan with radiotherapy structure set + - MR_SEG: MR scan with segmentation + """ CT_RTSTRUCT = ("CT", "RTSTRUCT")
114-114
: Improve lambda function readability.The lambda function could be more readable with a proper function name.
- create_path = lambda f: self.dir_path.parent / self.imgtools_dir / f + def create_path(filename: str) -> pathlib.Path: + return self.dir_path.parent / self.imgtools_dir / filename
226-230
: Improve error message in modalities type check.The error message could be more descriptive and properly formatted.
- errmsg = ( - "Modalities must be a string or a" - "list of strings got {type(modalities)}" - ) + errmsg = ( + f"Modalities must be a string or a list of strings, " + f"got {type(modalities)}" + )
513-529
: Enhance example documentation.The examples would be more helpful with additional comments explaining the expected outcomes and use cases.
# Define the path to the data testdata = pathlib.Path("data") - # for this tutorial we will use some test image data + # Example usage with different datasets and modality combinations datasets_name = ["NSCLC-Radiomics", "Vestibular-Schwannoma-SEG"] + + # Load MR images with RTSTRUCT masks from Vestibular-Schwannoma dataset vs_seg = ImageMaskInput( dir_path=testdata / datasets_name[1], modalities=ImageMaskModalities.MR_RTSTRUCT, ) + # Load CT images with RTSTRUCT masks from NSCLC dataset nsclsc_rtstruct = ImageMaskInput( dir_path=testdata / datasets_name[0], modalities=ImageMaskModalities.CT_RTSTRUCT, ) + + # Load CT images with SEG masks from NSCLC dataset nsclsc_seg = ImageMaskInput( dir_path=testdata / datasets_name[0], modalities=ImageMaskModalities.CT_SEG, )
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
config/ruff.toml
(1 hunks)src/imgtools/ops/__init__.py
(2 hunks)src/imgtools/ops/input_classes.py
(4 hunks)src/imgtools/ops/ops.py
(3 hunks)src/imgtools/utils/timer.py
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- src/imgtools/ops/ops.py
- config/ruff.toml
🧰 Additional context used
📓 Path-based instructions (1)
`src/**/*.py`: Review the Python code for compliance with PE...
src/**/*.py
: Review the Python code for compliance with PEP 8 and PEP 257 (docstring conventions). Ensure the following: - Variables and functions follow meaningful naming conventions. - Docstrings are present, accurate, and align with the implementation. - Code is efficient and avoids redundancy while adhering to DRY principles. - Consider suggestions to enhance readability and maintainability. - Highlight any potential performance issues, edge cases, or logical errors. - Ensure all imported libraries are used and necessary.
src/imgtools/utils/timer.py
src/imgtools/ops/__init__.py
src/imgtools/ops/input_classes.py
⏰ Context from checks skipped due to timeout of 90000ms (10)
- GitHub Check: Unit-Tests (windows-latest, py313)
- GitHub Check: Unit-Tests (windows-latest, py312)
- GitHub Check: Unit-Tests (windows-latest, py311)
- GitHub Check: Unit-Tests (windows-latest, py310)
- GitHub Check: Unit-Tests (macos-13, py313)
- GitHub Check: Unit-Tests (macos-13, py312)
- GitHub Check: Unit-Tests (macos-13, py311)
- GitHub Check: Unit-Tests (macos-13, py310)
- GitHub Check: Unit-Tests (macos-latest, py311)
- GitHub Check: Unit-Tests (ubuntu-latest, py312)
🔇 Additional comments (3)
src/imgtools/utils/timer.py (1)
7-41
: Well-structured timer decorator implementation!The implementation follows best practices with proper type hints, clear documentation, and accurate time measurement.
src/imgtools/ops/__init__.py (1)
1-76
: Well-organized module interface!The imports are properly categorized and the all list clearly defines the public API.
src/imgtools/ops/input_classes.py (1)
30-44
: Well-documented data structure!The ImageMask NamedTuple provides a clear and type-safe way to store image-mask pairs.
5. create output streams | ||
""" | ||
self.dataset_name = self.dir_path.name | ||
create_path = lambda f: self.dir_path.parent / self.imgtools_dir / f |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gonna steal this lol
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noice
raise ValueError(errmsg) from e | ||
parsed_cols = self.parsed_df.columns.tolist() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NITPICK: Add a space for readibility
raise ValueError(errmsg) from e | |
parsed_cols = self.parsed_df.columns.tolist() | |
raise ValueError(errmsg) from e | |
parsed_cols = self.parsed_df.columns.tolist() |
Work In Progress
mostly meets the requirements of Med-ImageNet, but could be useful as a public API for anyone
Prototype
ImageMaskInput
Important
by design,
RTSTRUCT
s are converted toimgtools.Segmentation
returned with all ROIswhich matches the default design of
SEG
objects, and will allow for uniform filtering ofROINames
on one data type in the futureimgtools.Scan
)imgtools.Segmentation
(converted fromimgtools.StructureSet
with all ROIs)imgtools.Segmentation
(converted fromimgtools.StructureSet
with all ROIs)imgtools.Segmentation
imgtools.Segmentation
Key Features:
Automatically constructs a DataGraph and querying of specific modalities (e.g., CT, MR, RTSTRUCT, SEG).
ImageMaskModalities
EnumSummary
parsing, indexing, and querying datasets.
loader setup.
Tutorial: Loading Images and Masks from DICOMs
Getting Started
You need the following at minimum:
Combinations are:
For this tutorial we will download two datasets:
Setting up the loaders with
ImageMaskInput
Vestibular-Schwannoma-SEG
has MR as scan and RTSTRUCT as mask:
NSCLC-Radiomics
has CT as scan and BOTH RTSTRUCT and SEG as masks.
CT & RTSTRUCT
CT & SEG
Using the Input Datasets
Load a case
StructureSets are automatically converted to
Segmentation
object with all ROIsWhereas native
SEG
modalities get loaded asSegmentation
objectsMR & RTSTRUCT example
mask
Summary by CodeRabbit
Summary by CodeRabbit
New Features
Bug Fixes
Refactor
Tests