Clinton/ed 448/sdk html and text #820

Merged
merged 36 commits into from
Jan 13, 2025
8047030
Bump version number
clinton-encord Jun 19, 2024
53fa0d8
Merge branch 'master' of github.com:encord-team/encord-client-python
clinton-encord Jun 21, 2024
6debbe8
Merge branch 'master' of github.com:encord-team/encord-client-python
clinton-encord Jul 15, 2024
d726863
Merge branch 'master' of github.com:encord-team/encord-client-python
clinton-encord Jul 17, 2024
4830734
Merge branch 'master' of github.com:encord-team/encord-client-python
clinton-encord Jul 26, 2024
a9fc54f
Merge branch 'master' of github.com:encord-team/encord-client-python
clinton-encord Jul 30, 2024
e0ba8d1
Merge branch 'master' of github.com:encord-team/encord-client-python
clinton-encord Aug 2, 2024
8690215
Merge branch 'master' of github.com:encord-team/encord-client-python
clinton-encord Aug 12, 2024
b3047d2
Merge branch 'master' of github.com:encord-team/encord-client-python
clinton-encord Aug 12, 2024
c5d32c4
Merge branch 'master' of github.com:encord-team/encord-client-python
clinton-encord Aug 15, 2024
b27a0f0
Merge branch 'master' of github.com:encord-team/encord-client-python
clinton-encord Sep 10, 2024
c7826e7
Merge branch 'master' of github.com:encord-team/encord-client-python
clinton-encord Oct 3, 2024
3a9aa3a
Merge branch 'master' of github.com:encord-team/encord-client-python
clinton-encord Oct 25, 2024
e86bdc0
Merge branch 'master' of github.com:encord-team/encord-client-python
clinton-encord Nov 1, 2024
100ed74
Merge branch 'master' of github.com:encord-team/encord-client-python
clinton-encord Nov 4, 2024
913f4cc
Merge branch 'master' of github.com:encord-team/encord-client-python
clinton-encord Nov 5, 2024
a46e751
Merge branch 'master' of github.com:encord-team/encord-client-python
clinton-encord Nov 13, 2024
362f166
Merge branch 'master' of github.com:encord-team/encord-client-python
clinton-encord Nov 13, 2024
510b9aa
Merge branch 'master' of github.com:encord-team/encord-client-python
clinton-encord Nov 13, 2024
973bc1a
Merge branch 'master' of github.com:encord-team/encord-client-python
clinton-encord Nov 13, 2024
3143c2b
Merge branch 'master' of github.com:encord-team/encord-client-python
clinton-encord Nov 18, 2024
c349686
Merge branch 'master' of github.com:encord-team/encord-client-python
clinton-encord Nov 20, 2024
93c68e2
Merge branch 'master' of github.com:encord-team/encord-client-python
clinton-encord Nov 27, 2024
97583a2
Merge branch 'master' of github.com:encord-team/encord-client-python
clinton-encord Dec 4, 2024
ccfeffd
Merge branch 'master' of github.com:encord-team/encord-client-python
clinton-encord Dec 11, 2024
93e3754
Merge branch 'master' of github.com:encord-team/encord-client-python
clinton-encord Dec 11, 2024
a358161
Merge branch 'master' of github.com:encord-team/encord-client-python
clinton-encord Dec 31, 2024
3a10a64
Merge branch 'master' of github.com:encord-team/encord-client-python
clinton-encord Jan 6, 2025
dfd120a
Merge branch 'master' of github.com:encord-team/encord-client-python
clinton-encord Jan 13, 2025
d692729
SDK and HTML PR
clinton-encord Dec 6, 2024
dd233f8
- Use HtmlCoordinates and TextCoordinates to hold location of labels …
clinton-encord Dec 30, 2024
4add650
Added test for serialising plain text labels
clinton-encord Jan 6, 2025
8550067
Update comment
clinton-encord Jan 7, 2025
1511203
Use range=[] for non-geometric data
clinton-encord Jan 7, 2025
ea71694
Fix ruff formatting
clinton-encord Jan 7, 2025
32348fb
Fix ruff formatting
clinton-encord Jan 13, 2025
4 changes: 4 additions & 0 deletions encord/common/range_manager.py
@@ -70,6 +70,10 @@ def remove_ranges(self, ranges_to_remove: Ranges) -> None:
        for r in ranges_to_remove:
            self.remove_range(r)

    def clear_ranges(self) -> None:
        """Clear all ranges."""
        self.ranges = []

    def get_ranges(self) -> Ranges:
        """Return the sorted list of merged ranges."""
        copied_ranges = [range.copy() for range in self.ranges]
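The effect of `clear_ranges` can be illustrated with a small stand-in class. This is only a sketch: `SimpleRangeManager`, its `add_range` merging rule, and the `Range` dataclass are hypothetical and are not the real `RangeManager` implementation, which this hunk shows only in part.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass
class Range:
    start: int
    end: int


class SimpleRangeManager:
    """Hypothetical stand-in for RangeManager; stores sorted, merged ranges."""

    def __init__(self) -> None:
        self.ranges: List[Range] = []

    def add_range(self, new: Range) -> None:
        # Assumption: overlapping ranges are merged into one.
        merged: List[Range] = []
        for r in sorted(self.ranges + [replace(new)], key=lambda x: x.start):
            if merged and r.start <= merged[-1].end:
                merged[-1].end = max(merged[-1].end, r.end)
            else:
                merged.append(r)
        self.ranges = merged

    def clear_ranges(self) -> None:
        """Drop every stored range, like the method added in this hunk."""
        self.ranges = []

    def get_ranges(self) -> List[Range]:
        # Return copies so callers cannot mutate internal state.
        return [replace(r) for r in self.ranges]


manager = SimpleRangeManager()
manager.add_range(Range(0, 4))
manager.add_range(Range(3, 9))
snapshot = manager.get_ranges()  # copy taken before clearing
manager.clear_ranges()
```

Because `get_ranges` returns copies, a snapshot taken before `clear_ranges` is unaffected by the reset.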
14 changes: 14 additions & 0 deletions encord/constants/enums.py
@@ -45,3 +45,17 @@ def from_upper_case_string(string: str) -> DataType:

    def to_upper_case_string(self) -> str:
        return self.value.upper()


GEOMETRIC_TYPES = {
    DataType.VIDEO,
    DataType.IMAGE,
    DataType.IMG_GROUP,
    DataType.DICOM,
    DataType.DICOM_STUDY,
    DataType.NIFTI,
}


def is_geometric(data_type: DataType) -> bool:
    return data_type in GEOMETRIC_TYPES
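A minimal sketch of how the set-membership helper behaves. The `DataType` enum below is a reduced, hypothetical copy: only the member names come from this PR, and the string values are assumptions.

```python
from enum import Enum


class DataType(Enum):
    # Hypothetical subset; the string values are assumptions for this sketch.
    VIDEO = "video"
    IMAGE = "image"
    AUDIO = "audio"
    PLAIN_TEXT = "plain_text"


GEOMETRIC_TYPES = {DataType.VIDEO, DataType.IMAGE}


def is_geometric(data_type: DataType) -> bool:
    # Anything not listed in the set is treated as non-geometric.
    return data_type in GEOMETRIC_TYPES


results = {dt.name: is_geometric(dt) for dt in DataType}
```

One consequence of this design is that a newly added data type counts as non-geometric until it is added to the set.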
35 changes: 25 additions & 10 deletions encord/objects/classification_instance.py
@@ -31,7 +31,7 @@

from encord.common.range_manager import RangeManager
Review comment (Contributor):

Would it be reasonable to expect label_row.add_classification_instance(cls_instance) to work for text/html/audio without having to call cls_instance.set_for_frames, given that we only support global classifications for these?

from encord.common.time_parser import parse_datetime
from encord.constants.enums import DataType
from encord.constants.enums import DataType, is_geometric
from encord.exceptions import LabelRowError
from encord.objects.answers import Answer, ValueType, _get_static_answer_map
from encord.objects.attributes import (
@@ -55,6 +55,17 @@
    from encord.objects import LabelRowV2


# For audio and text files, classifications can only be applied to Range(start=0, end=0),
# because we treat the entire file as being on one frame (for classifications; it's different for objects).
def _verify_non_geometric_classifications_range(ranges_to_add: Ranges, label_row: Optional[LabelRowV2]) -> None:
    is_range_only_on_frame_0 = len(ranges_to_add) == 1 and ranges_to_add[0].start == 0 and ranges_to_add[0].end == 0
    if label_row is not None and not is_geometric(label_row.data_type) and not is_range_only_on_frame_0:
        raise LabelRowError(
            "For audio files and text files, classifications can only be attached to frame=0. "
            "You may use `ClassificationInstance.set_for_frames(frames=Range(start=0, end=0))`."
        )


class ClassificationInstance:
    def __init__(
        self,
@@ -104,6 +115,9 @@ def feature_hash(self) -> str:
    def _last_frame(self) -> Union[int, float]:
        if self._parent is None or self._parent.data_type is DataType.DICOM:
            return float("inf")
        elif self._parent is not None and not is_geometric(self._parent.data_type):
            # For audio and text files, the entire file is treated as one frame
            return 1
        else:
            return self._parent.number_of_frames
@@ -139,22 +153,21 @@ def _set_for_ranges(
        reviews: Optional[List[dict]],
    ):
        new_range_manager = RangeManager(frame_class=frames)
        conflicting_ranges = self._is_classification_already_present_on_range(new_range_manager.get_ranges())
        ranges_to_add = new_range_manager.get_ranges()

        _verify_non_geometric_classifications_range(ranges_to_add, self._parent)

        conflicting_ranges = self._is_classification_already_present_on_range(ranges_to_add)
        if conflicting_ranges and not overwrite:
            raise LabelRowError(
                f"The classification '{self.classification_hash}' already exists "
                f"on the ranges {conflicting_ranges}. "
                f"Set 'overwrite' parameter to True to override."
            )

        ranges_to_add = new_range_manager.get_ranges()
        for range_to_add in ranges_to_add:
            self._check_within_range(range_to_add.end)

        """
        At this point, this classification instance operates on ranges, NOT on frames.
        We therefore leave only FRAME 0 in the map.The frame_data for FRAME 0 will be
        treated as the data for all "frames" in this classification instance.
        For non-geometric files, the frame_data for FRAME 0 will be
        treated as the data for the entire classification instance.
        """
        self._set_frame_and_frame_data(
            frame=0,
@@ -685,7 +698,9 @@ def _is_selectable_child_attribute(self, attribute: Attribute) -> bool:
    def _check_within_range(self, frame: int) -> None:
        if frame < 0 or frame >= self._last_frame:
            raise LabelRowError(
                f"The supplied frame of `{frame}` is not within the acceptable bounds of `0` to `{self._last_frame}`."
                f"The supplied frame of `{frame}` is not within the acceptable bounds of `0` to `{self._last_frame}`. "
                f"Note: for non-geometric data (e.g. {DataType.AUDIO} and {DataType.PLAIN_TEXT}), "
                f"the entire file has only 1 frame."
            )

    def _is_classification_already_present(self, frames: Iterable[int]) -> Set[int]:
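The two checks this file applies to non-geometric data (the frame-0-only range rule and the bounds check) can be tried in isolation. This is a sketch under assumptions: `Range`, `LabelRowError`, and the free functions below are simplified stand-ins, and a boolean `geometric` flag replaces the label-row lookup.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Range:
    start: int
    end: int


class LabelRowError(Exception):
    """Stand-in for encord.exceptions.LabelRowError."""


def verify_non_geometric_range(ranges: List[Range], geometric: bool) -> None:
    # Non-geometric data accepts exactly one range, covering frame 0 only.
    only_frame_0 = len(ranges) == 1 and ranges[0].start == 0 and ranges[0].end == 0
    if not geometric and not only_frame_0:
        raise LabelRowError("Classifications can only be attached to frame=0.")


def check_within_range(frame: int, last_frame: float) -> None:
    # Valid frames lie in the half-open interval [0, last_frame).
    if frame < 0 or frame >= last_frame:
        raise LabelRowError(f"Frame {frame} is outside 0 to {last_frame}.")


def rejected(fn: Callable[..., None], *args) -> bool:
    """Return True if the check raises LabelRowError for these arguments."""
    try:
        fn(*args)
    except LabelRowError:
        return True
    return False
```

For example, `rejected(verify_non_geometric_range, [Range(0, 5)], False)` is true, while the same range passes when `geometric` is true.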
1 change: 1 addition & 0 deletions encord/objects/common.py
@@ -36,6 +36,7 @@ class Shape(StringEnum):
    ROTATABLE_BOUNDING_BOX = "rotatable_bounding_box"
    BITMASK = "bitmask"
    AUDIO = "audio"
    TEXT = "text"


class DeidentifyRedactTextMode(Enum):
63 changes: 53 additions & 10 deletions encord/objects/coordinates.py
@@ -19,6 +19,8 @@
from encord.exceptions import LabelRowError
from encord.objects.bitmask import BitmaskCoordinates
from encord.objects.common import Shape
from encord.objects.frames import Ranges
from encord.objects.html_node import HtmlRange
from encord.orm.analytics import CamelStrEnum
from encord.orm.base_dto import BaseDTO

@@ -339,11 +341,50 @@ def to_dict(self, by_alias=True, exclude_none=True) -> Dict[str, Any]:


class AudioCoordinates(BaseDTO):
    pass
    """
    Represents coordinates for an audio file

    Attributes:
        range (Ranges): Ranges in milliseconds for audio files
    """

    range: Ranges

    def __post_init__(self):
        if len(self.range) == 0:
            raise ValueError("Range list must contain at least one range.")


class TextCoordinates(BaseDTO):
    """
    Represents coordinates for a text file

    Attributes:
        range (Ranges): Ranges of characters for simple text files
    """

    range: Ranges


class HtmlCoordinates(BaseDTO):
    """
    Represents coordinates for an HTML file

    Attributes:
        range (List[HtmlRange]): A list of HtmlRange objects
    """

    range: List[HtmlRange]


NON_GEOMETRIC_COORDINATES = {AudioCoordinates, TextCoordinates, HtmlCoordinates}


Coordinates = Union[
    AudioCoordinates,
    TextCoordinates,
    Union[HtmlCoordinates, TextCoordinates],
    HtmlCoordinates,
    BoundingBoxCoordinates,
    RotatableBoundingBoxCoordinates,
    PointCoordinate,
@@ -352,13 +393,15 @@ class AudioCoordinates(BaseDTO):
    SkeletonCoordinates,
    BitmaskCoordinates,
]
ACCEPTABLE_COORDINATES_FOR_ONTOLOGY_ITEMS: Dict[Shape, Type[Coordinates]] = {
    Shape.BOUNDING_BOX: BoundingBoxCoordinates,
    Shape.ROTATABLE_BOUNDING_BOX: RotatableBoundingBoxCoordinates,
    Shape.POINT: PointCoordinate,
    Shape.POLYGON: PolygonCoordinates,
    Shape.POLYLINE: PolylineCoordinates,
    Shape.SKELETON: SkeletonCoordinates,
    Shape.BITMASK: BitmaskCoordinates,
    Shape.AUDIO: AudioCoordinates,

ACCEPTABLE_COORDINATES_FOR_ONTOLOGY_ITEMS: Dict[Shape, List[Type[Coordinates]]] = {
    Shape.BOUNDING_BOX: [BoundingBoxCoordinates],
    Shape.ROTATABLE_BOUNDING_BOX: [RotatableBoundingBoxCoordinates],
    Shape.POINT: [PointCoordinate],
    Shape.POLYGON: [PolygonCoordinates],
    Shape.POLYLINE: [PolylineCoordinates],
    Shape.SKELETON: [SkeletonCoordinates],
    Shape.BITMASK: [BitmaskCoordinates],
    Shape.AUDIO: [AudioCoordinates],
    Shape.TEXT: [TextCoordinates, HtmlCoordinates],
}
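Since each shape now maps to a list of accepted coordinate classes, a caller checking a coordinates object has to test against every class in the list. The sketch below shows one way to do that. The empty classes, the reduced `Shape` enum, and `is_acceptable` are hypothetical, and how the SDK performs this check is not shown in this diff.

```python
from enum import Enum
from typing import Dict, List


class Shape(Enum):
    BOUNDING_BOX = "bounding_box"
    TEXT = "text"


# Empty placeholder classes standing in for the real coordinate DTOs.
class BoundingBoxCoordinates:
    pass


class TextCoordinates:
    pass


class HtmlCoordinates:
    pass


ACCEPTABLE: Dict[Shape, List[type]] = {
    Shape.BOUNDING_BOX: [BoundingBoxCoordinates],
    Shape.TEXT: [TextCoordinates, HtmlCoordinates],
}


def is_acceptable(shape: Shape, coordinates: object) -> bool:
    # isinstance accepts a tuple of classes, so the list is converted first.
    return isinstance(coordinates, tuple(ACCEPTABLE[shape]))
```

Using a list for every shape keeps the lookup uniform even where only one coordinate class is accepted.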
69 changes: 69 additions & 0 deletions encord/objects/html_node.py
@@ -0,0 +1,69 @@
"""
---
title: "Objects - HTML Node"
slug: "sdk-ref-objects-html-node"
hidden: false
metadata:
  title: "Objects - HTML Node"
  description: "Encord SDK Objects - HTML Node."
category: "64e481b57b6027003f20aaa0"
---
"""

from __future__ import annotations

from dataclasses import dataclass
from typing import Collection, List, Union, cast

from encord.orm.base_dto import BaseDTO


class HtmlNode(BaseDTO):
    """
    A class representing a single HTML node, with the node and offset.

    Attributes:
        node (str): The xpath of the node
        offset (int): The offset of the content from the xpath
    """

    node: str
Review comment (Contributor):

maybe we should call this xpath? ATM node is already in the class name HtmlNode, so feels like the field name can be more descriptive

Reply (Contributor Author):

In the actual range_html in the objects_index, we call it "node" though.

Worried it might be confusing to have it called "node" in the exports and in our DB, but then have it called "xpath" here in this one place.

    offset: int

    def __repr__(self):
        return f"(Node: {self.node} Offset: {self.offset})"


class HtmlRange(BaseDTO):
    """
    A class representing a section of HTML with a start and end node.

    Attributes:
        start (HtmlNode): The starting node of the range.
        end (HtmlNode): The ending node of the range.
    """

    start: HtmlNode
    end: HtmlNode

    def __repr__(self):
        return f"({self.start} - {self.end})"

    def to_dict(self):
        return {
            "start": {"node": self.start.node, "offset": self.start.offset},
            "end": {"node": self.end.node, "offset": self.end.offset},
        }

    def __hash__(self):
        # __hash__ must return an int, so hash a tuple of the identifying fields.
        return hash((self.start.node, self.start.offset, self.end.node, self.end.offset))

    @classmethod
    def from_dict(cls, d: dict):
Review comment (Contributor):

maybe we can use this method to be able to construct TextCoordinates.range_html from a dict rather than having to import HtmlRange & HtmlNode?
No big deal though, it's fine as is

        return HtmlRange(
            start=HtmlNode(node=d["start"]["node"], offset=d["start"]["offset"]),
            end=HtmlNode(node=d["end"]["node"], offset=d["end"]["offset"]),
        )


HtmlRanges = List[HtmlRange]
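The dictionary round trip and hashing behaviour of a start/end node pair can be sketched with plain frozen dataclasses. This is an illustration, not the SDK classes: it does not use `BaseDTO`, and the XPath strings in `payload` are made-up sample values.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class HtmlNode:
    node: str  # XPath of the node
    offset: int  # offset of the content within that node


@dataclass(frozen=True)
class HtmlRange:
    start: HtmlNode
    end: HtmlNode

    def to_dict(self) -> Dict[str, Any]:
        return {
            "start": {"node": self.start.node, "offset": self.start.offset},
            "end": {"node": self.end.node, "offset": self.end.offset},
        }

    @classmethod
    def from_dict(cls, d: Dict[str, Any]) -> "HtmlRange":
        return cls(
            start=HtmlNode(node=d["start"]["node"], offset=d["start"]["offset"]),
            end=HtmlNode(node=d["end"]["node"], offset=d["end"]["offset"]),
        )


# Sample payload; the XPath values are invented for illustration.
payload = {
    "start": {"node": "/html[1]/body[1]/p[1]/text()[1]", "offset": 3},
    "end": {"node": "/html[1]/body[1]/p[1]/text()[1]", "offset": 12},
}
html_range = HtmlRange.from_dict(payload)
round_tripped = html_range.to_dict()
# Frozen dataclasses get a field-based integer hash, so equal ranges deduplicate.
unique = {html_range, HtmlRange.from_dict(payload)}
```

Python requires `__hash__` to return an integer. Frozen dataclasses generate one from the fields, which keeps equal ranges interchangeable in sets and as dictionary keys.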