Complex domain detection for collections of data items with CLI support #566

Merged
58 commits
b3dcf39
Add DataCollectionDomainDetector.
robertbartel Apr 2, 2024
2013c38
Remove commented-out collection detector impl.
robertbartel Apr 2, 2024
f590ee8
Add automated domain detection to DmodClient.
robertbartel Apr 2, 2024
3a54850
Add CLI args to just detect domain for local data.
robertbartel Apr 2, 2024
be2ad7c
Update restriction types to check/perform extend.
robertbartel Apr 2, 2024
3f7d534
Add class method for merging DataDomain instances.
robertbartel Apr 2, 2024
40f4e50
Add __eq__ impl for DataDomain.
robertbartel Apr 2, 2024
4fc3bc3
Update python/lib/client/dmod/client/__main__.py
robertbartel Apr 4, 2024
b206fe8
Combining like exception handling in client main.
robertbartel Apr 4, 2024
f37d90e
Fix Dataset __hash__ impl.
robertbartel Apr 5, 2024
f387c70
Use future annotations in meta_data.py.
robertbartel Apr 5, 2024
dc90b3f
Rename restriction methods from extend to expand.
robertbartel Apr 5, 2024
1bdcb37
Refactor DataCollectionDomainDetector.detect().
robertbartel Apr 5, 2024
ccc41b1
Fix collection domain detector get_item_names.
robertbartel Apr 9, 2024
3978ab0
Move import/reg of domain detectors in client.
robertbartel Apr 9, 2024
6bf1581
Add domain detectory registry func for all names.
robertbartel Apr 9, 2024
2f61fb5
Refactor client __main__ for domain detection.
robertbartel Apr 9, 2024
1d90801
Add dedicated core module for domain detectors.
robertbartel Apr 9, 2024
e6f0d23
Move core domain detector classes to new module.
robertbartel Apr 9, 2024
319a0c1
Update client init for new domain detector module.
robertbartel Apr 9, 2024
4e459d7
Update client main for new domain detector module.
robertbartel Apr 9, 2024
95f1135
Update dmod_client for new domain detector module.
robertbartel Apr 9, 2024
ed4910f
Update subclasses for new domain detector module.
robertbartel Apr 9, 2024
5c36a5e
Update AORC detector tests new superclass module.
robertbartel Apr 9, 2024
2f4b18d
Update gpkg detector tests new superclass module.
robertbartel Apr 9, 2024
7a2f164
Update universal tests for new superclass module.
robertbartel Apr 9, 2024
c02b2ca
Refactor DataDomain.merge_domains for readability.
robertbartel Apr 15, 2024
8bff0b0
Simplify domain detector registry impl.
robertbartel Apr 15, 2024
ef6c08c
Simplify __init__ for ItemDataDomainDetector.
robertbartel Apr 15, 2024
4e665d4
Use better exception type for domain merge fail.
robertbartel Apr 16, 2024
b2479a3
Make ngen-config dependency of modeldata package.
robertbartel Apr 17, 2024
da0783d
No conditional RealizationConfigDomainDetector.
robertbartel Apr 17, 2024
137e505
Re-implement universal detector as abstraction.
robertbartel Apr 17, 2024
262c337
Add concrete domain detector impls in client pkg.
robertbartel Apr 17, 2024
56fdd91
Using new client package domain detectors.
robertbartel Apr 17, 2024
9aa0894
Refactor universal detect multi result exception.
robertbartel Apr 19, 2024
5d4c993
Add debug log for client extracting data domain.
robertbartel Apr 19, 2024
a50baff
Explicit init params in domain detector subclass.
robertbartel Apr 26, 2024
33bc0bf
Fix regex in AorcCsvFileDomainDetector func.
robertbartel Apr 26, 2024
b9c1a0f
Fix init in AorcCsvFileDomainDetector func.
robertbartel Apr 26, 2024
eb92617
Update/fix fields/columns in AORC forcing format.
robertbartel Apr 26, 2024
aaa51e3
Sanity check fields in AORC forcing domain detect.
robertbartel Apr 26, 2024
aea0e13
Remove dead code from item_domain_detector.py.
robertbartel Apr 26, 2024
aa22c45
Fix regions (somewhat) for gpkg domain detect.
robertbartel Apr 26, 2024
68fb002
Fix gpkg domain detector bug and test case.
robertbartel Apr 29, 2024
c9fbb29
Make dmod.client depend on dmod.modeldata.
robertbartel Apr 19, 2024
c15419f
Remove unneeded item domain detector instance var.
robertbartel May 9, 2024
1327f83
Force keyword arg for ItemDataDomainDetector init.
robertbartel May 9, 2024
5ca9548
Improve __init__ signature for universal detector.
robertbartel May 9, 2024
825c9cb
Add debug logging to universal _try_detection().
robertbartel May 9, 2024
db50a01
Improve universal detect() kwarg setup/docstring.
robertbartel May 9, 2024
c43529e
Force more detector types to use kwargs for init.
robertbartel May 9, 2024
47a3995
Minor __init__ style tweak.
robertbartel May 9, 2024
e5f4d13
Imply in another detect() when kwargs aren't used.
robertbartel May 9, 2024
6def44d
Imply in yet another detect() kwargs aren't used.
robertbartel May 10, 2024
6f4dd60
Use BytesIO in AORC domain detector w/ bytes data.
robertbartel May 10, 2024
94b6766
Bump core version for missing module.
robertbartel May 13, 2024
1de2c0e
Avoid call __hash__ directly in DataDomain __eq__.
robertbartel May 13, 2024
2 changes: 1 addition & 1 deletion python/lib/client/dmod/client/__init__.py
Original file line number Diff line number Diff line change
@@ -1 +1 @@
-name = 'client'
+name = 'client'
64 changes: 47 additions & 17 deletions python/lib/client/dmod/client/__main__.py
Expand Up @@ -3,13 +3,15 @@
import datetime
import json
from dmod.core.execution import AllocationParadigm
from dmod.core.exception import DmodRuntimeError
from . import name as package_name
from .dmod_client import ClientConfig, DmodClient
from .dmod_client import ClientConfig, DmodClient, run_domain_detection
from .domain_detectors import ClientUniversalItemDomainDetector
from dmod.communication.client import get_or_create_eventloop
from dmod.core.meta_data import ContinuousRestriction, DataCategory, DataDomain, DataFormat, DiscreteRestriction, \
TimeRange
from dmod.core.meta_data import (ContinuousRestriction, DataCategory, DataDomain, DataFormat, DiscreteRestriction,
TimeRange)
from pathlib import Path
from typing import Any, Dict, List, Optional, Tuple, Type
from typing import Any, List, Optional


DEFAULT_CLIENT_CONFIG_BASENAME = '.dmod_client_config.json'
Expand Down Expand Up @@ -230,6 +232,17 @@ def _handle_data_service_action_args(parent_subparsers_container):
parser_delete = action_subparsers.add_parser('delete', description="Delete a specified (entire) dataset.")
parser_delete.add_argument('name', help='Specify the name of the dataset to delete.')

# Nested parser for the 'domain' action, with required argument for path to the data to detect over
parser_domain = action_subparsers.add_parser('domain', description="Ops related to DataDomains and detection.")
domain_command_subparsers = parser_domain.add_subparsers(dest="domain_command")
detect_domain_parser = domain_command_subparsers.add_parser('detect',
description="Detect DataDomain for local data for a dataset.")
detect_domain_parser.add_argument('path', type=Path,
help="Specify a data file or path containing several data files.")

show_detectors = domain_command_subparsers.add_parser('list_detectors',
description="List the domain detector subclasses that are available.")

# Nested parser for the 'upload' action, with required args for dataset name and files to upload
parser_upload = action_subparsers.add_parser('upload', description="Upload local files to a dataset.")
parser_upload.add_argument('--data-root', dest='data_root', type=Path,
Expand Down Expand Up @@ -322,6 +335,23 @@ def _handle_args():
return parser.parse_args()


def _run_domain_command(args):
if args.domain_command == 'detect':
try:
domain = run_domain_detection(paths=args.path)
print({"success": True, "domain": f"{domain.to_json()}"})
except DmodRuntimeError as e:
print({"success": False, "reason": f"{e.__class__.__name__}", "message": f"{e!s}"})
except Exception as e:
print(f"ERROR - Encountered {e.__class__.__name__} detecting domain: {e!s}")
exit(1)
elif args.domain_command == "list_detectors":
all_names = [d.__name__ for d in ClientUniversalItemDomainDetector.get_default_detectors()]
print({"success": True, "detector_names": all_names})
else:
raise NotImplementedError(f"Unrecognized domain command '{args.domain_command!s}'")


def find_client_config(basenames: Optional[List[str]] = None, dirs: Optional[List[Path]] = None) -> Optional[Path]:
"""
Search locations for the client config of given basenames, falling back to defaults, and returning path if found.
Expand Down Expand Up @@ -354,19 +384,19 @@ def find_client_config(basenames: Optional[List[str]] = None, dirs: Optional[Lis


def execute_dataset_command(args, client: DmodClient):
-    async_loop = get_or_create_eventloop()
-    try:
-        result = async_loop.run_until_complete(client.data_service_action(**(vars(args))))
-        print(result)
-    except ValueError as e:
-        print(str(e))
-        sys.exit(1)
-    except NotImplementedError as e:
-        print(str(e))
-        sys.exit(1)
-    except Exception as e:
-        print("ERROR: Encountered {} - {}".format(e.__class__.__name__, str(e)))
-        sys.exit(1)
+    if args.action == 'domain':
+        _run_domain_command(args)
+    else:
+        async_loop = get_or_create_eventloop()
+        try:
+            result = async_loop.run_until_complete(client.data_service_action(**(vars(args))))
+            print(result)
+        except (ValueError, NotImplementedError) as e:
+            print(str(e))
+            sys.exit(1)
+        except Exception as e:
+            print("ERROR: Encountered {} - {}".format(e.__class__.__name__, str(e)))
+            sys.exit(1)


def execute_config_command(parsed_args, client: DmodClient):
Expand Down
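Reviewer note: the `domain` action above nests a second level of subparsers (`detect`, `list_detectors`) under an action subparser, and `execute_dataset_command` routes to `_run_domain_command` before touching the event loop. The following is a minimal, standalone sketch of that argparse layout, not the client's actual parser. The `build_parser` and `dispatch` names and the `prog` value are hypothetical, and the real client registers many more actions and arguments.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical top-level parser; only the 'domain' action is reproduced here.
    parser = argparse.ArgumentParser(prog="dmod-client-sketch")
    action_subparsers = parser.add_subparsers(dest="action")

    # 'domain' action with its own nested command subparsers, mirroring the diff.
    parser_domain = action_subparsers.add_parser(
        "domain", description="Ops related to DataDomains and detection.")
    domain_commands = parser_domain.add_subparsers(dest="domain_command")

    # 'detect' takes one positional path, converted to pathlib.Path by argparse.
    detect = domain_commands.add_parser(
        "detect", description="Detect DataDomain for local data for a dataset.")
    detect.add_argument("path", type=Path,
                        help="A data file or a directory containing several data files.")

    # 'list_detectors' takes no arguments.
    domain_commands.add_parser(
        "list_detectors", description="List the available domain detector subclasses.")
    return parser


def dispatch(args: argparse.Namespace) -> str:
    # Stand-in for the routing done in execute_dataset_command/_run_domain_command;
    # it returns a label rather than running detection.
    if args.action != "domain":
        return "other-action"
    if args.domain_command == "detect":
        return "detect"
    if args.domain_command == "list_detectors":
        return "list_detectors"
    raise NotImplementedError(f"Unrecognized domain command '{args.domain_command!s}'")
```

Because `dest="domain_command"` is set on the nested subparsers, the chosen command is available as a plain attribute, which is what lets the dispatch be a simple `if`/`elif` chain. Note that if `domain` is given with no nested command, `domain_command` is `None` and the sketch raises `NotImplementedError`, the same fallthrough as in the diff.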
2 changes: 1 addition & 1 deletion python/lib/client/dmod/client/_version.py
@@ -1 +1 @@
-__version__ = '0.6.0'
+__version__ = '0.6.1'
61 changes: 54 additions & 7 deletions python/lib/client/dmod/client/dmod_client.py
@@ -1,13 +1,18 @@
import json
import logging

from dmod.communication import AuthClient, TransportLayerClient, WebSocketClient
from dmod.core.common import get_subclasses
from dmod.core.exception import DmodRuntimeError
from dmod.core.serializable import BasicResultIndicator, ResultIndicator
from dmod.core.meta_data import DataDomain
from dmod.core.meta_data import DataDomain, DiscreteRestriction, StandardDatasetIndex
from .request_clients import DataServiceClient, JobClient
from .client_config import ClientConfig
from pathlib import Path
from typing import Type
from typing import List, Optional, Type, Union

from functools import reduce
from .domain_detectors import ClientDataCollectionDomainDetector, ClientUniversalItemDomainDetector


def determine_transport_client_type(protocol: str,
Expand Down Expand Up @@ -48,6 +53,45 @@ def _get_subclasses(class_val):
raise RuntimeError(f"No subclass of `{TransportLayerClient.__name__}` found supporting protocol '{protocol}'")


def run_domain_detection(paths: Union[Path, List[Path]], data_id: Optional[str] = None) -> DataDomain:
"""
Run domain detection.

Parameters
----------
paths: Union[Path, List[Path]]
One or more paths to files or directories.
data_id: Optional[str]
Hypothetical data_id for a dataset containing this data and having the returned domain, useful in situations
where data_id is itself an index of the domain against which constraints are compared.

Returns
-------
DataDomain
The detected domain.

Raises
------
DmodRuntimeError
If detection is unsuccessful.
"""
def _detect(p: Path):
if p.is_dir():
return ClientDataCollectionDomainDetector(data_collection=p, collection_name=data_id).detect()
else:
return ClientUniversalItemDomainDetector(item=p).detect()

if isinstance(paths, Path):
return _detect(paths)
else:
domain = reduce(DataDomain.merge_domains, (_detect(path) for path in paths))
# It's possible that for many different individual file paths, data_id won't get set, so ...
idx = StandardDatasetIndex.DATA_ID
if data_id is not None and idx in domain.data_format.indices_to_fields().keys() and idx not in domain.discrete_restrictions:
domain.discrete_restrictions[idx] = DiscreteRestriction(variable=idx, values=[data_id])
return domain


class DmodClient:

def __init__(self, client_config: ClientConfig, bypass_request_service: bool = False, *args, **kwargs):
Expand Down Expand Up @@ -115,11 +159,12 @@ def _extract_dataset_domain(**kwargs) -> DataDomain:
raise ValueError(f"Could not deserialize JSON in 'domain_file' `{domain_file!s}` to domain object")
else:
return domain
else:
try:
return DataDomain(**kwargs)
except Exception as e:
raise RuntimeError(f"Could not inflate keyword params to object due to {e.__class__.__name__} - {e!s}")
try:
return DataDomain(**kwargs)
except Exception:
logging.debug(f"No {DataDomain.__name__} provided; trying auto detection (params were {kwargs!s}")
return run_domain_detection(paths=kwargs.get('upload_paths'), data_id=kwargs.get('name'))


def _get_transport_client(self, **kwargs) -> TransportLayerClient:
# TODO: later add support for multiplexing capabilities and spawning wrapper clients
Expand Down Expand Up @@ -152,6 +197,8 @@ async def data_service_action(self, action: str, **kwargs) -> ResultIndicator:
except TypeError as e:
return BasicResultIndicator(success=False, reason="No Dataset Domain Provided",
message=f"Invalid type provided for 'domain' param: {e!s} ")
except DmodRuntimeError as e:
return BasicResultIndicator(success=False, reason="Domain Detection Failed", message=f"{e!s}")
except (ValueError, RuntimeError) as e:
return BasicResultIndicator(success=False, reason="No Dataset Domain Provided", message=f"{e!s}")
return await self.data_service_client.create_dataset(domain=domain, **kwargs)
Expand Down
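Reviewer note: `run_domain_detection` folds per-path results together with `functools.reduce(DataDomain.merge_domains, ...)`. The sketch below illustrates only that fold pattern, using a hypothetical `ToyDomain` stand-in. It is not `DataDomain`, and its merge rule (same format required, discrete value sets unioned) is an assumption made for illustration rather than the behavior of `merge_domains`.

```python
from dataclasses import dataclass, field
from functools import reduce
from typing import Dict, FrozenSet, Iterable


@dataclass
class ToyDomain:
    """Hypothetical stand-in for a data domain: a format name plus discrete value sets per index."""
    data_format: str
    restrictions: Dict[str, FrozenSet[str]] = field(default_factory=dict)

    @classmethod
    def merge(cls, d1: "ToyDomain", d2: "ToyDomain") -> "ToyDomain":
        # Assumed rule for this sketch: only domains of the same format can be merged.
        if d1.data_format != d2.data_format:
            raise ValueError(f"Cannot merge formats {d1.data_format} and {d2.data_format}")
        merged = dict(d1.restrictions)
        for index, values in d2.restrictions.items():
            # Union the allowed values for each index present in either domain.
            merged[index] = merged.get(index, frozenset()) | values
        return cls(data_format=d1.data_format, restrictions=merged)


def detect_all(domains: Iterable[ToyDomain]) -> ToyDomain:
    # Pairwise left fold: merge(merge(d0, d1), d2), and so on.
    return reduce(ToyDomain.merge, domains)
```

Since `reduce` is called without an initial value, a single-element input is returned unchanged and an empty input raises `TypeError`. The same holds for any `reduce` call of this shape, so an empty list of paths may be worth guarding against.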
76 changes: 76 additions & 0 deletions python/lib/client/dmod/client/domain_detectors.py
@@ -0,0 +1,76 @@
from dmod.core.data_domain_detectors import (AbstractUniversalItemDomainDetector, AbstractDataCollectionDomainDetector,
DataItem, ItemDataDomainDetector)
from dmod.modeldata.data.item_domain_detector import (AorcCsvFileDomainDetector, GeoPackageHydrofabricDomainDetector,
RealizationConfigDomainDetector)
from typing import Any, Callable, Dict, List, Optional, Set, Type


class ClientUniversalItemDomainDetector(AbstractUniversalItemDomainDetector):
"""
Concrete implementation of :class:`AbstractUniversalItemDomainDetector`, with some default detector subclass types.


"""
_default_detector_types: Set[Type[ItemDataDomainDetector]] = {
AorcCsvFileDomainDetector,
GeoPackageHydrofabricDomainDetector,
RealizationConfigDomainDetector
}
""" Default detector subclasses always associated with instances of this type. """

@classmethod
def get_default_detectors(cls) -> List[Type[ItemDataDomainDetector]]:
return [d for d in cls._default_detector_types]

def __init__(self,
item: DataItem,
item_name: Optional[str] = None,
decode_format: str = 'utf-8',
short_on_success: bool = False,
type_sort_func: Optional[Callable[[Type[ItemDataDomainDetector]], Any]] = None):
"""
Initialize an instance.

Parameters
----------
item: DataItem
The data item for which a domain will be detected.
item_name: Optional[str]
The name for the item, which includes important domain metadata in some situations.
decode_format: str
The decoder format when decoding byte strings (``utf-8`` by default).
short_on_success: Optional[bool]
Indication of whether :method:`detect` should short circuit and return the 1st successful detection, rather
than try all subclasses and risk multiple detections, and thus an error condition (default: ``False``).
type_sort_func: Optional[Callable[[Type[ItemDataDomainDetector]], Any]]
Optional function necessary for calls to usage of the built-in ``sorted`` function to sort detector
subclasses during various instance operations, and serving as the ``key`` argument to ``sorted``; note that
sorting is performed in such places IFF this is validly set, as the subclass themselves - i.e., the
:class:`type` objects - do not implement `<`.
"""
super().__init__(item=item,
item_name=item_name,
decode_format=decode_format,
detector_types=self._default_detector_types,
short_on_success=short_on_success,
type_sort_func=type_sort_func)


class ClientDataCollectionDomainDetector(AbstractDataCollectionDomainDetector[ClientUniversalItemDomainDetector]):
"""
Concrete implementation relying on :class:`UniversalItemDomainDetector` to detect items.
"""

def get_item_detectors(self) -> Dict[str, ClientUniversalItemDomainDetector]:
"""
Get initialized detection objects, keyed by item names, for items within this instance's data collection.

Returns
-------
Dict[str, ClientUniversalItemDomainDetector]
Dictionary of per-item initialize detection objects, keyed by item name.
"""
detectors = dict()
for item_name in self.get_item_names():
detectors[item_name] = ClientUniversalItemDomainDetector(item=self.get_item(item_name), item_name=item_name)
return detectors
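Reviewer note: the `short_on_success` docstring describes two modes: return the first successful detection, or try every detector subclass and treat multiple detections as an error. The internals of `AbstractUniversalItemDomainDetector` are not part of this diff, so the following is only a sketch of that described behavior. All names in it (`universal_detect` and the toy detector functions) are hypothetical.

```python
from typing import Callable, List, Optional, Sequence

# Each hypothetical detector returns a format label on success or None on failure.
Detector = Callable[[str], Optional[str]]


def detect_csv(item_name: str) -> Optional[str]:
    return "CSV" if item_name.endswith(".csv") else None


def detect_gpkg(item_name: str) -> Optional[str]:
    return "GPKG" if item_name.endswith(".gpkg") else None


def detect_any_text(item_name: str) -> Optional[str]:
    # Deliberately overlaps with detect_csv to show the multi-detection case.
    return "TEXT" if item_name.endswith((".csv", ".txt")) else None


def universal_detect(item_name: str,
                     detectors: Sequence[Detector],
                     short_on_success: bool = False) -> str:
    successes: List[str] = []
    for detector in detectors:
        result = detector(item_name)
        if result is None:
            continue
        if short_on_success:
            # Short circuit: the first success wins, later detectors are never tried.
            return result
        successes.append(result)
    if not successes:
        raise RuntimeError(f"No detector succeeded for '{item_name}'")
    if len(successes) > 1:
        raise RuntimeError(f"Multiple detections for '{item_name}': {successes}")
    return successes[0]
```

With short-circuiting enabled the result depends on detector order, which is presumably why the real class accepts a `type_sort_func`: a set of types has no inherent ordering.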
4 changes: 2 additions & 2 deletions python/lib/client/setup.py
Expand Up @@ -22,7 +22,7 @@
license='',
include_package_data=True,
#install_requires=['websockets', 'jsonschema'],vi
-install_requires=['dmod-core>=0.15.0', 'websockets>=8.1', 'pydantic>=1.10.8,~=1.10', 'dmod-communication>=0.17.0',
-'dmod-externalrequests>=0.6.0'],
+install_requires=['dmod-core>=0.15.2', 'websockets>=8.1', 'pydantic>=1.10.8,~=1.10', 'dmod-communication>=0.17.0',
+'dmod-externalrequests>=0.6.0', 'dmod-modeldata>=0.11.1'],
packages=find_namespace_packages(include=['dmod.*'], exclude=['dmod.test'])
)
2 changes: 1 addition & 1 deletion python/lib/core/dmod/core/_version.py
@@ -1 +1 @@
-__version__ = '0.15.1'
+__version__ = '0.15.2'