generated from Hochfrequenz/python_template_repository
extract conditions and packages (#257)
* updated platformdirs 3.10.0 -> 4.1.0 in requirements
* updated packaging 22.0 -> 23.2 in requirements
* pip-compile requirements
* updated known pruefis
* added conditions flavour
* moved conditions parsing to flavour conditions
* 🩹 corrected loop in process_pruefi
* ➕ added package table and functions to extract conditions
* removed unused arguments in process_package_conditions
* ➕ added tests for packagetable, fixed linting issue
* linting/typechecking
* ➕ added edifactformat -> file mapping
* move test docx files
* Add new test files into new folder structure
* ✅ Update tests to new test data folder structure
* ✅ Add unit test to compare csv export files
* 📝 Add information about how many pages each format has
* 🎨 Improve handling of output-path
* ✅ fix test
* ✅ Add test for change history
* 🎨 Move changehistory functions into an extra module
* 🎨 Fix imports
* 🎨 Use function to check for change history section
* 🎨 split commands into separate modules
* 🚛 move dump conditions function
* ✅ harmonize cli tests
* 🚛 move pruefi command
* 🚛 rename changehistory.function to __init__.py
* 🎨 clean up imports
* 🎨 clean up arguments for pruefi command
* 🎨 add enum for ahbexportfileformat
* ✅ fix tests
* 🚛 move changehistory command
* ✅ test the cli command pruefi directly
* 🚛 move cli test for changehistory into an extra file
* 🔥 remove unused imports
* 🚛 rename cli pruefi module
* 📝 add documentation
* ➕➖ replace attrs with pydantic and use pyproject.toml for dependencies
* 🔄 Replace attrs with pydantic
* 🔥 remove unused code
* 🎨 use enum for file exports
* 🎨 Use the click validation for filetype and make it required
* 🎨 use functions to check for paragraph and table kinds
* 🎨 Use ConfigDict to remove DeprecationWarning
* ➕ add freezegun to mock datetime.now() in the tests
* 🎨 use timezone.utc instead of UTC
* 🚨 remove unused imports
* 🚧 WIP of the get_ahb_table rework
* 🎉🚧 Finally COMDIS is there; this commit fixes the issue that all pruefis above the change history section were not exported
* 🚨 remove unused import
* ✅ Improve the test to check the current state of the cli tool
* ✅ clean up tests
* ✅ Use sort instead of sorted
* 🎨 Further improvements of the get_ahb_table function
* 🎨 remove warning after tests: src/kohlrahbi/pruefis/__init__.py:111: SyntaxWarning: invalid escape sequence '\d' (docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html)
* ➕ Add pydantic to pyproject.toml
* 🚨 Fix linter warnings
* fixed tests
* unittests: changed path formats
* linting
* 🚨 fix further linter warnings
* ✅ add test file to test cli conditions command
* WIP
* WIP
* restructured and cleaned ahbconditions, read_functions and packagetable
* 🩹 add missing function after merge
* 🩹 fix imports
* 🩹 fix more imports
* 🚧 WIP
* refactoring conditions and packagetables
* WIP
* WIP2
* added test for conditions/__init__.py
* WIP testing
* added more tests for read_functions
* 🩹 linting/type_check
* removed expected json from .gitignore
* Added test, removed unused function
* added even more tests
* updated readme
* automatically remove testfiles
* solved interference of test_outputs
* changed default output path for conditions to unify all subroutines
* updated readme: --input-path -> --edi-energy-mirror-path/-eemp
* added missing time freeze
* Update src/kohlrahbi/ahbtable/ahbcondtions.py (Co-authored-by: kevin <[email protected]>)
* Update src/kohlrahbi/ahbtable/ahbcondtions.py (Co-authored-by: kevin <[email protected]>)
* Update src/kohlrahbi/ahbtable/ahbcondtions.py (Co-authored-by: kevin <[email protected]>)
* simplified if statements
* Removed condition dict extraction from unfolded ahb table as it is not used
* reorganized duplicate code
* moved function to parse conditions text due to circular import
* fixed minor issue
* Update src/kohlrahbi/ahbtable/ahbpackagetable.py (Co-authored-by: kevin <[email protected]>)
* Update src/kohlrahbi/ahbtable/ahbpackagetable.py (Co-authored-by: kevin <[email protected]>)
* simplified unnecessary line
* Update src/kohlrahbi/ahbtable/ahbpackagetable.py (Co-authored-by: kevin <[email protected]>)
* reduced calls of get_format_of_pruefidentifikator fct.
* updated doc strings and added assume-yes flag to conditions command
* Update src/kohlrahbi/read_functions.py (Co-authored-by: kevin <[email protected]>)
* Update src/kohlrahbi/read_functions.py (Co-authored-by: kevin <[email protected]>)
* removed unused imports
* Added explanation to duplicate code warning
* Removed unused function

Co-authored-by: hf-krechan <[email protected]>
Co-authored-by: kevin <[email protected]>
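One bullet above removes a `SyntaxWarning: invalid escape sequence '\d'` from `src/kohlrahbi/pruefis/__init__.py`. As a minimal illustration (not the actual kohlrahbi code), the warning comes from writing regex backslashes in a plain string literal; a raw string avoids it:

```python
import re

# A plain string literal containing '\d' triggers a SyntaxWarning on recent
# Python versions, because '\d' is not a valid string escape sequence:
#   pattern = "\[(\d+)]"   -> SyntaxWarning: invalid escape sequence '\d'
# A raw string keeps the backslashes literally, so the regex is unchanged
# and the warning disappears:
pattern = r"\[(\d+)]"

print(re.findall(pattern, "[931] and [932]"))
```

The same applies to docstrings that contain regex examples, which is where the warning in this commit originated.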
1 parent b757e59 · commit 715c179
Showing 85 changed files with 1,424 additions and 275 deletions.
version = "0.4.2.dev91+g53b6228"
"""This module contains the ahbconditions class.""" | ||
|
||
import json | ||
import re | ||
from pathlib import Path | ||
|
||
from docx.table import Table as DocxTable # type: ignore[import-untyped] | ||
from maus.edifact import EdifactFormat | ||
from pydantic import BaseModel, ConfigDict | ||
|
||
from kohlrahbi.logger import logger | ||
|
||
|
||
class AhbConditions(BaseModel):
    """
    Class which contains a dict of conditions for each EDIFACT format.
    """

    conditions_dict: dict[EdifactFormat, dict[str, str]] = {}

    model_config = ConfigDict(arbitrary_types_allowed=True)

    @classmethod
    def from_docx_table(cls, docx_tables: list[DocxTable], edifact_format: EdifactFormat) -> "AhbConditions":
        """
        Create an AhbConditions object from a docx table.
        """
        table_data = []
        for table in docx_tables:
            for row in table.rows:
                if row.cells[-1].text and row.cells[0].text != "EDIFACT Struktur":
                    table_data.append(row.cells[-1].text)

        conditions_dict = {}
        if table_data:
            conditions_dict = AhbConditions.collect_conditions(
                conditions_list=table_data, edifact_format=edifact_format
            )

        return cls(conditions_dict=conditions_dict)
    @staticmethod
    def collect_conditions(
        conditions_list: list[str], edifact_format: EdifactFormat
    ) -> dict[EdifactFormat, dict[str, str]]:
        """Collect conditions from the list of all conditions and store them in a conditions dict."""
        conditions_dict: dict[EdifactFormat, dict[str, str]] = {edifact_format: {}}

        conditions_str = "".join(conditions_list)
        conditions_dict = parse_conditions_from_string(conditions_str, edifact_format, conditions_dict)
        logger.info("The package conditions for %s were collected.", edifact_format)
        return conditions_dict
    def include_condition_dict(self, to_add: dict[EdifactFormat, dict[str, str]] | None) -> None:
        """Merge a dict of conditions into the conditions_dict, preferring the longer condition text."""
        if to_add is None:
            logger.info("Conditions dict to be added is empty.")
            return
        for edifact_format, edi_cond_dict in to_add.items():
            for condition_key, condition_text in edi_cond_dict.items():
                if edifact_format in self.conditions_dict:
                    existing_text = self.conditions_dict[edifact_format].get(condition_key)
                    if existing_text is None or len(condition_text) > len(existing_text):
                        self.conditions_dict[edifact_format][condition_key] = condition_text
                else:
                    self.conditions_dict[edifact_format] = {condition_key: condition_text}

        logger.info("Conditions were updated.")
    def dump_as_json(self, output_directory_path: Path) -> None:
        """
        Writes all collected conditions to a json file.
        The file will be stored in the directory:
        'output_directory_path/<edifact_format>/conditions.json'
        """
        for edifact_format, format_cond_dict in self.conditions_dict.items():
            condition_json_output_directory_path = output_directory_path / str(edifact_format)
            condition_json_output_directory_path.mkdir(parents=True, exist_ok=True)
            file_path = condition_json_output_directory_path / "conditions.json"
            # re-sort the condition key -> condition text mappings for output
            sorted_condition_dict = {k: format_cond_dict[k] for k in sorted(format_cond_dict, key=int)}
            array = [
                {"condition_key": key, "condition_text": text, "edifact_format": edifact_format}
                for key, text in sorted_condition_dict.items()
            ]
            with open(file_path, "w", encoding="utf-8") as file:
                json.dump(array, file, ensure_ascii=False, indent=2)

            logger.info(
                "The conditions.json file for %s is saved at %s",
                edifact_format,
                file_path,
            )
def parse_conditions_from_string(
    conditions_text: str, edifact_format: EdifactFormat, conditions_dict: dict[EdifactFormat, dict[str, str]]
) -> dict[EdifactFormat, dict[str, str]]:
    """
    Takes a string with some conditions and sorts it into a dict.
    """
    # split the input into parts enclosed in square brackets and the text that follows them
    matches = re.findall(
        r"\[(\d+)](.*?)(?=\[\d+]|$)",
        conditions_text,
        re.DOTALL,
    )
    for condition_key, raw_text in matches:
        # normalize whitespace to make the text prettier
        text = re.sub(r"\s+", " ", raw_text.strip())

        # only overwrite an already collected condition if the new text is longer
        existing_text = conditions_dict[edifact_format].get(condition_key)
        if existing_text is None or len(text) > len(existing_text):
            conditions_dict[edifact_format][condition_key] = text
    return conditions_dict
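The parsing logic above can be sketched in isolation. This is a self-contained rewrite (without the kohlrahbi/maus types) showing how the regex splits a condition string like `[931] ... [932] ...` into a key-to-text dict, keeping the longer text when a key appears twice; the sample input is hypothetical:

```python
import re


def parse_conditions(conditions_text: str) -> dict[str, str]:
    """Split '[key] text [key] text ...' into {key: text}, preferring longer texts."""
    conditions: dict[str, str] = {}
    # each match is (digits inside [..], the text up to the next [..] or end of string)
    for key, raw_text in re.findall(r"\[(\d+)](.*?)(?=\[\d+]|$)", conditions_text, re.DOTALL):
        text = re.sub(r"\s+", " ", raw_text.strip())  # collapse whitespace
        existing = conditions.get(key)
        if existing is None or len(text) > len(existing):
            conditions[key] = text
    return conditions


print(parse_conditions("[931] Format: Max 8 Zeichen [932]\n Wenn vorhanden [931] Format"))
# prints {'931': 'Format: Max 8 Zeichen', '932': 'Wenn vorhanden'}
```

Note how the second occurrence of `[931]` with the shorter text "Format" does not overwrite the earlier, longer entry.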
"""
This module contains the class which holds the AHB package condition table.
"""

import json
import re
from pathlib import Path

import pandas as pd
from docx.table import Table as DocxTable  # type: ignore[import-untyped]
from maus.edifact import EdifactFormat
from pydantic import BaseModel, ConfigDict

from kohlrahbi.ahbtable.ahbcondtions import parse_conditions_from_string
from kohlrahbi.logger import logger
class AhbPackageTable(BaseModel):
    """
    This class contains the AHB package table as you see it at the beginning of the AHB documents,
    but in a machine readable format.
    Caution: when two AhbPackageTable objects are combined, so far only the package_dict field is updated.
    """

    table: pd.DataFrame = pd.DataFrame()
    package_dict: dict[EdifactFormat, dict[str, str]] = {}
    model_config = ConfigDict(arbitrary_types_allowed=True)

    @classmethod
    def from_docx_table(cls, docx_tables: list[DocxTable]) -> "AhbPackageTable":
        """
        Create an AhbPackageTable object from a docx table.
        """
        table_data = []
        for table in docx_tables:
            for row in table.rows:
                table_data.append([cell.text for cell in row.cells])

        # the first row contains the column headers, the rest is data
        headers = table_data[0]
        data = table_data[1:]
        df = pd.DataFrame(data, columns=headers)
        return cls(table=df)
    def provide_conditions(self, edifact_format: EdifactFormat) -> dict[EdifactFormat, dict[str, str]]:
        """Collect conditions from the package table and store them in a conditions dict."""
        conditions_dict: dict[EdifactFormat, dict[str, str]] = {edifact_format: {}}
        there_are_conditions = (self.table["Bedingungen"] != "").any()
        if there_are_conditions:
            for conditions_text in self.table["Bedingungen"][self.table["Bedingungen"] != ""]:
                conditions_dict = parse_conditions_from_string(conditions_text, edifact_format, conditions_dict)
        logger.info("The package conditions for %s were collected.", edifact_format)
        return conditions_dict
    def provide_packages(self, edifact_format: EdifactFormat) -> None:
        """Collect packages from the package table and store them in the package dict."""
        package_dict: dict[EdifactFormat, dict[str, str]] = {edifact_format: {}}

        there_are_packages = (self.table["Paket"] != "").any()
        if there_are_packages:
            for _, row in self.table.iterrows():
                # use re.search to extract the numeric package key, e.g. '10' from '[10P]'
                match = re.search(r"\[(\d+)P\]", row["Paket"])
                if not match:
                    raise ValueError("No valid package key found in the package column.")
                package = match.group(1)
                if package != "1":
                    package_conditions = row["Paketvoraussetzung(en)"].strip()
                    # only overwrite an already collected package if the new text is longer
                    existing_text = package_dict[edifact_format].get(package)
                    if existing_text is None or len(package_conditions) > len(existing_text):
                        package_dict[edifact_format][package] = package_conditions

        logger.info("Packages for %s were collected.", edifact_format)
        self.package_dict = package_dict
    def include_package_dict(self, to_add: dict[EdifactFormat, dict[str, str]] | None) -> None:
        """Merge a dict of packages into the package_dict, preferring the longer package conditions."""
        if to_add is None:
            logger.info("Packages dict to be added is empty.")
            return
        for edifact_format, edi_package_dict in to_add.items():
            for package_key, package_conditions in edi_package_dict.items():
                if edifact_format in self.package_dict:
                    existing_text = self.package_dict[edifact_format].get(package_key)
                    if existing_text is None or len(package_conditions) > len(existing_text):
                        self.package_dict[edifact_format][package_key] = package_conditions
                else:
                    self.package_dict[edifact_format] = {package_key: package_conditions}

        logger.info("Packages were updated.")
    def dump_as_json(self, output_directory_path: Path) -> None:
        """
        Writes all collected packages to a json file.
        The file will be stored in the directory:
        'output_directory_path/<edifact_format>/packages.json'
        """
        for edifact_format, format_pkg_dict in self.package_dict.items():
            package_json_output_directory_path = output_directory_path / str(edifact_format)
            package_json_output_directory_path.mkdir(parents=True, exist_ok=True)
            file_path = package_json_output_directory_path / "packages.json"
            # re-sort the package key -> condition text mappings for output
            sorted_package_dict = {k: format_pkg_dict[k] for k in sorted(format_pkg_dict, key=int)}
            array = [
                {"package_key": key + "P", "package_expression": expression, "edifact_format": edifact_format}
                for key, expression in sorted_package_dict.items()
            ]
            with open(file_path, "w", encoding="utf-8") as file:
                json.dump(array, file, ensure_ascii=False, indent=2)

            logger.info(
                "The packages.json file for %s is saved at %s",
                edifact_format,
                file_path,
            )
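The package extraction and export above combines two small tricks: a regex pulls the numeric key out of `[<n>P]` markers, and `sorted(..., key=int)` orders the keys numerically rather than lexicographically before dumping. A standalone sketch with hypothetical sample data (the `paket_rows` values are made up, not from a real AHB):

```python
import json
import re

# hypothetical 'Paket' -> 'Paketvoraussetzung(en)' values as they might appear in a package table
paket_rows = {"[2P]": "[931] und [932]", "[10P]": "[504]", "[3P]": "[9]"}

package_dict: dict[str, str] = {}
for paket, voraussetzung in paket_rows.items():
    # extract the numeric package key, e.g. '10' from '[10P]'
    match = re.search(r"\[(\d+)P\]", paket)
    if not match:
        raise ValueError("No valid package key found in the package column.")
    package_dict[match.group(1)] = voraussetzung

# sort numerically (key=int), not lexicographically, so '10' comes after '3'
array = [
    {"package_key": key + "P", "package_expression": package_dict[key]}
    for key in sorted(package_dict, key=int)
]
print(json.dumps([entry["package_key"] for entry in array]))
# prints ["2P", "3P", "10P"]
```

With a plain string sort, "10" would sort before "2"; `key=int` is what keeps the exported `packages.json` in natural order.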