feature/pdct 1441 Add family_import_id and import_id to GCF document parser #12

Merged
merged 23 commits into from
Sep 5, 2024
Changes from 21 commits
6787bca
Pull document enums into separate file
katybaulch Sep 4, 2024
20a334a
Check required columns don't have NA values helper
katybaulch Sep 4, 2024
bd82693
Test refactored document parser
katybaulch Sep 4, 2024
c5a6c02
Refactor document parser into smaller functions
katybaulch Sep 5, 2024
eaec824
Create test_commands.py
katybaulch Sep 5, 2024
42a98f7
Move --version test to separate file
katybaulch Sep 5, 2024
a5d074d
Rename OptionalDocumentColumns to TranslatedDocumentColumns
katybaulch Sep 5, 2024
6765613
Explicitly return None
katybaulch Sep 5, 2024
914f651
Update comment for document parser processing.
katybaulch Sep 5, 2024
c0b6436
Docstring consistency
katybaulch Sep 5, 2024
7eb6ec8
Bump to 0.1.9
katybaulch Sep 5, 2024
795a245
Update cspell dict and removed file from trunk ignore
katybaulch Sep 5, 2024
baa281c
Removed test duplication
katybaulch Sep 5, 2024
ab3271c
Added pydantic to cspell dict
katybaulch Sep 5, 2024
d60aec2
Move fixtures for valid rows for docs with translations to conftest
katybaulch Sep 5, 2024
516fe38
Move source URL to required doc columns
katybaulch Sep 5, 2024
1c32897
Move mock data to conftest
katybaulch Sep 5, 2024
f065c53
Test process row returns None when NA in required cols
katybaulch Sep 5, 2024
d847fa2
Remove dependency on enums in tests
katybaulch Sep 5, 2024
5f1e696
Fix enum names for IgnoreDocumentTypes
katybaulch Sep 5, 2024
107f4c3
Fix pyright error
katybaulch Sep 5, 2024
1cb36d1
Return False instead of None
katybaulch Sep 5, 2024
2298a39
Make test more explicit
katybaulch Sep 5, 2024
6 changes: 5 additions & 1 deletion .trunk/configs/cspell.json
@@ -36,7 +36,11 @@
"iterrows",
"notna",
"conftest",
-"capsys"
+"capsys",
+"dtypes",
+"isin",
+"pydantic",
+"getfixturevalue"
],
"flagWords": ["hte"],
"suggestionsTimeout": 5000
1 change: 0 additions & 1 deletion .trunk/trunk.yaml
@@ -41,7 +41,6 @@ lint:
paths:
- .trunk/configs/cspell.json
- .gitignore
-- tests/unit_tests/parsers/document/conftest.py
- linters: [pre-commit-hooks, prettier]
paths:
- tests/unit_tests/fixtures/malformed_data.json
2 changes: 1 addition & 1 deletion gcf_data_mapper/cli.py
@@ -92,7 +92,7 @@ def wrangle_to_json(
return {
"collections": collection(debug),
"families": family(project_info, debug),
-"documents": document(doc_info, debug),
+"documents": document(project_info, doc_info, debug),
"events": event(project_info, debug),
}

34 changes: 34 additions & 0 deletions gcf_data_mapper/enums/document.py
@@ -0,0 +1,34 @@
from enum import Enum


class RequiredDocumentColumns(Enum):
    TITLE = "Title"
    TYPE = "Type"
    ID = "ID (Unique ID from our CMS for the document)"
    SOURCE_URL = "Document page permalink"


class TranslatedDocumentColumns(Enum):
    TRANSLATED_FILES = "Translated files"
    TRANSLATED_TITLES = "Translated titles"


class RequiredFamilyDocumentColumns(Enum):
    APPROVED_REF = "ApprovedRef"
    PROJECTS_ID = "ProjectsID"


class IgnoreDocumentTypes(Enum):
    """Filter the following columns out of the GCF document data.

    TODO: Phase 2 GCF/MCF we will need to parse these document types too
    but for now, we will omit them.
    """

    POLICIES_STRATEGIES_GUIDELINES = "Policies, strategies, and guidelines"
    COUNTRY_PROGRAMME = "Country programme"


class DocumentVariantNames(Enum):
    ORIGINAL = "Original Translation"
    TRANSLATION = "Translated"
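
For readers following the commit list ("Check required columns don't have NA values helper", "Return False instead of None"), here is a minimal sketch of how enums like these could drive row validation. It is not the code from this PR: `is_missing`, `has_required_values` and `should_ignore` are hypothetical names, rows are plain dicts rather than pandas rows, and treating blank strings as missing is an assumption. Only the two enum classes are copied from the diff above.

```python
import math
from enum import Enum
from typing import Any, Mapping, Type


# Copied from the diff above so the sketch is self-contained.
class RequiredDocumentColumns(Enum):
    TITLE = "Title"
    TYPE = "Type"
    ID = "ID (Unique ID from our CMS for the document)"
    SOURCE_URL = "Document page permalink"


class IgnoreDocumentTypes(Enum):
    POLICIES_STRATEGIES_GUIDELINES = "Policies, strategies, and guidelines"
    COUNTRY_PROGRAMME = "Country programme"


def is_missing(value: Any) -> bool:
    """Hypothetical helper: treat None, NaN and blank strings as missing."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and value.strip() == ""


def has_required_values(row: Mapping[str, Any], columns: Type[Enum]) -> bool:
    """Hypothetical helper: True only if every column named by the enum is present and non-missing."""
    return all(not is_missing(row.get(column.value)) for column in columns)


def should_ignore(row: Mapping[str, Any]) -> bool:
    """Hypothetical helper: True if the row's document type is one of the ignored types."""
    ignored = {doc_type.value for doc_type in IgnoreDocumentTypes}
    return row.get(RequiredDocumentColumns.TYPE.value) in ignored


# Example row; the values are made up for illustration.
row = {
    "Title": "Funding proposal",
    "Type": "Funding proposal",
    "ID (Unique ID from our CMS for the document)": "123",
    "Document page permalink": "https://example.org/document/123",
}
```

Keeping the column names in enums means a check like this iterates over the enum instead of repeating string literals, so adding a required column (as the "Move source URL to required doc columns" commit does) needs no change to the validation logic.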