feat: Migrate to CPR SDK #87

Merged
14 changes: 5 additions & 9 deletions .github/workflows/code-quality.yml
@@ -18,22 +18,18 @@ jobs:
       # For repo checkout
       contents: read
     steps:
-      - uses: actions/checkout@v4
+      - name: Check out repository code
+        uses: actions/checkout@v4

       - name: Set up Python
         uses: actions/setup-python@v5
         with:
-          python-version: "3.9"
+          python-version: "3.10"

       - name: Install dependencies
         run: |
-          python -m pip install "poetry==1.7.0"
-          poetry config virtualenvs.create false
-          poetry install --no-cache
-          poetry install --only-root
-
-      - name: Export PYTHONPATH
-        run: echo "PYTHONPATH=$(pwd)" >> $GITHUB_ENV
+          python -m pip install poetry
+          poetry install

       - name: Trunk Check
         uses: trunk-io/trunk-action@v1
15 changes: 4 additions & 11 deletions .github/workflows/test.yml
@@ -16,22 +16,15 @@ jobs:
       - name: Check out repository code
         uses: actions/checkout@v4

-      - name: Install poetry
-        run: pipx install poetry==1.2.2
-
-      - name: Install python or load from cache with dependencies
+      - name: Set up Python
         uses: actions/setup-python@v5
         with:
-          python-version: "3.9"
-          cache: poetry
+          python-version: "3.10"

       - name: Install dependencies
         run: |
+          python -m pip install poetry
           poetry install

       - name: Run test suite
-        run: |
-          poetry run python -m pytest \
-            --nbmake \
-            --nbmake-find-import-errors \
-            --nbmake-timeout=20 -vvv
+        run: make test
4 changes: 2 additions & 2 deletions Makefile
@@ -1,7 +1,7 @@
-.PHONY: test
+.PHONY: install test

 install:
 	poetry install

 test:
-	poetry run python -m pytest -vvv
+	poetry run pytest -vvv
1,875 changes: 924 additions & 951 deletions poetry.lock

Large diffs are not rendered by default.

3 changes: 1 addition & 2 deletions pyproject.toml
@@ -8,7 +8,7 @@ packages = [{ include = "azure_pdf_parser", from = "src" }]
 [tool.poetry.dependencies]
 python = ">=3.9,<4.0"
 azure-ai-formrecognizer = "^3.2.1"
-cpr-data-access = { git = "https://github.com/climatepolicyradar/data-access.git", tag = "0.4.0" }
+cpr-sdk = "^1.1.6"
 requests = "^2.31.0"
 langdetect = "^1.0.9"
 pypdf = "^3.15.0"
@@ -22,7 +22,6 @@ pre-commit = "^2.17.0"
 python-dotenv = "^0.19.2"
 pytest = "^7.0.1"
 pytest-dotenv = "^0.5.2"
-nbmake = "^1.4.1"
 httpx = "^0.27.0"
 mock = "^5.1.0"

2 changes: 1 addition & 1 deletion scripts/unece_sprint/check_parsed_docs.py
@@ -1,7 +1,7 @@
 """Check whether the UNECE documents can be loaded using the CPR SDK"""

 import boto3
-from cpr_data_access.parser_models import BaseParserOutput
+from cpr_sdk.parser_models import BaseParserOutput
 from rich.console import Console

 console = Console()
2 changes: 1 addition & 1 deletion src/azure_pdf_parser/convert.py
@@ -7,7 +7,7 @@
     DocumentTable,
     Point,
 )
-from cpr_data_access.parser_models import (
+from cpr_sdk.parser_models import (
     CONTENT_TYPE_PDF,
     BlockType,
     ParserInput,
4 changes: 2 additions & 2 deletions src/azure_pdf_parser/experimental_base.py
@@ -3,14 +3,14 @@
 from typing import List, Optional, Sequence, Union

 from azure.ai.formrecognizer import Point
-from cpr_data_access.parser_models import (
+from cpr_sdk.parser_models import (
     CONTENT_TYPE_HTML,
     CONTENT_TYPE_PDF,
     HTMLData,
     PDFData,
     TextBlock,
 )
-from cpr_data_access.pipeline_general_models import BackendDocument
+from cpr_sdk.pipeline_general_models import BackendDocument
 from langdetect import DetectorFactory, detect
 from pydantic import AnyHttpUrl, BaseModel, model_validator

4 changes: 2 additions & 2 deletions src/azure_pdf_parser/run.py
@@ -7,7 +7,7 @@

 from azure.ai.formrecognizer import AnalyzeResult
 from azure.core.exceptions import HttpResponseError
-from cpr_data_access.parser_models import BackendDocument, ParserInput
+from cpr_sdk.parser_models import BackendDocument, ParserInput
 from dotenv import find_dotenv, load_dotenv
 from pydantic import AnyHttpUrl
 from tqdm.auto import tqdm
@@ -119,7 +119,7 @@ def run_parser(

     if not AZURE_PROCESSOR_KEY or not AZURE_PROCESSOR_ENDPOINT:
         raise ValueError(
-            """Missing Azure API credentials. Set AZURE_PROCESSOR_KEY and
+            """Missing Azure API credentials. Set AZURE_PROCESSOR_KEY and
             AZURE_PROCESSOR_ENDPOINT environment variables."""
         )

4 changes: 2 additions & 2 deletions tests/conftest.py
@@ -9,8 +9,8 @@
     DocumentTable,
     DocumentTableCell,
 )
-from cpr_data_access.parser_models import ParserInput
-from cpr_data_access.pipeline_general_models import BackendDocument
+from cpr_sdk.parser_models import ParserInput
+from cpr_sdk.pipeline_general_models import BackendDocument
 from pydantic import AnyHttpUrl

 from azure_pdf_parser import AzureApiWrapper, PDFPagesBatchExtracted
2 changes: 1 addition & 1 deletion tests/test_azure_wrapper.py
@@ -2,7 +2,7 @@
 from unittest.mock import Mock, patch

 from azure.ai.formrecognizer import AnalyzeResult
-from cpr_data_access.parser_models import ParserInput, ParserOutput
+from cpr_sdk.parser_models import ParserInput, ParserOutput

 from azure_pdf_parser import (
     AzureApiWrapper,
2 changes: 1 addition & 1 deletion tests/test_cli.py
@@ -3,7 +3,7 @@
 from unittest.mock import patch

 from click.testing import CliRunner
-from cpr_data_access.parser_models import ParserOutput
+from cpr_sdk.parser_models import ParserOutput

 from src.azure_pdf_parser import AzureApiWrapper

7 changes: 1 addition & 6 deletions tests/test_convert.py
@@ -6,12 +6,7 @@
     DocumentTable,
     Point,
 )
-from cpr_data_access.parser_models import (
-    BlockType,
-    ParserInput,
-    ParserOutput,
-    PDFTextBlock,
-)
+from cpr_sdk.parser_models import BlockType, ParserInput, ParserOutput, PDFTextBlock

 from azure_pdf_parser.base import DIMENSION_CONVERSION_FACTOR
 from azure_pdf_parser.convert import (
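
The Python changes in this PR are almost entirely a mechanical rename of the import root from `cpr_data_access` to `cpr_sdk`. As a rough sketch of how the same rename could be applied to other code, the snippet below rewrites import statements with a regular expression. The helper `migrate_imports` is hypothetical and not part of this PR, and it assumes the submodule names are the same in both packages, which the diff only shows for `parser_models` and `pipeline_general_models`.

```python
import re

# Hypothetical helper, not part of this PR. It assumes every submodule of
# `cpr_data_access` exists under the same name in `cpr_sdk`; the diff only
# demonstrates this for `parser_models` and `pipeline_general_models`.
_OLD_IMPORT = re.compile(r"^(\s*(?:from|import)\s+)cpr_data_access\b", re.MULTILINE)


def migrate_imports(source: str) -> str:
    """Return `source` with `cpr_data_access` imports pointed at `cpr_sdk`."""
    return _OLD_IMPORT.sub(r"\1cpr_sdk", source)


example = (
    "from cpr_data_access.parser_models import ParserInput\n"
    "from cpr_data_access.pipeline_general_models import BackendDocument\n"
    "from pydantic import AnyHttpUrl\n"
)
print(migrate_imports(example))
```

Only lines that begin with `from` or `import` are touched, so mentions of the old package name in strings or comments are left for manual review.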