Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python restructure #1813

Open
wants to merge 9 commits into
base: master
Choose a base branch
from
Open

Python restructure #1813

wants to merge 9 commits into from

Conversation

ajparsons
Copy link
Contributor

@ajparsons ajparsons commented Aug 23, 2024

Python project structuring

This is a set of changes to add good support for Python within the TheyWorkForYou repo.

We currently use Python in a number of helper scripts (and Parlparse is substantially Python), but within theyworkforyou itself these are one off and not sharing a common library. Until recently, we did not have a requirements.txt etc.

Adding more structure here makes introducing new helper scripts and features smoother and lets them share features and libraries. In the long run, better support for Python would help import libraries from ParlParse into the main repo.

Something specifically I'm working towards is faster imports of vote comparison information (and other kinds of datasets) - which can be done through existing approaches, but it'd be more stable within the larger framework.

This PR adds structure and config to support python development within the repo, migrates an existing script, and adds a new script building on the structure.

Meta python changes

  • requirements.txt replaced with Poetry (will need feature enabled before deploy to server)
  • Creates a src/twfy_tools that acts as the common library for features.
  • Config for ruff as a formatter/linter.
  • pytest added and incorporated into testing action.

While I'm making a lot of use of type hints and related libraries - no automated linters enforce strong type hints in this repo.

Connections to TWFY config and database

  • The existing commonlib config access has had some basic wrapping around it to make variables (and their types) accessible to IDEs.
  • There is the basics of an approach for using the Django ORM to access models within Python.

I thought a bit about the best ORM to use here (considered SQLAlchemy, SQLModel) - but given we generally use django consistency seemed useful. That said, I'm experimenting with a lightweight approach at creating django/pydantic models in parallel using the same type definitions.

In addition to type hinting in IDE, this means models can be validated in Python when created - useful as a key reason for this ORM is being able to bulk add things to the database - and database validation failures are a pain on bulk adds.

Currently there is only the User model. The schema has a lot of tables that are rarely used, so I expect to create new Models as needed rather than doing a big conversion job (although that's not so much work - the main issue is the automatic django conversion doesn't capture default database values).

Common CLI

poetry run twfy-tools ui shows all options for a common typer/click based CLI.

New scripts can be added into this through a hook in __main__.py.

There's a structure decision to be made here - I'm moving some scripts that previously sat in scripts to src/twfy_tools/utils - but these could still sit in scripts and access the common library.

My thinking has roughly been there's a 'new python' section that has the linters applied and makes more use of shared resources, and older scripts might or might migrate here over time.

A more mixed approach is possible - where the scripts directory is also covered by the linter but makes more use of common features.

Moving division_io.py

As a test case, I've moved division_io.py into the new approach and updated existing links.

This hooks into the new config approach - and makes use of the shared database connection approach.

Adding contact_uploader

(This could be its own PR but it makes sense of some of the new functions)

We collect contact optins but irregularly move the data (which is bad!).

This moves an external script into the main repo - and makes use of the Django ORM User model to access and upload the optin information.

This demonstrates that API keys not used in php anywhere are still stored in conf/general - and the extra line needed to move that to Python.

Currently this is not added to the dailyupdate task - will need to run a few times manually before doing so.

- switch to poetry based approach
- create new python data using src
- configure linting via ruff
- vscode settings update
- Wrap commonlib config access in a pydantic model
- Transform types and reveal settings to python IDE
- include sqlalchemy engine for pandas
-- access through `python -m twfy_tools`
- hooked into generic connection and config
- hooked into overall cli
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant