MAINTAINERS.md

prefect-databricks

Getting Started

Now that you've bootstrapped a project, follow the steps below to get started developing your Prefect Collection!

Python setup

Requires an installation of Python 3.7+

We recommend using a Python virtual environment manager such as pipenv, conda or virtualenv.

GitHub setup

Create a Git repository for the newly generated collection and create the first commit:

git init
git add .
git commit -m "Initial commit: project generated by prefect-collection-template"

Then, create a new repo following the prompts at: https://github.com/organizations/PrefectHQ/repositories/new

Upon creation, push the repository to GitHub:

git remote add origin https://github.com/PrefectHQ/prefect-databricks.git
git branch -M main
git push -u origin main

It's recommended to set up some protection rules for main at: https://github.com/PrefectHQ/prefect-databricks/settings/branches

  • Require a pull request before merging
  • Require approvals

Lastly, code owners for the repository can be set with a CODEOWNERS file.

Project setup

To set up your project, run the following:

# Create an editable install of your project
pip install -e ".[dev]"

# Configure pre-commit hooks
pre-commit install

To verify that the setup was successful, you can run the following:

  • Run the tests for tasks and flows in the collection:
    pytest tests
  • Serve the docs with mkdocs:
    mkdocs serve

Developing tasks and flows

For information about the use and development of tasks and flows, check out the flows and tasks concepts docs in the Prefect docs.

Writing documentation

This collection has been set up with mkdocs for automatically generated documentation. The signatures and docstrings of your tasks and flows will be used to generate documentation for the users of this collection. You can make changes to the structure of the generated documentation by editing the mkdocs.yml file in this project.

To add a new page for a module in your collection, create a new markdown file in the docs directory and add that file to the nav section of mkdocs.yml. If you want to automatically generate documentation based on the docstrings and signatures of the contents of the module with mkdocstrings, add a line to the new markdown file in the following format:

::: prefect_databricks.{module_name}
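As a rough sketch of the step above, the helper below writes such a page for one module. The function name write_module_page and the module name jobs in the usage note are illustrative assumptions, not part of the generated project. Adding the new file to the nav section of mkdocs.yml remains a manual step.

```python
from pathlib import Path


def write_module_page(
    docs_dir: Path, module_name: str, package: str = "prefect_databricks"
) -> Path:
    """Create <docs_dir>/<module_name>.md containing a mkdocstrings directive.

    Hypothetical helper for illustration; the template does not ship it.
    """
    docs_dir.mkdir(parents=True, exist_ok=True)
    page = docs_dir / f"{module_name}.md"
    # The ":::" line asks mkdocstrings to render the module's docstrings.
    page.write_text(f"::: {package}.{module_name}\n")
    return page
```

For example, write_module_page(Path("docs"), "jobs") would create docs/jobs.md with the single line "::: prefect_databricks.jobs".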

You can also refer to the flows.md and tasks.md files included in your generated project as examples.

Once you have working code, replace the default "Write and run a flow" example in README.md to match your collection.

Development lifecycle

CI Pipeline

This collection comes with GitHub Actions for testing and linting. To add additional actions, you can add jobs in the .github/workflows folder. Upon a pull request, the pipeline will run linting via black, flake8, and interrogate, and unit tests via pytest alongside coverage.

interrogate will tell you which methods, functions, classes, and modules have docstrings, and which do not. The job has a fail threshold of 95%, meaning that it will fail if more than 5% of the codebase is undocumented. We recommend following the Google Python Style Guide for docstring format.
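To make the docstring recommendation concrete, here is a minimal sketch of a module with a module-level docstring and one function documented in the Google style (Args and Returns sections). The function add_prefix is a hypothetical example and does not exist in this collection.

```python
"""Hypothetical helper module, shown only to illustrate docstring style."""


def add_prefix(name: str, prefix: str = "prefect-") -> str:
    """Prepend a prefix to a name unless it is already present.

    Args:
        name: The name to normalize.
        prefix: The prefix that the result must start with.

    Returns:
        The name, guaranteed to start with the prefix.
    """
    return name if name.startswith(prefix) else prefix + name
```

Both the module and the function carry a docstring here, so neither would be reported as undocumented.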

Similarly, coverage ensures that the codebase includes tests. The job has a fail threshold of 80%, meaning that it will fail if more than 20% of the codebase is missing tests.
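The threshold arithmetic for both jobs can be sketched as follows. This is an illustration only, not how interrogate or coverage are implemented; the function name meets_threshold is hypothetical, and the sketch assumes that a value exactly at the threshold passes.

```python
def meets_threshold(covered: int, total: int, fail_under: float) -> bool:
    """Return True when covered/total, as a percentage, is at least fail_under."""
    if total == 0:
        # Nothing to measure; treated as passing in this sketch.
        return True
    return 100 * covered / total >= fail_under


# 96 of 100 objects documented clears the 95% docstring threshold.
docs_ok = meets_threshold(96, 100, 95)
# 79 of 100 lines covered misses the 80% coverage threshold.
tests_ok = meets_threshold(79, 100, 80)
```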

Track Issues on Project Board

To automatically add issues to a GitHub Project Board, you'll need a secret added to the repository. Specifically, a secret named ADD_TO_PROJECT_URL, formatted like https://github.com/orgs/<GITHUB_ORGANIZATION>/projects/<PROJECT_NUMBER>.
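If you want to sanity-check the secret's value before saving it, a small sketch like the one below may help. The helper names and the project number 1 used in the usage note are assumptions for illustration. The pattern only mirrors the format quoted above and does not confirm that the project actually exists.

```python
import re

# Mirrors https://github.com/orgs/<GITHUB_ORGANIZATION>/projects/<PROJECT_NUMBER>
PROJECT_URL_PATTERN = re.compile(
    r"^https://github\.com/orgs/(?P<org>[^/]+)/projects/(?P<number>\d+)$"
)


def build_project_url(organization: str, project_number: int) -> str:
    """Format a project board URL for the ADD_TO_PROJECT_URL secret."""
    return f"https://github.com/orgs/{organization}/projects/{project_number}"


def is_valid_project_url(url: str) -> bool:
    """Check that a URL has the expected organization project shape."""
    return PROJECT_URL_PATTERN.match(url) is not None
```

For example, build_project_url("PrefectHQ", 1) yields a URL that is_valid_project_url accepts, while the unfilled placeholder string is rejected because <PROJECT_NUMBER> is not numeric.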

Package and Publish

GitHub Actions will handle packaging and publishing of your collection to PyPI so other Prefect users can use your collection in their flows.

To publish to PyPI, you'll need a PyPI account and to generate an API token to authenticate with PyPI when publishing new versions of your collection. The PyPI documentation outlines the steps needed to get an API token.

Once you've obtained a PyPI API token, create a GitHub secret named PYPI_API_TOKEN.

To publish a new version of your collection, create a new GitHub release and tag it with the version that you want to deploy (e.g. v0.3.2). This will trigger a workflow to publish the new version on PyPI and deploy the updated docs to GitHub Pages.
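Before creating a release, you may want to check that the tag looks like the example above. The sketch below assumes tags follow a vMAJOR.MINOR.PATCH shape, which is inferred from the v0.3.2 example rather than stated by the workflow. The function name parse_release_tag is hypothetical.

```python
import re
from typing import Tuple

# Assumed tag shape: "v" followed by three dot-separated integers.
TAG_PATTERN = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def parse_release_tag(tag: str) -> Tuple[int, int, int]:
    """Split a tag such as "v0.3.2" into (major, minor, patch)."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"unexpected release tag format: {tag!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch
```

A tag without the leading "v", such as 0.3.2, raises ValueError in this sketch.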

Upon publishing, a docs branch is automatically created. To hook this up to GitHub Pages, head over to https://github.com/PrefectHQ/prefect-databricks/settings/pages, select docs under the dropdown menu, keep the default /root folder, and click Save. Upon refresh, you should see a prompt stating "Your site is published at https://PrefectHQ.github.io/prefect-databricks". Don't forget to add this link to the repo's "About" section, under "Website", so users can access the docs easily.

Feel free to submit your collection to the Prefect Collections Catalog!

Further guidance

If you run into any issues during the bootstrapping process, feel free to open an issue in the prefect-collection-template repository.

If you have any questions or issues while developing your collection, you can find help in either the Prefect Discourse forum or the Prefect Slack community.