This is a template built with Copier to generate a data-science-focused Python project.
Get started with the following command:
copier copy gh:felixgwilliams/python-copier-template-ds path/to/destination
It is assumed that most of the work will be done in Jupyter Notebooks. However, the template also includes a Python project, in which you can put functions and classes shared across notebooks. The repository is set up to use pytest for unit testing this module code.
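As a rough illustration of that split, a shared helper and its unit test might look like the sketch below. The function `normalise_column_names` and its test are hypothetical and not part of the template; in a generated project the helper would live in the module code and the test alongside the other unit tests.

```python
def normalise_column_names(names: list[str]) -> list[str]:
    """Return lower-case, underscore-separated versions of column names.

    Hypothetical helper, shown only as an example of code shared across notebooks.
    """
    return [name.strip().lower().replace(" ", "_") for name in names]


def test_normalise_column_names() -> None:
    # pytest collects functions whose names start with "test_".
    result = normalise_column_names([" Sepal Length", "Species "])
    assert result == ["sepal_length", "species"]
```

A notebook can then import the helper from the module instead of redefining it in every notebook.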
The template also includes a data directory whose contents will be ignored by git.
You can use this folder to store data that you do not commit.
You may also put a readme file in which you can document the source datasets you use and how to acquire them.
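One way to use the directory from notebooks or module code is a small path helper like the sketch below. It assumes the directory is named `data` and sits at the project root; the function name `data_file` is hypothetical and not provided by the template.

```python
from pathlib import Path


def data_file(project_root: Path, name: str) -> Path:
    """Return the path to a file in the data directory, failing early if it is absent."""
    # Assumption: the ignored directory is called "data" and lives at the project root.
    path = project_root / "data" / name
    if not path.is_file():
        msg = f"{path} not found; see the readme in the data directory for how to acquire it"
        raise FileNotFoundError(msg)
    return path
```

Failing early with a pointer to the readme is a design choice: because the data is not committed, a missing file is the most likely problem on a fresh clone.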
just is a command runner that allows you to easily run project-specific commands.
In fact, you can use just to run all the setup commands listed below:
just setup
The repository is set up to use uv for package and project management. You may set up your Python environment with:
uv sync --all-groups --all-extras
The repository is configured to use Ruff for linting and formatting. By default, all lints are enabled except:
- COM: Enforces trailing commas
- ERA: Disallows commented-out code
- ISC001: Conflicts with the formatter
In addition, the following rules are only enforced for module code as they are inappropriate or too strict for unit tests and notebooks:
- D: Requires docstrings on functions, classes and modules
- ANN: Requires type annotations on functions and methods
- S101: Disallows use of assert
- PLR2004: Disallows "magic" values in comparisons
- T20: Disallows print statements
The target line length is 120 and the docstring convention is google.
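To give a feel for what the module-only rules ask for, here is a small sketch of module code written with them in mind. The names `MIN_SAMPLE_SIZE` and `is_large_sample` are hypothetical, and the example only illustrates the rules named above (docstrings in Google style, type annotations, a named constant instead of a magic value, no print or assert); it is not a guarantee that every enabled lint passes.

```python
"""Hypothetical shared module, shown only to illustrate the rules above."""

# A named constant avoids a "magic" value in the comparison below (PLR2004).
MIN_SAMPLE_SIZE = 30


def is_large_sample(values: list[float]) -> bool:
    """Report whether a sample meets the minimum size.

    Args:
        values: Observations in the sample.

    Returns:
        True if the sample has at least ``MIN_SAMPLE_SIZE`` observations.
    """
    return len(values) >= MIN_SAMPLE_SIZE
```

In a unit test or a notebook, the same logic could be written with a bare `assert len(values) >= 30`, since those rules are relaxed there.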
pre-commit is a tool that runs checks on your files before you commit them with git, thereby helping ensure code quality. Enable it with the following command:
pre-commit install --install-hooks
The configuration is stored in .pre-commit-config.yaml.
nbwipers is a tool written in Rust to ensure Jupyter notebooks are clean.
Committing notebooks that are not clean makes diffs more confusing, can degrade performance and increases the risk of leaking sensitive information.
You can set it up as a git filter with the following command:
nbwipers install local
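To make "clean" more concrete, the sketch below strips outputs and execution counts from notebook JSON using only the standard library. It illustrates the general idea and is not how nbwipers is implemented; `strip_notebook` is a hypothetical name, and the exact set of fields nbwipers removes may differ.

```python
import json


def strip_notebook(raw: str) -> str:
    """Return notebook JSON with code-cell outputs and execution counts removed."""
    notebook = json.loads(raw)
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []  # outputs may contain large or sensitive data
            cell["execution_count"] = None  # counters change on every run and clutter diffs
    return json.dumps(notebook, indent=1)


# A minimal notebook with one executed code cell, built inline for the example.
example = json.dumps(
    {
        "cells": [
            {
                "cell_type": "code",
                "source": ["print('secret')"],
                "outputs": [{"output_type": "stream", "name": "stdout", "text": ["secret\n"]}],
                "execution_count": 3,
                "metadata": {},
            }
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 5,
    }
)
cleaned = json.loads(strip_notebook(example))
```

After cleaning, the cell's source is unchanged while its output and execution count are gone, which is what keeps diffs small and prevents printed values from reaching the repository.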
The repository comes configured to use pytest for unit testing the module code.
Feel free to ignore it if you do not write module code.
You may optionally add a GitHub workflow file which does the following:
- Uses Ruff to check that files are formatted and linted
- Runs unit tests and checks coverage
- Checks that any markdown files are formatted with markdownlint-cli2
- Checks that all Jupyter notebooks are clean
Typos checks for common typos in code, aiming for a low false positive rate. The repository is configured not to use it for Jupyter notebook files, as it tends to find errors in cell outputs.
Test with Copier and copier-template-tester.