Skip to content

Commit

Permalink
feat(pip_repository): Support pip parse cycles (#1166)
Browse files Browse the repository at this point in the history
This patch reworks the `pip_repository` machinery to allow users to
manually annotate groups of libraries which form packaging cycles in
PyPi and must be simultaneously installed.

The strategy here is to transform any dependencies `A` and `B` which
have dependencies and are mutually dependent

```mermaid
graph LR;
    A-->B;
    A-->D;
    A-->E;
    B-->A;
    B-->F;
    B-->G;
```

into a new "dependency group" `C` which has `A*` and `B*` as
dependencies, defined as `A` and `B` less any direct dependencies which
are members of the group. This is viable _for python_ because Python
files just need to be emplaced into a runfiles directory for the
interpreter. We don't actually have a true hard dependency between the
build definition of `A` requiring the build product `B` be available
which requires that the build product of `A` be available.

```mermaid
graph LR
     C-->A*;
     A*-->D;
     A*-->E;
     C-->B*;
     B*-->F;
     B*-->G;
```
This gets us most of the way there, as a user can now safely write
`requirement("A")` and we can provide them with `C`, which has the
desired effect of pulling in `A`, `B` and their respective transitives.

There is one remaining problem - a user writing `deps =
[requirement("A"), requirement("B")]` will take a double direct
dependency on `C`. So we need to insert a layer of indirection,
generating `C_A` and `C_B` which serve only as unique aliases for `C` so
that we can support the double dependency. Our final dependency graph
then is as follows

```mermaid
graph LR
     C_A-->C;
     C_B-->C;
     C-->A*;
     A*-->D;
     A*-->E;
     C-->B*;
     B*-->F;
     B*-->G;
```

Addresses #1076, #1188

## To do
- [x] Get rebased
- [x] Get re-validated manually
- [x] Buildifier
- [x] Get CI happy
- [x] Update documentation
- [x] Update changelog

---------

Co-authored-by: Ignas Anikevicius <[email protected]>
  • Loading branch information
arrdem and aignas authored Nov 28, 2023
1 parent 69abe80 commit 4725d98
Show file tree
Hide file tree
Showing 37 changed files with 12,534 additions and 124 deletions.
5 changes: 5 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -59,6 +59,11 @@ A brief description of the categories of changes:
`data`. Note, that the `@pypi_foo//:pkg` labels are still present for
backwards compatibility.

* (pip_parse) The parameter `requirement_cycles` may be provided a map of names
to lists of requirements which form a dependency cycle. `pip_parse` will break
the cycle for you transparently. This behavior is also available under bzlmod
as `pip.parse(requirement_cycles={})`.

* (gazelle) The flag `use_pip_repository_aliases` is now set to `True` by
default, which will cause `gazelle` to change third-party dependency labels
from `@pip_foo//:pkg` to `@pip//foo` by default.
Expand Down
80 changes: 80 additions & 0 deletions docs/sphinx/pypi-dependencies.md
Original file line number Diff line number Diff line change
Expand Up @@ -130,6 +130,86 @@ Any 'extras' specified in the requirements lock file will be automatically added
as transitive dependencies of the package. In the example above, you'd just put
`requirement("useful_dep")`.

### Packaging cycles

Sometimes PyPi packages contain dependency cycles -- for instance `sphinx`
depends on `sphinxcontrib-serializinghtml`. When using them as `requirement()`s,
ala

```
py_binary(
name = "doctool",
...
deps = [
requirement("sphinx"),
]
)
```

Bazel will protest because it doesn't support cycles in the build graph --

```
ERROR: .../external/pypi_sphinxcontrib_serializinghtml/BUILD.bazel:44:6: in alias rule @pypi_sphinxcontrib_serializinghtml//:pkg: cycle in dependency graph:
//:doctool (...)
@pypi//sphinxcontrib_serializinghtml:pkg (...)
.-> @pypi_sphinxcontrib_serializinghtml//:pkg (...)
| @pypi_sphinxcontrib_serializinghtml//:_pkg (...)
| @pypi_sphinx//:pkg (...)
| @pypi_sphinx//:_pkg (...)
`-- @pypi_sphinxcontrib_serializinghtml//:pkg (...)
```

The `requirement_cycles` argument allows you to work around these issues by
specifying groups of packages which form cycles. `pip_parse` will transparently
fix the cycles for you and provide the cyclic dependencies simultaneously.

```
pip_parse(
...
requirement_cycles = {
"sphinx": [
"sphinx",
"sphinxcontrib-serializinghtml",
]
},
)
```

`pip_parse` supports fixing multiple cycles simultaneously, however cycles must
be distinct. `apache-airflow` for instance has dependency cycles with a number
of its optional dependencies, which means those optional dependencies must all
be a part of the `airflow` cycle. For instance --

```
pip_parse(
...
requirement_cycles = {
"airflow": [
"apache-airflow",
"apache-airflow-providers-common-sql",
"apache-airflow-providers-postgres",
"apache-airflow-providers-sqlite",
]
}
)
```

Alternatively, one could resolve the cycle by removing one leg of it.

For example while `apache-airflow-providers-sqlite` is "baked into" the Airflow
package, `apache-airflow-providers-postgres` is not and is an optional feature.
Rather than listing `apache-airflow[postgres]` in your `requirements.txt` which
would expose a cycle via the extra, one could either _manually_ depend on
`apache-airflow` and `apache-airflow-providers-postgres` separately as
requirements. Bazel rules which need only `apache-airflow` can take it as a
dependency, and rules which explicitly want to mix in
`apache-airflow-providers-postgres` now can.

Alternatively, one could use `rules_python`'s patching features to remove one
leg of the dependency manually. For instance by making
`apache-airflow-providers-postgres` not explicitly depend on `apache-airflow` or
perhaps `apache-airflow-providers-common-sql`.

## Consuming Wheel Dists Directly

If you need to depend on the wheel dists themselves, for instance, to pass them
Expand Down
1 change: 1 addition & 0 deletions examples/build_file_generation/BUILD.bazel
Original file line number Diff line number Diff line change
Expand Up @@ -65,6 +65,7 @@ py_library(
deps = [
"//random_number_generator",
"@pip//flask",
"@pip//sphinx",
],
)

Expand Down
13 changes: 13 additions & 0 deletions examples/build_file_generation/WORKSPACE
Original file line number Diff line number Diff line change
Expand Up @@ -94,6 +94,19 @@ load("@rules_python//python:pip.bzl", "pip_parse")
# You can instead check this `requirements.bzl` file into your repo.
pip_parse(
name = "pip",

# Requirement groups allow Bazel to tolerate PyPi cycles by putting dependencies
# which are known to form cycles into groups together.
experimental_requirement_cycles = {
"sphinx": [
"sphinx",
"sphinxcontrib-qthelp",
"sphinxcontrib-htmlhelp",
"sphinxcontrib-devhelp",
"sphinxcontrib-applehelp",
"sphinxcontrib-serializinghtml",
],
},
# (Optional) You can provide a python_interpreter (path) or a python_interpreter_target (a Bazel target, that
# acts as an executable). The latter can be anything that could be used as Python interpreter. E.g.:
# 1. Python interpreter that you compile in the build file.
Expand Down
1 change: 1 addition & 0 deletions examples/build_file_generation/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,7 @@

from flask import Flask, jsonify
from random_number_generator import generate_random_number
import sphinx # noqa

app = Flask(__name__)

Expand Down
Loading

0 comments on commit 4725d98

Please sign in to comment.