-
Notifications
You must be signed in to change notification settings - Fork 543
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Generate one Bazel repo per dependency #2370
Comments
I think this is working as expected. To confirm, can you please create a reproduction repository to demonstrate the issue? |
@groodt What do you mean by "working as expected"? The way it currently works what is expected from the current architecture, but the current architecture is problematic. |
Example: Module
Module
Module `C:
The Protobuf version available in |
The recommendation is to produce a single lockfile for your dependencies and register that in your hub repository. See https://rules-python.readthedocs.io/en/latest/pypi-dependencies.html Mixing dependencies across dependency closures as you have in your example is indeed problematic and not supported. You may wish to explore other rules that might offer the behaviour that you desire. |
This is not possible for dependencies added from BCR which add their own Python code and are not available on pypi.org. We do not have control over these dependencies. I can find at least 170 such package versions currently: https://github.com/search?q=repo%3Abazelbuild%2Fbazel-central-registry%20pip.parse&type=code. This problem will grow every day as more packages migrate to Bzlmod and are uploaded to BCR. |
Unfortunately, a canonical target for every dependency doesn't Just Work, either. Going back to the Module A/B/C example, lets pretend there is a single repo for each dependency. We know that there are different requirements.txt files, which means there are different "concrete" implementions for a dependency (in the example, protobuf). Which concrete implementation to use depends on the final consumer (e.g. the binary / requirements.txt), not the point of need (module A's target needing protobuf). However, an upstream target can't know what its downstream targets's constraints are. i.e., module A's target can't know whether it's module A, module B, or module C that is consuming it. Thus, the point-of-need can't specify what concrete implementation to use. Similarly, the final consumer can't arbitrarily override what every point-of-need says. re: "[libraries] takes part in version resolution": part of how requirements.txt works is you can arbitrarily pin a dependency to a version. It's then up to the resolver to find a closure with that constraint. Solving that is outside the scope and ability of Bazel (solving a dependency closure's constraints can take several minutes sometimes). My point here is that picking which concrete implementation to use isn't simply decided based on python version or OS. It can be arbitrary and out-of-band information. The only way to make something resembling this work is something like:
This isn't a very tenable design at scale, though. It's fragile, unwieldy, and leads to configuration state explosion.
I think this is the basic point of misunderstanding -- rules_python supports multiple dependency closures in a single build. This is based on user request -- not every repo is a monorepo with a strict one-version policy as enforced within Google. As a general rule, you can't mix "bazel dependency management" (e.g. implicit deps on library rules) with "external dependency management" (e.g. requirements.txt) when it comes to user dependencies. The two methods have separate dependency closures and aren't aware of each other, and thus can't safely interoperate.
In this situation, the best a BCR module can do is:
A mediocre middle ground might be to do something like:
This is, effectively, similar to the deprecated |
Related issue - #2094 |
Yes, this comes up from time to time. I'd still value a realistic reproduction in a realistic repo if thats possible please @mering? Most of the examples you've highlighted in the BCR search seem to be related to grpc? So perhaps there is a nicer way to solve this in rules_proto or similar that we can then document. The only requirement for grpc generated code and the runtime in a userspace build is that the protobuf versions are compatible. It doesn't really require that the runtime code is using a grpc dependency provided by the grpc BCR module. This is why Im continuing to ask for a realistic example that demonstrates the concrete problem, rather than discussing problems in an abstract way. The confusion and conflation of build dependencies that are used by bzlmod modules vs application dependencies that become part of a build output is often a point of confusion. The way that build dependencies that happen to use python as part of their extensions, is that they should be isolated as @aignas points out in #2094 It often gets compared to golang dependencies, but thats really just because golang and bzlmod happen to use the same dependency resolution algorithms. Java and Python application dependency resolvers do differ from golang and indeed there isn't a reason for all languages to adopt the golang approach. |
I agree but would argue that the final consumer is in the best position to decide as it has all information available.
While it is true that the Bazel and the pip dependency solvers use slightly different mechanics, they are compatible enough in most cases. While not being a perfect fit, I see more problems solved when generating one Bazel repo per dependency as opposed to one Bazel repo per dependency closure.
One of the main reasons to use Bazel in the first place is to support big (mono-)repos. Otherwise, the overhead is usually not worth it. So I would argue that most users of
This is not true, see for example the golang example. People are usually using Bazel for a reason and explicitly chose to follow Bazels way of managing dependencies but then need some way to add dependencies from other ecosystems. So they should merge into a single dependency closure in order to safely interoperate.
I would argue that any target should completely list its dependencies. Requiring addition of transitive packages into an unrelated consumers is an anti-pattern.
This would require the root workspace to know about all the dependencies to manually pick a specific version (similar to the
This problem comes up every other week in our big monorepo and usually requires days to figure out the right order of dependencies to work around if possible at all.
One can argue at length on what people should be doing but I guess we have to face reality and see how users are actually using it. The more closely we align to the way Bazel handles things, the more intuitive it is for users and the more likely it is that users will use the library the way it is intended.
I think it doesn't bring value to discuss the advantages and disadvantages of various dependency resolvers here. But maybe the Bazel one is good enough for most use cases and the few exceptions could be covered by override mechanisms. Why not having both? |
Just to add my personal take here as I have worked a lot on 3rd party dependency handling.
As @groodt mentioned, I think having a small reproducible example which highlights the short comings of EDIT: there is also work by aspect in |
🚀 feature request
Relevant Rules
pip.parse()
module extensionDescription
When the same pip package is (transitively) depended upon multiple times from different
requirements.txt
files, it is undefined which version will be chosen (both are added to the PYTHONPATH and the order of deps decides which one takes precedence).This is especially problematic for transitive dependencies of module dependencies where one only has limited to no influence of which version is chosen. This is especially problematic for fundamental libraries like Protobuf which gets pulled in transitively in many
requirements.txt
.The way this is handled in golang, is that every dependency is its own Bazel repo and as such takes part in version resolution. This makes sure that every package only exists once within a given Bazel workspace.
Describe the solution you'd like
Generate one Bazel repo per pip dependency.
Describe alternatives you've considered
For the transition period,
rules_python
could generate both, one repo for all ofrequirements.txt
and one repo per dependency. This way, users can easily migrate from one approach to another. In a future major version, the old functionality could be removed.The text was updated successfully, but these errors were encountered: