Efficient way to relocate files in runfiles #23728
Comments
Have you looked into |
Thanks for filing this, Greg. I'll post some of my notes from thinking about this topic a while back. Sorry, this will be a bit long.
Additional use case: Sphinx. It requires all the files it's going to process to be under a single directory.
#15486 is sort of related, insofar as the main reason to pass runfiles into an action is to respect the root_symlinks type of transforms that runfiles can express and that Bazel materializes into paths on disk (as seen by tools). If there were some other way to express that sort of relocation, then (presumably) that could be passed into an action instead.
Unfortunately, these don't work well for a few reasons:
For (1), it starts to break down when multiple binaries are involved with different transitive closures. For example, you might have two binaries (outer and inner) that use the same library, but different versions, and one is a data dep of the other. If a library, foo, tries to do e.g.
But the library can't know what that X/Y prefix is supposed to be. And, really, the library shouldn't care about any part of the
For (2), what I mean is: Making root_symlinks work requires everything to use it. While e.g. py_library could use root_symlinks, a filegroup in its data deps won't. This means, when materialized to runfiles, the py code is in one place, and its data files are in another place. Its only option is to flatten the runfiles to try and figure things out.
For (3), what I mean is: Both the library and consuming binary have to agree about the "site package prefix" used. If the library uses
For (4), what I mean is: part of the precompiling feature is a binary-level attribute to use/not-use precompiled files. In order for this to work, a library can't put its py code into runfiles -- if it did, its files would always be included, and a binary could no longer opt out.
Back to OP
Fundamentally, what we want is some way to efficiently relocate files (i.e. change their materialized runfiles path).
So given something like:
An ideal output looks something like:
Some various ideas and design notes I have: This can sort of be modeled using TreeArtifacts, however, this has two drawbacks:
This can sort of be modeled by having e.g.
I do something like this for rules_python's docgen. It requires flattening the depset and then doing declare_file/symlink for everything in the depset. It works, but it's one thing to do it for O(dozens) of files and another to do it for O(tens-of-thousands) of files.
An alternative idea I had was to have something like a "relative file" object. The idea being: wherever it's put, it'll always be relative to the path it indicates. Consumers of relative files have a way to specify what they're relative to (perhaps another relative file, or some runfiles-root path or something). So you'd have something like:
..or something like that. But the basic idea is, at the library level, the (prefix, File) information is given, and then a higher level is able to easily/efficiently "move" it. All it's really doing is affecting the path that Bazel will materialize. Notably, it doesn't have to e.g. invoke a custom tool via a build action just to copy/paste a bunch of files to new paths.
A less well-thought-out idea is to allow some sort of transform function. So a binary would do e.g.
Where the xform function returns the path to materialize. I'm not sure how this would compose, though. |
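To make the "relative file" idea a bit more concrete, here is a minimal sketch in plain Python that only models the bookkeeping. It is not a Bazel or Starlark API: `RelativeFile`, `materialize`, and all the paths are hypothetical names made up for illustration. The point it tries to show is that relocation only changes the mapping from runfiles path to source file, with no copy action involved.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class RelativeFile:
    """Hypothetical (prefix, file) pair; not an existing Bazel type."""
    prefix: str    # directory the file is relative to, e.g. "site-packages"
    rel_path: str  # path under that prefix
    source: str    # where the file actually lives in the output tree


def materialize(files, root):
    """Map each runfiles path under `root` to its source file, without copying."""
    layout = {}
    for f in files:
        dest = str(PurePosixPath(root) / f.prefix / f.rel_path)
        # Two different sources landing on the same path is an error.
        if dest in layout and layout[dest] != f.source:
            raise ValueError(f"conflicting files for {dest}")
        layout[dest] = f.source
    return layout


# Illustrative inputs only; these paths are assumptions, not real repo layouts.
libs = [
    RelativeFile("site-packages", "requests/api.py", "external/pypi_requests/requests/api.py"),
    RelativeFile("site-packages", "idna/core.py", "external/pypi_idna/idna/core.py"),
]
layout = materialize(libs, "app.runfiles")
```

In this model the library supplies the (prefix, file) pairs and the binary picks `root`, so the same library output can be placed differently by different consumers.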
I came across another use case that efficient relocation would help with: avoiding overlap when generated files are in the same directory as a binary. e.g. given
We get an error because we want to generate However, we don't have to put the pyc in a subdirectory. Python has a feature called an "alt pyc root", which lets us specify an entirely different location to act as the "root" of pyc searching. Thus, we could generate e.g. |
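As a rough illustration of the pyc-root behavior, assuming the feature meant here is CPython's `sys.pycache_prefix` (also settable via `PYTHONPYCACHEPREFIX`, Python 3.8+), the sketch below asks the interpreter where it would look for a cached pyc with and without a prefix. The helper name `pyc_path_under` and the example paths are hypothetical.

```python
import importlib.util
import sys


def pyc_path_under(prefix, source_path):
    """Return the pyc path CPython would use for source_path with the given prefix."""
    saved = sys.pycache_prefix
    sys.pycache_prefix = prefix
    try:
        return importlib.util.cache_from_source(source_path)
    finally:
        # Restore the interpreter-wide setting so nothing else is affected.
        sys.pycache_prefix = saved


# Illustrative paths only. With no prefix set, the pyc sits in a
# __pycache__ directory next to the source file.
default_pyc = pyc_path_under(None, "/pkg/app/main.py")

# With a prefix, the pyc is placed under a parallel tree rooted at the prefix.
relocated_pyc = pyc_path_under("/tmp/alt_pyc_root", "/pkg/app/main.py")
```

If that assumption holds, generated pyc files would not need to share a directory with the sources or the binary at all.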
You might want to take a look at rules_js, which creates a pnpm-compatible |
Description of the feature request:
There are situations where having more granular control over the layout of files and directories in runfiles is beneficial.
Python context
Specifically, for rules_python, when we import third-party wheels (archives) from indexes like PyPI (Python Package Index), we would ideally like to expand the archives into a directory called site-packages. This is because outside the bazel context, packages are expanded as siblings inside site-packages.
This creates a major impedance mismatch between bazel and archives or foreign packages brought in from other ecosystems. It breaks assumptions that Python packages make about their on-disk layout. One example (there are many others): NVIDIA wheels are expected to be installed as siblings inside site-packages, and they even have relative RPATH entries inside their .so files. There are no built-in mechanisms to solve this cleanly in Python or bazel at the moment. See bazelbuild/rules_python#2156
Here is a typical example of the contents of site-packages for pip install requests where you can see the standard layout when not using bazel.
Other contexts
This is more than just a Python problem. Node has similar challenges with node_modules.
cc @rickeylev @fmeum
Which category does this issue belong to?
No response
What underlying problem are you trying to solve with this feature?
No response
Which operating system are you running Bazel on?
All
What is the output of bazel info release?
N/A
If bazel info release returns development version or (@non-git), tell us how you built Bazel.
N/A
What's the output of git remote get-url origin; git rev-parse HEAD?
Have you found anything relevant by searching the web?
No response
Any other information, logs, or outputs that you want to share?
No response