-
-
Notifications
You must be signed in to change notification settings - Fork 644
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Switch pants distribution model from "python embeds rust" to "rust embeds python" #7369
Comments
I think this is a really good idea, and follows somewhat the model where we provide a compiler and linker toolchain that works on any supported platform, instead of trying to write code which can conform to every possible user's compiler and linker toolchain. This also means we can get extremely funky with the python interpreter we ship (pypy!!!!) as long as we don't break existing pants plugins. |
More relevant background https://code.fb.com/data-infrastructure/xars-a-more-efficient-open-source-system-for-self-contained-executables/ (at https://github.com/facebookincubator/xar/ on github). The squashfs part seems like it might not be relevant, but it is one model for interfacing with heterogenous resources across multiple source languages. |
This comment has been minimized.
This comment has been minimized.
https://github.com/indygreg/PyOxidizer is also a good bit of prior art that I'd forgotten to mention before. |
This might be a fun hackweek project! |
I can't tell whether it is impressive or terrifying that PyOxidizer seems to have completely wrapped away the build process here: https://pyoxidizer.readthedocs.io/en/latest/getting_started.html Depending how well it works, it has the potential to drop a lot of boilerplate... but it might also really obfuscate things. |
### Problem A followup to #8793, after realizing that there are actually many separate patch versions of python interpreters going around. `git tag | wc -l` in the [cpython repo](https://github.com/python/cpython/) results in 421 separate tags, many just differing by the patch version. The terminology being used here is: `CPython==2.7.5` has a *major version* of 2, *minor version* of 7, and a *patch version* of 5. It is my impression that interpreter patch versions do not affect the libraries that can be used by the interpreter, just the major and minor versions (as in, pip will resolve the same libraries for Python `2.7.5` as `2.7.13`). When `.ipex` files are created (see #8793 for background), they need to specify a *precise* interpreter version, in order to ensure that the pip resolve that runs when the .ipex is first executed matches exactly the resolve that occurred at build time. However, specifying an interpreter constraint *including* the patch version doesn't affect this invariant, and *does* make .ipex files impossible to run on machines that don't have the exact right patch version of the specified interpreter! ### Solution - Make the `PEX-INFO` of a .ipex file point to a single interpreter constraint `{major}.{minor}.*`, which matches any patch version for the given major and minor versions. - `ipex_launcher.py` will use this interpreter directly to resolve 3rdparty requirements when the .ipex is first executed, so the interpreter constraint in `PEX-INFO` is how .ipex files ensure they have the correct interpreter to resolve for. ### Result .ipex files are easier to deploy!!! ### Possible Future Work In addition to multi-platform PEX files, could we also consider multi-interpreter PEX files? Alternatively (and this seems like a *fantastic* longer-term goal) we could follow up on #7369 and wrap the interpreter up with the PEX!
### Problem A followup to #8793, after realizing that there are actually many separate patch versions of python interpreters going around. `git tag | wc -l` in the [cpython repo](https://github.com/python/cpython/) results in 421 separate tags, many just differing by the patch version. The terminology being used here is: `CPython==2.7.5` has a *major version* of 2, *minor version* of 7, and a *patch version* of 5. It is my impression that interpreter patch versions do not affect the libraries that can be used by the interpreter, just the major and minor versions (as in, pip will resolve the same libraries for Python `2.7.5` as `2.7.13`). When `.ipex` files are created (see #8793 for background), they need to specify a *precise* interpreter version, in order to ensure that the pip resolve that runs when the .ipex is first executed matches exactly the resolve that occurred at build time. However, specifying an interpreter constraint *including* the patch version doesn't affect this invariant, and *does* make .ipex files impossible to run on machines that don't have the exact right patch version of the specified interpreter! ### Solution - Make the `PEX-INFO` of a .ipex file point to a single interpreter constraint `{major}.{minor}.*`, which matches any patch version for the given major and minor versions. - `ipex_launcher.py` will use this interpreter directly to resolve 3rdparty requirements when the .ipex is first executed, so the interpreter constraint in `PEX-INFO` is how .ipex files ensure they have the correct interpreter to resolve for. ### Result .ipex files are easier to deploy!!! ### Possible Future Work In addition to multi-platform PEX files, could we also consider multi-interpreter PEX files? Alternatively (and this seems like a *fantastic* longer-term goal) we could follow up on #7369 and wrap the interpreter up with the PEX!
Less frameworky alternative to pyoxidize appears to be the |
A followup to #8793, after realizing that there are actually many separate patch versions of python interpreters going around. `git tag | wc -l` in the [cpython repo](https://github.com/python/cpython/) results in 421 separate tags, many just differing by the patch version. The terminology being used here is: `CPython==2.7.5` has a *major version* of 2, *minor version* of 7, and a *patch version* of 5. It is my impression that interpreter patch versions do not affect the libraries that can be used by the interpreter, just the major and minor versions (as in, pip will resolve the same libraries for Python `2.7.5` as `2.7.13`). When `.ipex` files are created (see #8793 for background), they need to specify a *precise* interpreter version, in order to ensure that the pip resolve that runs when the .ipex is first executed matches exactly the resolve that occurred at build time. However, specifying an interpreter constraint *including* the patch version doesn't affect this invariant, and *does* make .ipex files impossible to run on machines that don't have the exact right patch version of the specified interpreter! - Make the `PEX-INFO` of a .ipex file point to a single interpreter constraint `{major}.{minor}.*`, which matches any patch version for the given major and minor versions. - `ipex_launcher.py` will use this interpreter directly to resolve 3rdparty requirements when the .ipex is first executed, so the interpreter constraint in `PEX-INFO` is how .ipex files ensure they have the correct interpreter to resolve for. .ipex files are easier to deploy!!! In addition to multi-platform PEX files, could we also consider multi-interpreter PEX files? Alternatively (and this seems like a *fantastic* longer-term goal) we could follow up on #7369 and wrap the interpreter up with the PEX!
### Problem When initially porting the engine to rust, we used `cffi` because it had a mode that avoided the need to actually link against Python. But somewhere along the way, we began linking against Python, and needed to add C shims to expose a proper module. Though complex, it has served us well for a few years. But during that same time, Rust python-integration crates like `cpython` have matured, providing significant safety and usability benefits over a raw C API. ### Solution Port to the `cpython` crate. When compared to usage of CFFI, this eliminates the passing of raw pointers back and forth across the ffi boundary (improving errors), and allows for direct access to Python objects from Rust (while the GIL is acquired). The `cpython` crate was chosen over PyO3 because it doesn't require a nightly Rust compiler, and is [vouched for](#7369 (comment)) by the mercurial developers. The `cpython` crate does not use PEP-384, and updating it to do so would be a significant undertaking: for example, the `PyObject::call` interface relies on `PyTuple`, which is not present under PEP-384. It would likely also erase some of the performance and usability benefit of this API. So in order to switch to this crate, @Eric-Arellano did a ton of pre-work (culminating in #9881) to switch away from publishing ABI3 wheels, and toward publishing a wheel-per-python3. ### Result `./pants list ::` is 8-10% faster, usage is easier and safer, and there is less code to maintain. Additionally, since [pyoxidizer](https://pyoxidizer.readthedocs.io/en/latest/index.html) (see #7369) relies on the `cpython` crate, this helps to unblock potentially switching to distributing Pants as a native binary with an embedded Python.
Maintainer of PyOxidizer here. I stumbled across this issue and thought I'd leave a comment. I just wanted to drop links to the relatively new PyOxidizer documentation about using PyOxidizer as a mechanism to maintaining Rust projects which embed Python. https://pyoxidizer.readthedocs.io/en/latest/rust.html, https://pyoxidizer.readthedocs.io/en/latest/rust_rust_code.html, and https://pyoxidizer.readthedocs.io/en/latest/rust_porting.html are the most relevant links. While it is certainly possible to use the Please make noise on PyOxidizer's GitHub project if you want more features from the lower-level Rust crates. Finally, there are many improvements to these crates in the yet-to-be-released 0.8 release. So be sure to evaluate the |
@indygreg : Due to at least half serendipitous need (and some other fraction due to your helpful comment the other day!) I'm taking a look at this today. Seem to be making progress so far: thanks again! |
@indygreg : I ran out of steam for the day, but I'm currently figuring out how to actually include loose sources in the But adding sources results in:
EDIT: Backtracking a bit, this looks related to the
EDIT2: Ok, trying with a much simpler module with no dependencies seems to successfully build and run! Huzzah. So it seems like EDIT3: The above "in-memory extension module importing not supported by this configuration" error appears to have been caused by a |
Try switching the Also, I'm not at all surprised that I also recommend using the Also, I just overhauled the documentation on available methods for collecting Python resources. The new docs live at https://pyoxidizer.readthedocs.io/en/latest/packaging_python_files.html. The docs now include some warnings regarding caveats which you ran into! |
re:
Ok, thanks. After removing the stray But the next issue is that I now need to actually import some 3rdparty/binary dependencies, and so am trying out the The requirements file is: https://github.com/pantsbuild/pants/blob/master/3rdparty/python/requirements.txt ... in this case, is the idea that some of these 3rdparty distributions are only available as binary? It would be handy if this error included which
Great, thanks: will read those now! |
indygreg/PyOxidizer#272 exposes that (at least?) I'll try a non-in-memory resources policy. |
@indygreg : Made some more progress today: thanks again for your involvement! The current status in https://github.com/pantsbuild/pants/commits/0e742635489cb846c6edee22a174a7fdba09eeb5 is that all of our source code appears to be successfully included! I can open a repl and import various simple modules. Some more complicated modules fail to load: see below. The next two issues I'm looking at are:
It seems like both of the above are likely addressed in https://pyoxidizer.readthedocs.io/en/latest/packaging_pitfalls.html , so I'll be studying that next. |
The comments reflecting your mental stream as you debug this are very helpful! I'm already realizing there are some gaps in PyOxidizer's documentation! Please keep it up! |
Aha... https://pyoxidizer.readthedocs.io/en/latest/packaging_resources.html#python-resource-locations appears to have been the missing link here (literally? could maybe be linked-from/embedded-in https://pyoxidizer.readthedocs.io/en/latest/config_api.html#config-python-resources-policy ?) Reading through this, it occurs to me that a better factoring of indygreg/PyOxidizer#272 might be to have all of the "bulk adding of resources" APIs identify all incompatible arguments and summarize them in an error message. Currently looking into how to align the |
…_rules` (#16357) While working on #7369, I discovered that `inspect.getmodule` requires relevant modules to exist on the filesystem. PyOxidizer's importer imports from memory, so finding rules by `inspect.getmodule` will not work. This change uses the `f_globals` attribute on the callling frame to get the global objects of the frame that invokes `collect_rules`. I believe this is functionally equivalent. [ci skip-rust] [ci skip-build-wheels]
…16379) Pants uses `pkgutil.get_data()` in all manner of places to load resource files that are included within the Pants Python package. `get_data()` is not supported in the PEP302 pluggable importer used by PyOxidizer. `importlib.resources`, however, _is_ supported. `importlib.resources` (as far as PyOxidizer is concerned, at the very least) has some caveats: * Resources can only be loaded from packages that contain an `__init__.py` file (i.e. are a non-namespace package) * Resources cannot be `.py` files This adds a shim function called `read_resource()` that allows reading resource files with more-or-less the same API as `pkgutil.get_data()` (in particular, allowing `read_resource(__name__, …)`, which is used commonly), but plugs into `importlib.resources.read_binary()` to allow for better portability. A symlink to `src/python/pants/VERSION` is added at `src/python/pants/_version/VERSION`, so that the resource can be loaded with the new API, without immediately breaking the hard-coded version file location required by our setup script. Finally, the python dependency parser has been renamed to `dependency_parser_py`, so that it can be loaded as a resource, and still treated as a `python_source` for Pants linting purposes. Addresses #7369.
…k 2) (#16485) Pants uses `pkgutil.get_data()` in all manner of places to load resource files that are included within the Pants Python package. `get_data()` is not supported in the PEP302 pluggable importer used by PyOxidizer. [`importlib-resources`'s backport](https://pypi.org/project/importlib-resources/), however, _is_ supported. `importlib-resources` (as far as PyOxidizer is concerned, at the very least) has some caveats: * Resources can only be loaded from packages that contain an `__init__.py` file or namespace packages that do not contain `__init__.py` * Resources cannot be `.py` files This adds a shim function called `read_resource()` that allows reading resource files with more-or-less the same API as `pkgutil.get_data()` (in particular, allowing `read_resource(__name__, …)`, which is used commonly), but plugs into `importlib_resources.read_binary()` to allow for better portability. Finally, the python dependency parser has been renamed to `dependency_parser_py`, so that it can be loaded as a resource, and still treated as a `python_source` for Pants linting purposes. Addresses #7369. [ci skip-build-wheels]
@chrisjrn: Thanks for the progress so far! Really exciting. While discussing this with @benjyw today, we realized that the change in distribution model from wheels to a binary also represents an opportunity to finish shipping the pure native client from #11922. The only additional requirement (afaict) would be for this ticket to ship a zip/tar/etc to Github rather than shipping only the binary. It would then be a fairly minor change to ship both the pyoxidized |
…standalone binary distribution for Pants (#16484) This introduces a `pyoxidizer_binary` target for Pants itself, allowing Pants to be built into a binary executable for distribution. Currently the distribution is dynamically linked alongside the native engine, and a few other dependencies; I have not yet investigated static linking. When we investigate distribution of the oxidized binary, we'll need to see if we can make the pants bootstrap script use these binaries and still have them execute with the various libraries dynamically linked. Addresses #7369
…_rules` (pantsbuild#16357) While working on pantsbuild#7369, I discovered that `inspect.getmodule` requires relevant modules to exist on the filesystem. PyOxidizer's importer imports from memory, so finding rules by `inspect.getmodule` will not work. This change uses the `f_globals` attribute on the callling frame to get the global objects of the frame that invokes `collect_rules`. I believe this is functionally equivalent. [ci skip-rust] [ci skip-build-wheels]
…antsbuild#16379) Pants uses `pkgutil.get_data()` in all manner of places to load resource files that are included within the Pants Python package. `get_data()` is not supported in the PEP302 pluggable importer used by PyOxidizer. `importlib.resources`, however, _is_ supported. `importlib.resources` (as far as PyOxidizer is concerned, at the very least) has some caveats: * Resources can only be loaded from packages that contain an `__init__.py` file (i.e. are a non-namespace package) * Resources cannot be `.py` files This adds a shim function called `read_resource()` that allows reading resource files with more-or-less the same API as `pkgutil.get_data()` (in particular, allowing `read_resource(__name__, …)`, which is used commonly), but plugs into `importlib.resources.read_binary()` to allow for better portability. A symlink to `src/python/pants/VERSION` is added at `src/python/pants/_version/VERSION`, so that the resource can be loaded with the new API, without immediately breaking the hard-coded version file location required by our setup script. Finally, the python dependency parser has been renamed to `dependency_parser_py`, so that it can be loaded as a resource, and still treated as a `python_source` for Pants linting purposes. Addresses pantsbuild#7369.
…k 2) (pantsbuild#16485) Pants uses `pkgutil.get_data()` in all manner of places to load resource files that are included within the Pants Python package. `get_data()` is not supported in the PEP302 pluggable importer used by PyOxidizer. [`importlib-resources`'s backport](https://pypi.org/project/importlib-resources/), however, _is_ supported. `importlib-resources` (as far as PyOxidizer is concerned, at the very least) has some caveats: * Resources can only be loaded from packages that contain an `__init__.py` file or namespace packages that do not contain `__init__.py` * Resources cannot be `.py` files This adds a shim function called `read_resource()` that allows reading resource files with more-or-less the same API as `pkgutil.get_data()` (in particular, allowing `read_resource(__name__, …)`, which is used commonly), but plugs into `importlib_resources.read_binary()` to allow for better portability. Finally, the python dependency parser has been renamed to `dependency_parser_py`, so that it can be loaded as a resource, and still treated as a `python_source` for Pants linting purposes. Addresses pantsbuild#7369. [ci skip-build-wheels]
…standalone binary distribution for Pants (pantsbuild#16484) This introduces a `pyoxidizer_binary` target for Pants itself, allowing Pants to be built into a binary executable for distribution. Currently the distribution is dynamically linked alongside the native engine, and a few other dependencies; I have not yet investigated static linking. When we investigate distribution of the oxidized binary, we'll need to see if we can make the pants bootstrap script use these binaries and still have them execute with the various libraries dynamically linked. Addresses pantsbuild#7369
Is there hope that this will be a thing soon? |
There is! The underlying infrastructure is being worked on in https://github.com/a-scie (for now, that might move) Then it's a matter of applying that to Pants, which would not be hard. I was able to get that working hackily in about 20 minutes, while ignoring some finer details of course... |
@Kaelten Would this solve a problem you're currently encountering? |
I think it helps make pants more accessible overall. I'm still in a bit of a limbo stuck between loving the promise of pants and figuring out how to achieve that dream. |
Epic ticket close! |
The ticket title though is telling. This was implemented in a totally different way with no embedding at all in the sense laid out. Beware describing the solution and not just the problem. |
tl;dr: See the title.
What this would mean would be: rather than shipping wheels (or even pexes, as in #12397), we would ship static rust executables the same way that we currently ship pexes for releases. To do that, we would move away from:
and toward
Implementation wise, this might look like Mercurial's approach (although there is probably more-modern prior-art): https://www.mercurial-scm.org/wiki/OxidationPlan
An advantage to this would be removing our dependence on a python interpreter on the user's machine, and improving our path forward for low-latency startup.
The text was updated successfully, but these errors were encountered: