Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Providing a hermetic python toolchain #19

Open
arlyon opened this issue Oct 5, 2022 · 10 comments
Open

Providing a hermetic python toolchain #19

arlyon opened this issue Oct 5, 2022 · 10 comments

Comments

@arlyon
Copy link
Contributor

arlyon commented Oct 5, 2022

For serious use-cases we would benefit from a system-independent python toolchain that can source an interpreter. At the basic level this involves downloading a copy of CPython for the current platform that can run code. An issue with this is that basic cpython depends on dynamic libraries such as libssl and libsqlite, so we need to either provide a mechanism for building those consistently (c / cpp compiler) or use an interpreter with static linking such as https://python-build-standalone.readthedocs.io

Potential learnings from bazel

  • dependencies are managed in a repo rule, meaning they are not using the same toolchain that bazel uses, but use the local toolchain
  • cross compilation is tough (broken)
@danmx
Copy link

danmx commented Dec 4, 2022

Here's what Pants is trying to do pantsbuild/pants#7369

@LegNeato
Copy link

LegNeato commented Dec 8, 2022

@LegNeato
Copy link

LegNeato commented Dec 8, 2022

Ah, I see it is mentioned in the pants issue

@arlyon
Copy link
Contributor Author

arlyon commented Dec 20, 2022

Hi all, we are releasing a hermetic toolchain based on indygreg/python-build-standalone this week. (also hi @LegNeato, I submitted a few patches for juniper a couple of years back :) )

@benmkw
Copy link

benmkw commented Apr 22, 2023

I'm currently looking for a way to bundle a static python build with a rust program. The closest I've found is https://github.com/python-cmake-buildsystem/python-cmake-buildsystem (although that goes only up to 3.6) which replaces the configure logic of cpython with cmake scripts, the same would presumably be needed for buck2 as well so that one could have a rust/cpp program build by buck2 and then statically link against cpython. I've found that this conceptually easy task is really not that well supported yet anywhere (see https://pyo3.rs/v0.14.0/building_and_distribution#statically-embedding-the-python-interpreter as well).

This is helpful if one needs/ wants python, but does not rely on much of its ecosystem/ packages and just wants a scripting language which is known by many developers for some parts of a system.

(python-build-standalone seems to be a wrapper around the configure scripts of the core cpython distribution + relies on docker which is a step I'd like to avoid)

@benbrittain
Copy link
Contributor

benbrittain commented Nov 8, 2023

This is a little bit hacky, but I was asked to share how I fixed this problem:

(please ignore the fact this is python3.8 😓 )

BUCK

http_archive(                                                                                                            
   name = "python-standalone-archive",                                                                                  
   # TODO self host this                                                                                                
   urls = [ "https://github.com/indygreg/python-build-standalone/releases/download/20231002/cpython-3.8.18+20231002-x86_64-unknown-linux-gnu-pgo-full.tar.zst"],                                                                                 
   sha256 = "3209542fbcaf7c3ef5658b344ea357c4aabf5fe7cbf1b5dea4a0b78b64835fc0",                                         
   visibility = ["PUBLIC"],                                                                                             
)                                                                                                                        
                                                                                                                        
standalone_python(                                                                                                       
   name = "python-standalone",                                                                                          
   archive = ":python-standalone-archive",                                                                              
   visibility = ["PUBLIC"]                                                                                              
)                                                                                                                        
                                                                                                                        
prebuilt_cxx_library(                                                                                                    
   name = "python-headers",                                                                                             
   header_dirs = [ "@toolchains//python:python-standalone[includes]"],                                                  
   visibility = ["PUBLIC"],                                                                                             
)  

defs.bzl

                                                                                                
def _standalone_python_impl(ctx: AnalysisContext) -> list[Provider]:                                                     
 # generate a runnable python3 binary                                                                                 
 python = ctx.actions.declare_output("__python", dir = True)                                                          
 ctx.actions.copy_dir(python, ctx.attrs.archive)                                                                      
 interpreter = cmd_args(python, format = "{}/python/install/bin/python3").hidden(python)                              
                                                                                                                      
 # provide relavant headers for pybind                                                                                
 includes = ctx.actions.declare_output("include", dir = True)                                                         
 ctx.actions.copy_file(includes, python.project("python/install/include/python3.8"))                                  
                                                                                                                      
 return [                                                                                                             
     DefaultInfo(sub_targets = {                                                                                      
         "interpreter": [RunInfo(interpreter)],                                                                       
         "includes": [DefaultInfo(includes)],                                                                         
     })                                                                                                               
 ]                                                                                                                    
                                                                                                                      
standalone_python = rule(                                                                                                
 impl = _standalone_python_impl,                                                                                      
 attrs = {                                                                                                            
     "archive": attrs.source(),                                                                                       
 }                                                                                                                    
)                                                                                                                        


toolchain//BUCK

system_python_toolchain(                                                                                             
    name = "python",                                                                                                     
    interpreter = "toolchains//python:python-standalone[interpreter]",                                                   
    visibility = ["PUBLIC"],                                                                                             
)           

facebook-github-bot pushed a commit that referenced this issue Feb 15, 2024
Summary:
The stack around D42099161 introduced the ability to dynamically set the log level via a debug-command. This requires using `tracing_subscriber::reload::Layer`. The implementation of that machinery is backed by a `RwLock` [here](https://fburl.com/code/4xv0ihpn). This `RwLock` is read locked once on every single `#[instrumentation]` call to check whether the instrumentation is active or not.

On top of that, something in our fbsource third-party config enables the `parking_lot` feature of `tracing_subscriber`. This means that this is a `parking_lot::RwLock`, not a `std::sync::RwLock`, which is bad because it's well known that parking lot has problems with excessive spinning.

What all that adds up to is that when you put `#[instrument]` on a super hot function in dice, you get dozens of threads that are all simultaneously doing this:

```
  thread #23, name = 'buck2-rt', stop reason = signal SIGSTOP
    frame #0: 0x000000000c94a1fa buck2`<parking_lot::raw_rwlock::RawRwLock>::lock_shared_slow + 122
    frame #1: 0x0000000007a80b54 buck2`<tracing_subscriber::layer::layered::Layered<tracing_subscriber::reload::Layer<tracing_subscriber::filter::layer_filters::Filtered<tracing_subscriber::fmt::fmt_layer::Layer<tracing_subscriber::registry::sharded::Registry, tracing_subscriber::fmt::format::DefaultFields, tracing_subscriber::fmt::format::Format, std::io::stdio::stderr>, tracing_subscriber::filter::env::EnvFilter, tracing_subscriber::registry::sharded::Registry>, tracing_subscriber::registry::sharded::Registry>, tracing_subscriber::registry::sharded::Registry> as tracing_core::subscriber::Subscriber>::enabled + 292
    frame #2: 0x000000000a172d77 buck2`<dice::legacy::incremental::dep_trackers::internals::Dep<dice::legacy::key::StoragePropertiesForKey<buck2_build_api::actions::calculation::BuildKey>> as dice::legacy::incremental::graph::dependencies::Dependency>::recompute::{closure#0} + 183
    frame #3: 0x000000000a03606c buck2`<futures_util::stream::futures_unordered::FuturesUnordered<core::pin::Pin<alloc::boxed::Box<dyn core::future::future::Future<Output = core::result::Result<(alloc::boxed::Box<dyn dice::legacy::incremental::graph::dependencies::ComputedDependency>, alloc::sync::Arc<dyn dice::legacy::incremental::graph::GraphNodeDyn>), dice::api::error::DiceError>> + core::marker::Send>>> as futures_util::stream::stream::StreamExt>::poll_next_unpin + 444
    frame #4: 0x0000000009fc4755 buck2`<dice::legacy::incremental::IncrementalEngine<dice::legacy::key::StoragePropertiesForKey<buck2_build_api::actions::calculation::BuildKey>>>::compute_whether_dependencies_changed::{closure#0} (.llvm.4229350879637282184) + 1733
    frame #5: 0x0000000009ef30d4 buck2`<dice::legacy::incremental::IncrementalEngine<dice::legacy::key::StoragePropertiesForKey<buck2_build_api::actions::calculation::BuildKey>>>::compute_whether_versioned_dependencies_changed::{closure#0}::{closure#0} + 228
    frame #6: 0x0000000009f194ec buck2`<buck2_futures::cancellable_future::CancellableFuture<<dice::legacy::incremental::IncrementalEngine<dice::legacy::key::StoragePropertiesForKey<buck2_build_api::actions::calculation::BuildKey>>>::new_dice_task::{closure#0}> as core::future::future::Future>::poll (.llvm.11181184606289051452) + 3500
    frame #7: 0x0000000009f04bbf buck2`<futures_util::future::future::map::Map<buck2_futures::cancellable_future::CancellableFuture<<dice::legacy::incremental::IncrementalEngine<dice::legacy::key::StoragePropertiesForKey<buck2_build_api::actions::calculation::BuildKey>>>::new_dice_task::{closure#0}>, buck2_futures::spawn::spawn_inner<<dice::legacy::incremental::IncrementalEngine<dice::legacy::key::StoragePropertiesForKey<buck2_build_api::actions::calculation::BuildKey>>>::new_dice_task::{closure#0}, dice::api::user_data::UserComputationData, futures_util::future::ready::Ready<()>>::{closure#0}> as core::future::future::Future>::poll + 31
    frame #8: 0x0000000009ff0339 buck2`<futures_util::future::future::flatten::Flatten<futures_util::future::future::Map<futures_util::future::ready::Ready<()>, buck2_futures::spawn::spawn_inner<<dice::legacy::incremental::IncrementalEngine<dice::legacy::key::StoragePropertiesForKey<buck2_build_api::actions::calculation::BuildKey>>>::new_dice_task::{closure#0}, dice::api::user_data::UserComputationData, futures_util::future::ready::Ready<()>>::{closure#1}>, <futures_util::future::future::Map<futures_util::future::ready::Ready<()>, buck2_futures::spawn::spawn_inner<<dice::legacy::incremental::IncrementalEngine<dice::legacy::key::StoragePropertiesForKey<buck2_build_api::actions::calculation::BuildKey>>>::new_dice_task::{closure#0}, dice::api::user_data::UserComputationData, futures_util::future::ready::Ready<()>>::{closure#1}> as core::future::future::Future>::Output> as core::future::future::Future>::poll + 201
    frame #9: 0x00000000093f9f22 buck2`<tokio::task::task_local::TaskLocalFuture<buck2_events::dispatch::EventDispatcher, core::pin::Pin<alloc::boxed::Box<dyn core::future::future::Future<Output = alloc::boxed::Box<dyn core::any::Any + core::marker::Send>> + core::marker::Send>>> as core::future::future::Future>::poll + 130
    frame #10: 0x000000000939fdcb buck2`<tracing::instrument::Instrumented<<buck2_build_api::spawner::BuckSpawner as buck2_futures::spawner::Spawner<dice::api::user_data::UserComputationData>>::spawn::{closure#0}> as core::future::future::Future>::poll + 139
    frame #11: 0x00000000091ca5a9 buck2`<tokio::runtime::task::core::Core<tracing::instrument::Instrumented<<buck2_build_api::spawner::BuckSpawner as buck2_futures::spawner::Spawner<dice::api::user_data::UserComputationData>>::spawn::{closure#0}>, alloc::sync::Arc<tokio::runtime::scheduler::multi_thread::handle::Handle>>>::poll + 169
    frame #12: 0x00000000091c1b22 buck2`<tokio::runtime::task::harness::Harness<tracing::instrument::Instrumented<<buck2_build_api::spawner::BuckSpawner as buck2_futures::spawner::Spawner<dice::api::user_data::UserComputationData>>::spawn::{closure#0}>, alloc::sync::Arc<tokio::runtime::scheduler::multi_thread::handle::Handle>>>::poll + 146
    frame #13: 0x000000000c920f6f buck2`<tokio::runtime::scheduler::multi_thread::worker::Context>::run_task + 895
    frame #14: 0x000000000c91f847 buck2`<tokio::runtime::scheduler::multi_thread::worker::Context>::run + 2103
    frame #15: 0x000000000c932264 buck2`<tokio::runtime::context::scoped::Scoped<tokio::runtime::scheduler::Context>>::set::<tokio::runtime::scheduler::multi_thread::worker::run::{closure#0}::{closure#0}, ()> + 52
    frame #16: 0x000000000c932169 buck2`tokio::runtime::context::runtime::enter_runtime::<tokio::runtime::scheduler::multi_thread::worker::run::{closure#0}, ()> + 441
    frame #17: 0x000000000c91efa6 buck2`tokio::runtime::scheduler::multi_thread::worker::run + 70
    frame #18: 0x000000000c906a50 buck2`<tracing::instrument::Instrumented<tokio::runtime::blocking::task::BlockingTask<<tokio::runtime::scheduler::multi_thread::worker::Launch>::launch::{closure#0}>> as core::future::future::Future>::poll + 160
    frame #19: 0x000000000c8f8af9 buck2`<tokio::loom::std::unsafe_cell::UnsafeCell<tokio::runtime::task::core::Stage<tracing::instrument::Instrumented<tokio::runtime::blocking::task::BlockingTask<<tokio::runtime::scheduler::multi_thread::worker::Launch>::launch::{closure#0}>>>>>::with_mut::<core::task::poll::Poll<()>, <tokio::runtime::task::core::Core<tracing::instrument::Instrumented<tokio::runtime::blocking::task::BlockingTask<<tokio::runtime::scheduler::multi_thread::worker::Launch>::launch::{closure#0}>>, tokio::runtime::blocking::schedule::BlockingSchedule>>::poll::{closure#0}> + 153
    frame #20: 0x000000000c90166b buck2`<tokio::runtime::task::core::Core<tracing::instrument::Instrumented<tokio::runtime::blocking::task::BlockingTask<<tokio::runtime::scheduler::multi_thread::worker::Launch>::launch::{closure#0}>>, tokio::runtime::blocking::schedule::BlockingSchedule>>::poll + 43
    frame #21: 0x000000000c90d9b8 buck2`<tokio::runtime::task::harness::Harness<tracing::instrument::Instrumented<tokio::runtime::blocking::task::BlockingTask<<tokio::runtime::scheduler::multi_thread::worker::Launch>::launch::{closure#0}>>, tokio::runtime::blocking::schedule::BlockingSchedule>>::poll + 152
    frame #22: 0x000000000c90b848 buck2`<tokio::runtime::blocking::pool::Inner>::run + 216
    frame #23: 0x000000000c9002ab buck2`std::sys_common::backtrace::__rust_begin_short_backtrace::<<tokio::runtime::blocking::pool::Spawner>::spawn_thread::{closure#0}, ()> + 187
    frame #24: 0x000000000c90042e buck2`<<std::thread::Builder>::spawn_unchecked_<<tokio::runtime::blocking::pool::Spawner>::spawn_thread::{closure#0}, ()>::{closure#1} as core::ops::function::FnOnce<()>>::call_once::{shim:vtable#0} + 158
    frame #25: 0x000000000757e4b5 buck2`std::sys::unix::thread::Thread::new::thread_start::h618b0b8e7bda0b2b + 37
    frame #26: 0x00007f2ba1e9abaf
```

This hit an open source user super hard over in #555 .

This diff approaches this issue by putting all the `#[instrument]`s in dice behind `#[cfg(debug_assertions)]`. This allows them to still be used in debug mode, but keeps them out of any hot paths. I do not have numbers to show that most of these matter. The only one I know matters for sure is `recompute`.

Alternatives are to either try and tailor this to the callsites we know are hot (that sounds error prone) or to revert the stack that inadvertently introduced the RwLock. Even if we did that though, it's probably still safer to keep `#[instrument]` out of super hot paths like this.

Reviewed By: stepancheg

Differential Revision: D53744041

fbshipit-source-id: 85bce9af2fec8ad1d50adc241d3b8e2bfc5cec87
@jazzdan
Copy link
Contributor

jazzdan commented Mar 1, 2024

Hey @benbrittain, quick question about that snippet. As far as I can tell the interpreter attribute on system_python_toolchain expects a string which represents the name of the python binary, e.g. python or python3. How did you get this to work providing a RunInfo reference to the interpreter attribute? Or is this more like pseudocode?

As it stands if I run this code I get output like this:

$ buckle build //:thing-that-uses-python
Local command returned non-zero exit code <no exit code>
Reproduce locally: `env -- 'BUCK_SCRATCH_PATH=buck-out/v2/tmp/root/213ed1b7ab869379/__backend-npm-install__/npm' '//tool ...<omitted>... buck-out/v2/gen/root/213ed1b7ab869379/__backend-npm-install__/node_modules --environment development (run `buck2 log what-failed` to get the full command)`
stdout:
stderr:
Spawning executable `//toolchains/python:python-standalone[interpreter]` failed: Failed to spawn a process
$ buckle log what-failed
Showing commands from: /Users/dan/Library/Caches/buckle/a1226c67e221a84c1562008739dcb322710e86e2/buck2 build //:thing-that-uses-python
build	root//:thing-that-uses-python (prelude//platforms:default#213ed1b7ab869379) (npm)	local	env -- 'TMPDIR=/Users/dan/devel/backend2/buck-out/v2/tmp/root/213ed1b7ab869379/__backend-npm-install__/npm' 'BUCK_SCRATCH_PATH=buck-out/v2/tmp/root/213ed1b7ab869379/__backend-npm-install__/npm' 'BUCK2_DAEMON_UUID=89552eba-0149-445a-b0ee-f7d6ee0a0df3' 'BUCK_BUILD_ID=ad2238c7-8161-47d5-b3fb-598a98a18e23' '//toolchains/python:python-standalone[interpreter]' buck-out/v2/gen/prelude-replay/213ed1b7ab869379/npm/__npm_install.py__/npm_install.py --npm buck-out/v2/gen/toolchains/213ed1b7ab869379/__node-18.16.1__/node-18.16.1/bin/npm --package_json ./package.json --package_lock ./package-lock.json --output buck-out/v2/gen/root/213ed1b7ab869379/__backend-npm-install__/node_modules --environment development

Note how it's just sticking the literal string //toolchains/python:python-standalone[interpreter] in the command.

@cbarrete
Copy link
Contributor

cbarrete commented Mar 1, 2024

I believe that you'll want to use something like "$(exe //toolchains/python:python-standalone[interpreter])" instead, which gets replaced by the location of the artifact (the interpreter in that case). exe also ensures that it has a RunInfo, and location is also available when that's not required.

@jazzdan
Copy link
Contributor

jazzdan commented Mar 1, 2024

@cbarrete yeah I tried that too but it doesn't look like that attribute supports the macros

Spawning executable `$(exe @prelude-replay//python:python-standalone[interpreter])` failed: Failed to spawn a process
Showing commands from: /Users/dan/Library/Caches/buckle/a1226c67e221a84c1562008739dcb322710e86e2/buck2 build //:backend-npm-install
build	root//:backend-npm-install (prelude//platforms:default#213ed1b7ab869379) (npm)	local	env -- 'TMPDIR=/Users/dan/devel/backend2/buck-out/v2/tmp/root/213ed1b7ab869379/__backend-npm-install__/npm' 'BUCK_SCRATCH_PATH=buck-out/v2/tmp/root/213ed1b7ab869379/__backend-npm-install__/npm' 'BUCK2_DAEMON_UUID=89552eba-0149-445a-b0ee-f7d6ee0a0df3' 'BUCK_BUILD_ID=345f1930-17d7-4b88-b3da-be18602adcf4' '$(exe @prelude-replay//python:python-standalone[interpreter])' buck-out/v2/gen/prelude-replay/213ed1b7ab869379/npm/__npm_install.py__/npm_install.py --npm buck-out/v2/gen/toolchains/213ed1b7ab869379/__node-18.16.1__/node-18.16.1/bin/npm --package_json ./package.json --package_lock ./package-lock.json --output buck-out/v2/gen/root/213ed1b7ab869379/__backend-npm-install__/node_modules --environment development

@jazzdan
Copy link
Contributor

jazzdan commented Mar 12, 2024

I ironed out some issues and got a working hermetic Python toolchain (at least on mac x86, mac arm64 and linux x86, Windows is untested). I posted it here https://github.com/jazzdan/buck2-python-toolchain-problem

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants