From 8b47a5d88fb164b862336f88f3fe895dc9adbe4a Mon Sep 17 00:00:00 2001
From: Nathan Goldbaum
Date: Thu, 5 Dec 2024 11:04:59 -0700
Subject: [PATCH] Expand docs on when and why allow_threads is necessary

---
 guide/src/free-threading.md | 42 ++++++++++++++++++++++-------
 guide/src/parallelism.md    | 54 ++++++++++++++++++++++++++++++++++++-
 2 files changed, 85 insertions(+), 11 deletions(-)

diff --git a/guide/src/free-threading.md b/guide/src/free-threading.md
index f212cb0b9a9..95ada1aed56 100644
--- a/guide/src/free-threading.md
+++ b/guide/src/free-threading.md
@@ -160,16 +160,38 @@ The main reason for obtaining a `'py` lifetime is to interact with Python
 objects or call into the CPython C API. If you are not yet attached to the
 Python runtime, you can register a thread using the [`Python::with_gil`]
 function. Threads created via the Python [`threading`] module do not not need to
-do this, but all other OS threads that interact with the Python runtime must
-explicitly attach using `with_gil` and obtain a `'py` liftime.
-
-Since there is no GIL in the free-threaded build, releasing the GIL for
-long-running tasks is no longer necessary to ensure other threads run, but you
-should still detach from the interpreter runtime using [`Python::allow_threads`]
-when doing long-running tasks that do not require the CPython runtime. The
-garbage collector can only run if all threads are detached from the runtime (in
-a stop-the-world state), so detaching from the runtime allows freeing unused
-memory.
+do this, and PyO3 will handle setting up the [`Python<'py>`] token when CPython
+calls into your extension, but all other OS threads that interact with the
+Python runtime must explicitly attach using `with_gil` and obtain a `'py`
+lifetime.
+
+### Global synchronization events can cause hangs and deadlocks
+
+The free-threaded build triggers global synchronization events in the following
+situations:
+
+* During garbage collection in order to get a globally consistent view of
+  reference counts and references between objects
+* In Python 3.13, when the first background thread is started in
+  order to mark certain objects as immortal
+* When either `sys.settrace` or `sys.setprofile` is called in order to
+  instrument running code objects and threads
+* Before `os.fork()` is called.
+
+This is a non-exhaustive list and there may be other situations in future Python
+versions that can trigger global synchronization events.
+
+This means that you should detach from the interpreter runtime using
+[`Python::allow_threads`] in exactly the same situations as you should detach
+from the runtime in the GIL-enabled build: when doing long-running tasks that do
+not require the CPython runtime or when doing any task that needs to re-attach
+to the runtime (see the [guide
+section](./parallelism.md#sharing-python-objects-between-rust-threads) that
+covers this). In the former case, you would observe a hang on threads that are
+waiting on the long-running task to complete, and in the latter case you would
+see a deadlock while a thread tries to attach after the runtime triggers a
+global synchronization event, but the spawning thread prevents the
+synchronization event from completing.
 
 ### Exceptions and panics for multithreaded access of mutable `pyclass` instances
 
diff --git a/guide/src/parallelism.md b/guide/src/parallelism.md
index a288b14be19..eef396afa70 100644
--- a/guide/src/parallelism.md
+++ b/guide/src/parallelism.md
@@ -1,6 +1,6 @@
 # Parallelism
 
-CPython has the infamous [Global Interpreter Lock](https://docs.python.org/3/glossary.html#term-global-interpreter-lock), which prevents several threads from executing Python bytecode in parallel. This makes threading in Python a bad fit for [CPU-bound](https://en.wikipedia.org/wiki/CPU-bound) tasks and often forces developers to accept the overhead of multiprocessing.
+CPython has the infamous [Global Interpreter Lock](https://docs.python.org/3/glossary.html#term-global-interpreter-lock) (GIL), which prevents several threads from executing Python bytecode in parallel. This makes threading in Python a bad fit for [CPU-bound](https://en.wikipedia.org/wiki/CPU-bound) tasks and often forces developers to accept the overhead of multiprocessing. There is an experimental "free-threaded" version of CPython 3.13 that does not have a GIL; see the PyO3 docs on [free-threaded Python](./free-threading.md) for more information about that.
 
 In PyO3 parallelism can be easily achieved in Rust-only code. Let's take a look at our [word-count](https://github.com/PyO3/pyo3/blob/main/examples/word-count/src/lib.rs) example, where we have a `search` function that utilizes the [rayon](https://github.com/rayon-rs/rayon) crate to count words in parallel.
 ```rust,no_run
@@ -117,4 +117,56 @@ test_word_count_python_sequential 27.3985 (15.82) 45.452
 
 You can see that the Python threaded version is not much slower than the Rust sequential version, which means compared to an execution on a single CPU core the speed has doubled.
 
+## Sharing Python objects between Rust threads
+
+In the example above we made a Python interface to a low-level Rust function,
+and then leveraged the Python `threading` module to run the low-level function
+in parallel. It is also possible to spawn threads in Rust that acquire the GIL
+and operate on Python objects. However, care must be taken to avoid writing code
+that deadlocks with the GIL in these cases.
+
+In the example below, we share a `Vec` of User ID objects defined using the
+`pyclass` macro and spawn threads to process the collection of data into a `Vec`
+of booleans based on a predicate using a rayon parallel iterator:
+
+```rust,no_run
+use pyo3::prelude::*;
+
+// These traits let us use into_par_iter and map
+use rayon::iter::{IntoParallelIterator, ParallelIterator};
+
+#[pyclass]
+struct UserID {
+    id: i64,
+}
+
+let instances: Vec<Py<UserID>> = Python::with_gil(|py| {
+    (0..10).map(|x| Py::new(py, UserID { id: x }).unwrap()).collect()
+});
+let allowed_ids: Vec<bool> = Python::with_gil(|outer_py| {
+    outer_py.allow_threads(|| {
+        (0..instances.len()).into_par_iter().map(|index| {
+            Python::with_gil(|inner_py| {
+                instances[index].borrow(inner_py).id > 5
+            })
+        }).collect()
+    })
+});
+assert!(allowed_ids.into_iter().filter(|b| *b).count() == 4);
+```
+
+It's important to note that there is an `outer_py` GIL lifetime token as well as
+an `inner_py` token. Sharing GIL lifetime tokens between threads is not allowed
+and threads must individually acquire the GIL to access data wrapped by a Python
+object.
+
+It's also important to see that this example uses [`Python::allow_threads`] to
+wrap the code that spawns OS threads via `rayon`. If this example didn't use
+`allow_threads`, a rayon worker thread would block on acquiring the GIL while a
+thread that owns the GIL spins forever waiting for the result of the rayon
+thread. Calling `allow_threads` allows the GIL to be released in the thread
+collecting the results from the worker threads. You should always call
+`allow_threads` in situations that spawn worker threads, but especially so in
+cases where worker threads need to acquire the GIL to prevent deadlocks.
+
 [`Python::allow_threads`]: {{#PYO3_DOCS_URL}}/pyo3/marker/struct.Python.html#method.allow_threads
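
Review note (not part of the patch): the deadlock described in the closing paragraph of the new parallelism section can be modelled without PyO3 or rayon. The sketch below is a toy model only. It assumes a plain `std::sync::Mutex` as a stand-in for the GIL, and `count_allowed` is a hypothetical helper name that does not appear in the patch. It says nothing about how PyO3 or CPython implement attaching and detaching; it only illustrates why the coordinating thread has to give up the lock before it waits on workers that need the same lock.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Toy model: `gil` is an ordinary mutex standing in for the GIL (assumption,
/// not PyO3's implementation). Counts how many ids are greater than 5.
fn count_allowed(ids: Vec<i64>) -> usize {
    let gil = Arc::new(Mutex::new(()));
    let ids = Arc::new(ids);

    // The coordinating thread starts out holding the lock, much like code
    // running inside `Python::with_gil`.
    let guard = gil.lock().unwrap();

    // Each worker must take the same lock before reading its element,
    // mirroring the inner `with_gil` call in the patch's example.
    let handles: Vec<_> = (0..ids.len())
        .map(|index| {
            let gil = Arc::clone(&gil);
            let ids = Arc::clone(&ids);
            thread::spawn(move || {
                let _held = gil.lock().unwrap();
                ids[index] > 5
            })
        })
        .collect();

    // Analogue of `allow_threads`: release the lock before blocking on the
    // workers. Joining while `guard` is still alive would deadlock, because
    // every worker is waiting for a lock that this thread never gives up.
    drop(guard);

    handles
        .into_iter()
        .map(|handle| handle.join().unwrap())
        .filter(|allowed| *allowed)
        .count()
}
```

With the same input as the patch's example, `count_allowed((0..10).collect())` returns 4. Moving `drop(guard)` below the joins reproduces the hang that the patch warns about.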