Skip to content

Commit

Permalink
Expand docs on when and why allow_threads is necessary
Browse files Browse the repository at this point in the history
  • Loading branch information
ngoldbaum committed Dec 5, 2024
1 parent 77202aa commit 8b47a5d
Show file tree
Hide file tree
Showing 2 changed files with 85 additions and 11 deletions.
42 changes: 32 additions & 10 deletions guide/src/free-threading.md
Original file line number Diff line number Diff line change
Expand Up @@ -160,16 +160,38 @@ The main reason for obtaining a `'py` lifetime is to interact with Python
objects or call into the CPython C API. If you are not yet attached to the
Python runtime, you can register a thread using the [`Python::with_gil`]
function. Threads created via the Python [`threading`] module do not not need to
do this, but all other OS threads that interact with the Python runtime must
explicitly attach using `with_gil` and obtain a `'py` liftime.

Since there is no GIL in the free-threaded build, releasing the GIL for
long-running tasks is no longer necessary to ensure other threads run, but you
should still detach from the interpreter runtime using [`Python::allow_threads`]
when doing long-running tasks that do not require the CPython runtime. The
garbage collector can only run if all threads are detached from the runtime (in
a stop-the-world state), so detaching from the runtime allows freeing unused
memory.
do this, and pyo3 will handle setting up the [`Python<'py>`] token when CPython
calls into your extension, but all other OS threads that interact with the
Python runtime must explicitly attach using `with_gil` and obtain a `'py`
liftime.

### Global synchronization events can cause hangs and deadlocks

The free-threaded build triggers global synchronization events in the following
situations:

* During garbage collection in order to get a globally consistent view of
reference counts and references between objects
* In Python 3.13, when the first background thread is started in
order to mark certain objects as immortal
* When either `sys.settrace` or `sys.setprofile` are called in order to
instrument running code objects and threads
* Before `os.fork()` is called.

This is a non-exhaustive list and there may be other situations in future Python
versions that can trigger global synchronization events.

This means that you should detach from the interpreter runtime using
[`Python::allow_threads`] in exactly the same situations as you should detach
from the runtime in the GIL-enabled build: when doing long-running tasks that do
not require the CPython runtime or when doing any task that needs to re-attach
to the runtime (see the [guide
section](guide/parallelism.md#sharing-python-objects-between-rust-threads) that
covers this). In the former case, you would observe a hang on threads that are
waiting on the long-running task to complete, and in the latter case you would
see a deadlock while a thread tries to attach after the runtime triggers a
global synchronization event, but the spawning thread prevents the
synchronization event from completing.

### Exceptions and panics for multithreaded access of mutable `pyclass` instances

Expand Down
54 changes: 53 additions & 1 deletion guide/src/parallelism.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
# Parallelism

CPython has the infamous [Global Interpreter Lock](https://docs.python.org/3/glossary.html#term-global-interpreter-lock), which prevents several threads from executing Python bytecode in parallel. This makes threading in Python a bad fit for [CPU-bound](https://en.wikipedia.org/wiki/CPU-bound) tasks and often forces developers to accept the overhead of multiprocessing.
CPython has the infamous [Global Interpreter Lock](https://docs.python.org/3/glossary.html#term-global-interpreter-lock) (GIL), which prevents several threads from executing Python bytecode in parallel. This makes threading in Python a bad fit for [CPU-bound](https://en.wikipedia.org/wiki/CPU-bound) tasks and often forces developers to accept the overhead of multiprocessing. There is an experimental "free-threaded" version of CPython 3.13 that does not have a GIL, see the PyO3 docs on [free-threaded Python](./free-threading.md) for more information about that.

In PyO3 parallelism can be easily achieved in Rust-only code. Let's take a look at our [word-count](https://github.com/PyO3/pyo3/blob/main/examples/word-count/src/lib.rs) example, where we have a `search` function that utilizes the [rayon](https://github.com/rayon-rs/rayon) crate to count words in parallel.
```rust,no_run
Expand Down Expand Up @@ -117,4 +117,56 @@ test_word_count_python_sequential 27.3985 (15.82) 45.452

You can see that the Python threaded version is not much slower than the Rust sequential version, which means compared to an execution on a single CPU core the speed has doubled.

## Sharing Python objects between Rust threads

In the example above we made a Python interface to a low-level rust function,
and then leveraged the python `threading` module to run the low-level function
in parallel. It is also possible to spawn threads in Rust that acquire the GIL
and operate on Python objects. However, care must be taken to avoid writing code
that deadlocks with the GIL in these cases.

In the example below, we share a `vec` of User ID objects defined using the
`pyclass` macro and spawn threads to process the collection of data into a `vec`
of booleans based on a predicate using a rayon parallel iterator:

```rust,no_run
use pyo3::prelude::*;
// These traits let us use int_par_iter and map
use rayon::iter::{IntoParallelIterator, ParallelIterator};
#[pyclass]
struct UserID {
id: i64,
}
let instances: Vec<Py<UserID>> = Python::with_gil(|py| {
(0..10).map(|x| Py::new(py, UserID { id: x }).unwrap()).collect()
});
let allowed_ids: Vec<bool> = Python::with_gil(|outer_py| {
outer_py.allow_threads(|| {
(0..instances.len()).into_par_iter().map(|index| {
Python::with_gil(|inner_py| {
instances[index].borrow(inner_py).id > 5
})
}).collect()
})
});
assert!(allowed_ids.into_iter().filter(|b| *b).count() == 4);
```

It's important to note that there is an `outer_py` GIL lifetime token as well as
an `inner_py` token. Sharing GIL lifetime tokens between threads is not allowed
and threads must individually acquire the GIL to access data wrapped by a python
object.

It's also important to see that this example uses [`Python::allow_threads`] to
wrap the code that spawns OS threads via `rayon`. If this example didn't use
`allow_threads`, a rayon worker thread would block on acquiring the GIL while a
thread that owns the GIL spins forever waiting for the result of the rayon
thread. Calling `allow_threads` allows the GIL to be released in the thread
collecting the results from the worker threads. You should always call
`allow_threads` in situations that spawn worker threads, but especially so in
cases where worker threads need to acquire the GIL to prevent deadlocks.

[`Python::allow_threads`]: {{#PYO3_DOCS_URL}}/pyo3/marker/struct.Python.html#method.allow_threads

0 comments on commit 8b47a5d

Please sign in to comment.