-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Compatibility with free-threaded build (PEP 703) #5112
Comments
I would love to get pybind11 working with the free-threaded build now that 3.13 beta 1 is out. I have some older patches that I can update for the latest version of pybind11. The basic strategy is to lock around accesses to template <typename F>
inline auto with_internals(const F &cb) -> decltype(cb(get_internals())) {
auto &internals = get_internals();
PYBIND11_LOCK_INTERNALS(internals); // no-op in default build
return cb(internals);
}
...
with_internals([&](internals &internals) {
// access or modify internals
}); It's also helpful to have special handling for |
I'm interested in this discussion also in the context of nanobind. The idea of threads fighting over the cache line holding the internals mutex is a bit concerning though. A lot of those accesses are read-only, e.g. to fetch a type object. Is One option could be to "freeze" certain data structures after a module is loaded, after which they can only be accessed in read-only mode. |
I'd be inclined to start with
I think "expensive" depends on the scale you are talking about. For example, in CPython a mutex acquisition is expensive compared to say a To be clear, I think the mutex acquisitions should only happen when compiling for the "free-threaded" build and not for the "default" build of CPython with the GIL.
Can you point me at the relevant code? It's been a while since I looked at this. I remember the instance map being a bottleneck without special handling, but not fetching type objects.
The more things that can be made effectively immutable the better. |
In Two follow-up questions:
|
It's not currently part of the public C API and the discussions about making it public have stalled. We're considering extracting
Not yet, but we plan to write one. It will definitely happen before the 3.13 final release in October; hopefully much sooner. Right now our priority is the libraries at the "bottom of the stack" like pybind11, Cython, and NumPy. Some of the coordination work is tracked in https://github.com/Quansight-Labs/free-threaded-compatibility. I expect that the work on making these libraries thread-safe without the GIL will help make the "how to" guide better. |
That seems potentially bad if it's a central piece of infrastructure needed for efficient locking. If every extension comes up with its own half-baked locking scheme, that could turn into a bigger issue in the future. For example, debugging locking issues is probably much easier if everyone is generally using the same kind of lock. There are still some fundamental questions that are unclear to me in the absence of a developer guide. Some are probably quite naïve.
Thank you in advance! |
Actually, here is one more: in Python extensions, it's quite common to create simple types that cannot form reference cycles without the PEP 703 mentions that this now affects how reference counting works for such types. What are the practical implications, and does are non-GCed types an anti-pattern in free-threaded builds? |
Generally yes, aside for bugs and a few exceptions. I don't think pybind11 should do extra things to protect callers to the C API; if there are things we need to change, we should do it in CPython.
The public functions of, e.g.,
It's still safe to use these functions in contexts where no other thread may be modifying the object. For example, extensions often use However, the generic
Python extension modules shouldn't directly lock builtin objects like
This isn't part of the public C API yet. I'd like to make it public, but I was hoping to get There's a tradeoff between In general, I think it's better to use simple mutexes in extensions unless there's a concrete reason not to.
I think you might be referring to this section of PEP 703. I don't think there are any practical implications for extensions. Non-GC enabled objects are not an anti-pattern; you should continue to use GC or non-GC types as you do currently. For CPython internals, the implication is that code objects are now tracked by the GC in the free-threaded build. |
Thank you for your responses Sam! Another point I wanted to discuss: I think that to safely expose free-threading in a library like pybind11, or nanobind, it would be desirable to add further annotations for arguments and classes. For example, a ".lock" argument for parameters.
So that the binding layer can guarantee that there isn't concurrent access to the internals of an object. Unsafe concurrent access would in general not just cause race conditions but actually segfault the process, which would be very bad. Similarly some classes may not permit concurrent access, and the binding layer could then synchronize with
But for all of this, it would be desirable to use the locking support of the underlying |
(So it probably makes sense to wait until the PyMutex discussion has stabilized and there is an API for us to use) |
I think I understand |
You can specify default values, nullability, etc -- e.g. The |
The first, simple step I'd assume is to make sure we can compile against free-threaded mode but still hold the GIL. I've fixed the four tests fail due to refcount issues when doing so against beta 1. If I actually disable the GIL, two more from
We don't do much threading in the test suite, so I'd not expect too many failures. This seems to be an issue with registering type conversions. |
Required prerequisites
What version (or hash if on master) of pybind11 are you using?
2.12.0
Problem description
PEP 703 proposes making the GIL optional. The steering council accepted it, and its integration is ongoing within CPython. Is there a plan to make pybind11 compatible with free-threaded builds?
Some changes would probably involve using new references instead of borrowed references. Apart from this, could the global state stored in
internals
andlocal_internals
pose an issue if accessed from different threads simultaneously without the GIL to protect this?Reproducible example code
No response
Is this a regression? Put the last known working version here if it is.
Not a regression
The text was updated successfully, but these errors were encountered: