Support sub-interpreters #576
Comments
Specifically, it would be amazing if it were possible to create multiple separate Python interpreters that could run on different threads in parallel but share the same memory space (with the type system used to ensure this is only observable from Rust). |
The complexity with sub-interpreters is that no Python objects should ever be shared between them. The PyO3 API isn't set up to prevent this at the moment, so it's going to take a fair whack of experimentation before we get anything stable in place. |
I'd like to have multiple threads, each with its own interpreter. No PyObject would be sent between threads. |
@davidhewitt with https://peps.python.org/pep-0684/, using sub-interpreters or multiple interpreters to unlock true multi-core parallelism becomes possible. Is adding support for this on PyO3's timeline or under consideration? |
Found this interesting article on the current usage of sub-interpreters in Python (no Rust there). |
We are very aware of the per-interpreter parallelism landing in Python 3.12. There are significant changes which need to happen to PyO3's current implementation to support this correctly. We have been discussing some of these challenges in multiple discussions across this repo, such as #2885 which looks at the possible nogil option. There are several main issues which are prominent in my mind, although others may exist:
Solving these problems is likely to create significant churn of PyO3's API, so we can only make progress once someone has proposed a relatively complete solution which we can adopt with a suitable migration path for users. |
Hello guys, I was redirected here by @messense. I'm building a library that needs to use multiple Python interpreters at once. At first I hit the same issue, that Python objects can't be shared between threads, so I tried a different approach: I created a GIL pool inside each thread, obtained a Python handle from it, and used that to process some data. This worked for a while, until I noticed that some threads were randomly crashing. After some forensic analysis I found the cause: when we assume the GIL is acquired, we are effectively taking the GIL from whichever thread is actually using it, and then the "reference" is gone. This is what I did:
let getting_py = unsafe { Python::assume_gil_acquired() };
let gil_pool = unsafe { getting_py.clone().new_pool() };
py = gil_pool.python();
When that happened, I switched my library to call callbacks using only one Python interpreter at a time for now. That isn't optimal, but I have to keep the project going. I'm still looking for a solution, because I really need this to speed things up. Could someone explain, or point me to a trusted article on, how GIL acquisition and the Python interpreter work inside Rust, and whether the GIL needs to stay acquired as a reference? I may have a temporary solution in mind, if what I'm thinking really makes sense. |
OK, I have cloned the repo and studied it to start understanding the logic. I'm not 100% sure how it all works yet because, to be honest, it is a lot of code, haha. I did find something interesting: gil.rs has this function, which seems to acquire a GIL pool and then release it:
#[cfg(not(PyPy))]
pub unsafe fn with_embedded_python_interpreter<F, R>(f: F) -> R
where
    F: for<'p> FnOnce(Python<'p>) -> R,
{
    assert_eq!(
        ffi::Py_IsInitialized(),
        0,
        "called `with_embedded_python_interpreter` but a Python interpreter is already running."
    );
    ffi::Py_InitializeEx(0);
    // Safety: the GIL is already held because of the Py_InitializeEx call.
    let pool = GILPool::new();
    // Import the threading module - this ensures that it will associate this thread as the "main"
    // thread, which is important to avoid an `AssertionError` at finalization.
    pool.python().import("threading").unwrap();
    // Execute the closure.
    let result = f(pool.python());
    // Drop the pool before finalizing.
    drop(pool);
    // Finalize the Python interpreter.
    ffi::Py_Finalize();
    result
}
The idea here is a pool from which we can acquire Python and then release it, like a connection pool for sqlite3, right? So the issue seems to be that acquiring the GIL pool creates a connection to the interpreter, and I'm assuming that at the moment we can only have one of those... I've been considering a new approach to tackle our challenges with multithreading and Python's Global Interpreter Lock (GIL) until we can have multiple sub-interpreters. My idea is to create a centralized execution pool dedicated to handling Python-related tasks. This would eliminate the need for Arc and Mutex to share PyObjects, avoiding the issues we've faced with sending certain objects. We could develop a procedural macro to wrap the Python-invoking code. This macro would package the code, forward it to the centralized pool using a Box, process it, and return the result. Centralizing the pool means we can manage the GIL more efficiently, reducing errors from multiple threads trying to access it simultaneously. While a single interpreter is a potential bottleneck, it offers the advantage of invoking Python from different places without GIL acquisition challenges. The primary shift here is that we send the code for execution instead of transferring PyObjects, ensuring the GIL is safely managed. This approach would essentially streamline our execution into a rapid, queue-based system. I'd be eager to hear your feedback on this idea and whether it could work! |
The "pool" you refer to above is not a pool of GIL acquisitions, but rather a pool of objects. (There can only ever be one GIL acquisition at a time per interpreter. As per the above in this thread, PyO3 is some way off supporting sub-interpreters.) If I read your idea correctly it seems like you're proposing having one thread which is running Python workloads and you send jobs to it from other threads. That seems like a completely reasonable system architecture. |
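The "one Python thread, jobs sent from other threads" architecture described above can be sketched with the standard library alone. This is only an illustration under assumptions: `WorkerHandle`, `spawn_worker`, and `demo` are hypothetical names, nothing here is PyO3 API, and the closures are plain Rust stand-ins for code that would acquire the GIL and call into Python on the worker thread.

```rust
use std::sync::mpsc;
use std::thread;

/// A job is a boxed closure that runs on the single worker thread.
type Job = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical handle that other threads clone to submit work.
#[derive(Clone)]
struct WorkerHandle {
    sender: mpsc::Sender<Job>,
}

impl WorkerHandle {
    /// Send a closure to the worker and block until its result comes back.
    fn run<T, F>(&self, f: F) -> T
    where
        F: FnOnce() -> T + Send + 'static,
        T: Send + 'static,
    {
        let (tx, rx) = mpsc::channel();
        let job: Job = Box::new(move || {
            // Ignore the error if the caller stopped waiting for the result.
            let _ = tx.send(f());
        });
        self.sender.send(job).expect("worker thread has exited");
        rx.recv().expect("worker dropped the job without a result")
    }
}

/// Start the worker. In a real program, the body of the loop is the only
/// place where the GIL would be acquired (assumption: one interpreter).
fn spawn_worker() -> (WorkerHandle, thread::JoinHandle<()>) {
    let (sender, receiver) = mpsc::channel::<Job>();
    let join = thread::spawn(move || {
        // The loop ends once every sender has been dropped.
        for job in receiver {
            job();
        }
    });
    (WorkerHandle { sender }, join)
}

/// Four threads each submit one job; results are collected in spawn order.
fn demo() -> Vec<usize> {
    let (handle, worker) = spawn_worker();
    let threads: Vec<_> = (0..4usize)
        .map(|i| {
            let h = handle.clone();
            thread::spawn(move || h.run(move || i * 10))
        })
        .collect();
    let results: Vec<usize> = threads.into_iter().map(|t| t.join().unwrap()).collect();
    drop(handle); // last sender gone, so the worker loop terminates
    worker.join().unwrap();
    results
}
```

Only values that are `Send` cross the channel, so nothing tied to the interpreter would need to leave the worker thread; the trade-off is that all Python work is serialized through one queue.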
Yeah, I'm kind of lost in the PyO3 files, trying to understand how it works to see if I can help with this, but the idea is exactly that. While I'm learning how PyO3 works, I'm also building a functional model of the idea I proposed, and I'll tell you guys when I have any progress :) I think this will make multithreading easier to work with while we can't have multiple Python interpreters. Of course it won't be the fastest thing in the world, since we can only use one interpreter and can't spawn sub-interpreters to distribute the workload, but it will work, especially for cases where we only need Python for small things inside a parallelized data processing mechanism. I think it will help a lot in those cases. |
Hey everyone! 🚀 I've crafted a hands-on demonstration of a system architecture that seamlessly integrates Python functionalities within parallelized Rust code. This approach effectively sidesteps the GIL constraints and the challenges of passing Python objects between threads. 🔗 Dive into the details and check out the comprehensive documentation here: RustPyNet on GitHub. While it's not a full-fledged multi-interpreter system, it does simplify the execution of Python functions in a multi-threaded environment. For me, it's been a game-changer for projects that leverage parallelized Rust processes and use PyO3 just for callbacks. I genuinely believe this could benefit not just my projects but many others in our community who are working on similar ones. I'm reaching out to see if there's potential to integrate this into the PyO3 project. I'm genuinely curious about your thoughts, especially from our development team members. If there's interest, I'm more than willing to assist in its implementation. Let's discuss and explore its wider potential! 🤔👨💻👩💻 |
To get this issue back on topic, I'd be willing to contribute a decent amount in order to allow PyO3 to support sub-interpreters. We've noticed that some of our users can't use Ceph's Dashboard, which led me down quite a rabbit hole. To keep things short, I eventually stumbled across bazaah/aur-ceph#20, which lists all of the facts. In short, anything that transitively depends on PyO3 will break once sub-interpreters enter the stage, unfortunately. So... how may I help? What would be the best way to start tackling this? |
I’ve tried playing with this a bit. My first idea was to make the |
@Aequitosh thanks for the offer, it would be great to begin making progress on this. The above comment #576 (comment) is still a good summary of the state of play. Are you interested in design work? Implementation? Reviews? How much effort are you prepared to put in? This is going to be a big chunk of work. I think that a realistic mid-term solution is that:
In the long term we may be able to remove the need for extension module authors to audit their own code, once we've built up confidence of operation under subinterpreters.
I disagree slightly with the sentiment of "will break". Many extension modules implemented in C and C++ most likely also do not work correctly with subinterpreters. I read a comment from CPython devs somewhere which suggested they are aware that even if CPython 3.12 or 3.13 ships with complete subinterpreter support the ecosystem is going to need many years to transition. Regardless I support that we should do what we can to not block users who are pushing to run subinterpreters in their systems. All help with implementation is gladly welcome. I would also be open to considering an environment variable |
@GoldsteinE that's an interesting idea. Care to explain a little more about the original thesis behind making the lifetime invariant? (We might also want to split this topic into several sub-issues / discussions with back references to here...) |
@davidhewitt The idea is taken from the GhostCell paper. Basically, the signature of
interpreter1.with_gil(|py1| {
    interpreter2.with_gil(|py2| {
        let obj1 = py1.get_some_python_ref(); // has lifetime 'py1
        let obj2 = py2.get_some_python_ref(); // has lifetime 'py2
        obj1.some_method(obj2); // error: lifetimes do not match
    })
})
wouldn’t compile, preventing us from mixing objects from different interpreters ( My dyngo crate is a practical example of this technique. |
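A minimal standard-library sketch of the invariant-lifetime "branding" technique referred to above may help make it concrete. Everything here is hypothetical: `Interpreter`, `Token`, `Obj`, `make_obj`, and `add` are illustrative names and not PyO3's API. The assumption is that each `with_gil` call introduces a fresh lifetime `'id`, and that objects carry that lifetime invariantly so two different brands can never be unified.

```rust
use std::marker::PhantomData;

/// `fn(&'id ()) -> &'id ()` is invariant in `'id`, so a brand can be
/// neither shortened nor lengthened to match another brand.
type Brand<'id> = PhantomData<fn(&'id ()) -> &'id ()>;

/// Hypothetical stand-in for a `Python<'py>`-style token.
#[derive(Clone, Copy)]
struct Token<'id> {
    _brand: Brand<'id>,
}

/// Hypothetical stand-in for an object owned by one interpreter.
struct Obj<'id> {
    value: i64,
    _brand: Brand<'id>,
}

struct Interpreter;

impl Interpreter {
    /// The closure must accept *any* `'id`, so each call gets a brand
    /// that cannot be named or reused outside the closure.
    fn with_gil<F, R>(&self, f: F) -> R
    where
        F: for<'id> FnOnce(Token<'id>) -> R,
    {
        f(Token { _brand: PhantomData })
    }
}

impl<'id> Token<'id> {
    fn make_obj(self, value: i64) -> Obj<'id> {
        Obj { value, _brand: PhantomData }
    }
}

impl<'id> Obj<'id> {
    /// Both operands must carry the same brand.
    fn add(&self, other: &Obj<'id>) -> i64 {
        self.value + other.value
    }
}

/// Objects created under the same token can be combined.
fn same_interpreter_sum() -> i64 {
    let interp = Interpreter;
    interp.with_gil(|py| {
        let a = py.make_obj(2);
        let b = py.make_obj(40);
        a.add(&b)
    })
}

// Mixing brands is rejected at compile time. Uncommenting this function
// is expected to produce a lifetime mismatch error:
//
// fn mixed() -> i64 {
//     let (i1, i2) = (Interpreter, Interpreter);
//     i1.with_gil(|py1| {
//         i2.with_gil(|py2| py1.make_obj(1).add(&py2.make_obj(2)))
//     })
// }
```

Note that the brand distinguishes `with_gil` calls rather than interpreters, so two nested calls on the same interpreter would also be treated as incompatible in this sketch.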
Interesting. I can see how that would guarantee provenance statically, but I think it might cause issues with APIs like Having the Python lifetime be invariant may be a good idea to consider as part of #3382. |
Yes, |
Heh, quite possibly! If we have to add a new smart pointer type for subinterpreters I wonder whether we could gate it behind a feature, or whether all PyO3 modules would in practice have to use it to be subinterpreter safe. I just saw this discussion for Cython - looks like they also want it opt-in only: cython/cython#2343 |
My understanding is that all smart pointers would be subinterpreter safe, i.e. contain an interpreter ID. The only difference would be whether it is checked at runtime for each access ( |
Ugh I see, I was hoping we could limit the checking just to |
My concern with checking this at runtime is that it sounds error prone. I'm rather spoiled by Rust's threadsafety being checked at compile time. |
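For comparison, the runtime alternative discussed above, where a pointer records the ID of the interpreter it belongs to and checks it on access, could look roughly like the following. This is a sketch under assumptions only: `InterpreterId`, `Tagged`, and `WrongInterpreter` are hypothetical names, and a real implementation would obtain the current interpreter's ID from CPython rather than take it as an argument.

```rust
/// Hypothetical identifier for an interpreter.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct InterpreterId(u64);

/// Error returned when an object is used from the wrong interpreter.
#[derive(Debug, PartialEq, Eq)]
struct WrongInterpreter {
    owner: InterpreterId,
    current: InterpreterId,
}

/// Hypothetical smart pointer that remembers which interpreter owns it.
struct Tagged<T> {
    owner: InterpreterId,
    value: T,
}

impl<T> Tagged<T> {
    fn new(owner: InterpreterId, value: T) -> Self {
        Tagged { owner, value }
    }

    /// Every access pays for a comparison and can fail at runtime.
    fn get(&self, current: InterpreterId) -> Result<&T, WrongInterpreter> {
        if self.owner == current {
            Ok(&self.value)
        } else {
            Err(WrongInterpreter { owner: self.owner, current })
        }
    }
}
```

The mistake only surfaces when the mismatched access actually executes, which is the error-proneness raised in the comment above.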
All of the above, though I'm not sure how fit I'd be for design work, as I'm not too familiar with PyO3's internals yet. I can most definitely tackle implementation work and provide second opinions in reviews. Regarding effort (I'll rephrase this as time here): I can probably spare anywhere between 6-18 hours a week. This will vary from time to time unfortunately, as I'll have a somewhat tight schedule soon again, but I nevertheless want to make time for this (as I'm both an absolute Python and Rust nerd 😉).
Very fair point! I feel like I misphrased my point a little bit here; I think it's rather just very unexpected that an Also, to speculate here a little bit: My gut tells me that it would be beneficial for both the Python ecosystem overall and the PyO3 project itself if subinterpreters are supported by PyO3 sooner than later. If, theoretically, PyO3 would provide
I also agree with @adamreichold here; this is more something that should be controlled by extension authors. Regarding the mid- and long-term solutions you mentioned: I can't really provide my perspective on this (yet), but I'll stick with this for now. I'll probably create a fork sooner or later and mess around with PyO3's internals myself. The I'll see what I can cook up and will probably open a tracking issue somewhat soon, if that's alright. |
Upon conducting a detailed investigation into PyO3, I've noticed that the ffi module, which serves as the primary bridge for communication with Python, seems to lack implementations for Py_NewInterpreter and Py_EndInterpreter. To my knowledge, these functions have been available in the CPython API since Python 3.10. I suggest that addressing this gap should be our first step before exploring the potential of subinterpreters. Furthermore, I recommend that we start by implementing the Send and Sync traits, as previously discussed by the crate development and maintenance team. This approach could potentially lead to the establishment of a global GIL pool, allowing multiple threads to effortlessly access a Python instance. This notion is consistent with the example I shared in our prior conversations here on this topic. Successfully implementing this idea could simplify the integration of subinterpreters in the future, especially if the primary controller for subinterpreters becomes readily available. I recognize that the task at hand is complex. Several modules within the crate might need alterations, and the intricacies of this endeavor are considerable. Nonetheless, I'm enthusiastic about contributing. I'm currently delving deeper into the subinterpreters API to gain a better understanding, and I'm optimistic that I can assist in some capacity. I welcome suggestions on areas to focus on to further this initiative, and I'm hopeful that we can successfully integrate this feature into PyO3. Achieving this would represent a significant milestone, as it would provide a mechanism to utilize Python across multiple Rust threads in a fully memory-safe manner. |
I concur that subdividing this topic here might be beneficial for a more organized and in-depth discussion. There seems to be a plethora of ideas branching out from this, each with its own set of complexities and potential. I believe that by breaking down the topic, we can foster a more structured dialogue and ensure that all aspects are thoroughly explored. I'm excited about the potential this holds and look forward to the ensuing discussions! |
Thanks @Aequitosh, will await further thoughts 👍
Yes, it's extremely awkward for users to handle if they do want to use subinterpreters. It's unfortunately just necessary for safety.
@letalboy would you be willing to open a PR to add these please? After that, a suggested first step is we need to understand what the replacement for |
@letalboy sorry to not reply sooner regarding RustPyNet. I think that's a great example of how to use multiple Rust threads with a single Python thread to put workloads in the right place. I'm sure it would make a useful crate for folks facing similar problems if you wanted to publish it. I'm not personally convinced it's necessary to add such a construct to the main PyO3 crate quite yet. There is also the future possibility of nogil Python which would replace the need for the worker thread model. |
@davidhewitt You mean do something like this:
pub fn new_interpreter() -> Result<*mut PyThreadState, SomeErrorType> {
    let state = unsafe { Py_NewInterpreter() };
    if state.is_null() {
        // Handle error, perhaps fetch the Python exception, etc.
        return Err(SomeErrorType);
    }
    Ok(state)
}
around the FFI pylifecycle.rs, and then implement a safe lifetime mechanism around it? I think I can do that; I only need to know where the best place to add it is, and whether I'm looking in the right place. |
No problem about the delay in response. I know there is a lot going on, and that maintaining a large project with this many mechanisms is hard. I maintain some fairly complex private ones for a few companies, so I know how much there is to handle, haha. The RustPyNet crate I uploaded to my profile is just a concept of something I built for myself. I mentioned it only as one possible way to avoid sending Python objects between threads: instead, the entire function is sent to a single place that calls the Python interpreter centrally, which sidesteps the need for multiple interpreters. Since we are now working on sub-interpreter support, it doesn't necessarily need to work that way, but I think some ideas from this concept could make sub-interpreter management easier, since, as you are saying above, it will be difficult for average users. Thanks for the suggestion; I will see if I can improve it and then publish it, though I hope that in the future we won't need it and will have a fully functional sub-interpreter mechanism, which would be the ideal scenario. Also, if you want to base some sort of centralized controller for the sub-interpreters on something from the crate, feel free to use it! :) |
I meant just the FFI definitions in |
Yes, this makes sense! So in which branch can I make this change to the FFI? Also, I've been considering the best approach for integrating subinterpreter support, since we are starting to implement the features it needs. Given the scale of the changes we're anticipating, would it be a good idea to create a dedicated branch derived from main? I'm thinking of naming it subinterpreter-support or something similar. This would allow us to merge PRs related to this feature in an isolated environment, streamlining the testing and review processes. I'd appreciate any feedback or suggestions on this approach. |
If you'd like to, we can do this over at my fork. I haven't really gotten properly started on it, but over there we could manage things on our own instead of opening branches here in the upstream repository. Just let me know and I'll add you. |
Just create a fork and open a PR. For the wider changes, I think also best to experiment in forks and open PRs with reviewable pieces when we are happy with various ideas. |
I opened a tracking issue regarding this: Though, to keep things tidy, I opened a discussion over at my fork for everybody that wishes to get involved and contribute: I'll handle pretty much most things over at my fork, e.g. post my thoughts, initial ideas, plans, etc. over there. I will open PRs when necessary - since this is probably going to be quite the endeavour, I expect that it will be split up in lots of smaller PRs in order to make reviewing (and contributing) easier. |
Yeah, seems to be a good idea, if you want you can add me! |
Some cryptography is implemented in Rust and used via PyO3 which does not support multiple subinterpreters. See PyO3/pyo3#576 Change-Id: Iff43e99134f41b65b220765a161fdd1b94495272
* Update openstack-helm from branch 'master' to e81872d94820739398703ddf37bbe537a42a8efd - Use global wsgi subinterpreter for horizon mod_wsgi Some cryptography is implemented in Rust and used via PyO3 which does not support multiple subinterpreters. See PyO3/pyo3#576 Change-Id: Iff43e99134f41b65b220765a161fdd1b94495272
Does pyo3 allow the use case of starting multiple separate interpreters? This would be similar to Python's multiprocessing.