Identifying thread-safety issues in existing code #20

bmerry · 2021-10-21T15:37:33Z

The document says "In my experience, it was not too difficult to ensure that the dozens of packages used by a given project work without the GIL; I’d expect this to be easier with community involvement." How are you verifying that? In my experience, thread safety issues are really hard to detect, often only showing up under heavy load and on some machines and not others, and race detection tools like those in valgrind can be fairly hit-or-miss (and normally require PYTHONMALLOC=malloc, which I'm guessing might be incompatible with the mimalloc change?).

While I don't have a clear picture of whether this problem only affects extensions or potentially all Python code, it seems significantly more difficult to verify that a package is nogil-safe than it was to convert from Python 2 to Python 3 (where 2to3/futurize/modernize would do most of the work, and where failures were generally predictable and obvious given good code coverage), and after 10 years there was still a lot of Python 2 code around. I don't want to knock what you've done (making single-threaded code faster is seriously impressive), but I don't think you should under-estimate the cost of breaking existing code.

Would it make sense for extensions to have to assert (through some new API) that they're no-gil ready, with the default being to assume that they aren't? There are a number of options for what to do when importing some extension when running in nogil mode:

Fall back to using the GIL (assuming it's possible at runtime).
Lock/unlock the GIL around every call to a function from that module. That probably doesn't provide full compatibility, because another Python thread that doesn't have the GIL might race to modify a Python object that the function believes it has exclusive access to; but it would provide protection against two threads both using the same module and racing for its internal data structures.
Fail the import, so that the user knows they can't safely go faster (maybe with an override for the adventurous).
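
As a rough illustration of how such an opt-in could behave, here is a pure-Python sketch. Everything in it is hypothetical: the `__nogil_ready__` attribute, the `Policy` enum, the `Runtime` class, and `check_extension` are names invented for illustration, not an existing or proposed CPython API. It models only the first and third options (fall back to the GIL, or fail the import with an override); per-call locking is left out.

```python
import enum
import types


class Policy(enum.Enum):
    FALLBACK_TO_GIL = "fallback"  # option 1: re-enable the GIL
    FAIL_IMPORT = "fail"          # option 3: refuse the import


class Runtime:
    """Toy stand-in for interpreter state (hypothetical)."""

    def __init__(self, gil_enabled=False):
        self.gil_enabled = gil_enabled


def check_extension(module, runtime, policy, allow_unsafe=False):
    """Apply the import policy to a freshly loaded module.

    A module opts in by setting the hypothetical attribute
    ``__nogil_ready__ = True``; the default is to assume it is not ready.
    """
    if runtime.gil_enabled or getattr(module, "__nogil_ready__", False):
        return module
    if allow_unsafe:
        # Override for the adventurous: import anyway, GIL stays off.
        return module
    if policy is Policy.FALLBACK_TO_GIL:
        runtime.gil_enabled = True
        return module
    raise ImportError(
        f"{module.__name__} has not declared itself no-GIL ready"
    )
```

With the fallback policy, importing a single untagged module turns the GIL back on for the whole process, which is why the default choice matters so much for packages that never get updated.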

colesbury · 2021-10-22T20:45:02Z

Maintainer

That's a good question and the answer will change as the project progresses. For now, the focus is on getting things to work, and less about trying to root out all the long tail bugs. Reading code to look for shared state is surprisingly effective. ThreadSanitizer is sometimes useful. So is the debug allocator. Running lots of copies to find deadlocks and crashes helps. rr is really effective at going from deadlock/crash to root cause.

As the project progresses, rooting out the hard-to-detect issues will become more important. I expect that will involve more automated testing (including under TSan). For example, we could take a project's test suite and run individual test cases concurrently from multiple threads. That won't work for all projects, but we could filter it to test cases that succeed when run that way with the GIL enabled.
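
A minimal sketch of what running a test case concurrently from multiple threads might look like, using only the standard library. The helper name `run_concurrently` and its parameters are made up for illustration; a real harness would hook into the project's test runner and would more likely be combined with TSan or a debug build.

```python
import threading


def run_concurrently(test_func, num_threads=8, iterations=50):
    """Call test_func repeatedly from several threads at once.

    Returns the list of exceptions raised in any thread (empty on success).
    """
    barrier = threading.Barrier(num_threads)
    errors = []
    errors_lock = threading.Lock()

    def worker():
        barrier.wait()  # release all threads together to maximise overlap
        try:
            for _ in range(iterations):
                test_func()
        except Exception as exc:
            # Record the first failure in this thread and stop.
            with errors_lock:
                errors.append(exc)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors
```

The filtering step described above would then amount to keeping only the tests for which `run_concurrently(test)` returns an empty list with the GIL enabled, and rerunning that subset with the GIL disabled.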

> Would it make sense for extensions to have to assert (through some new API) that they're no-gil ready, with the default being to assume that they aren't?

I think it makes sense for Python to default to GIL-on (at least for a few releases). For example, you have to run with PYTHONGIL=0 or -X nogil to run Python without the GIL.

I'm ambivalent about tagging extensions. It might make sense. It is possible to fall back to enabling the GIL when importing a non-tagged extension. I'm worried that the tagging system may introduce unnecessary complexity. Alternatively, extension authors can programmatically check if the GIL is enabled and issue a warning/error/force-enable the GIL.
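
For the programmatic check, something along these lines could work from a package's Python-level `__init__`. This is a sketch under an assumption: it probes `sys._is_gil_enabled()`, which later CPython releases (3.13+) expose, and the prototype discussed in this thread may offer a different mechanism. When the function is absent, the code assumes the GIL is on. The helper names are hypothetical.

```python
import sys
import warnings


def gil_is_enabled():
    """Best-effort check for whether the GIL is active.

    Assumption: relies on sys._is_gil_enabled() (CPython 3.13+); on
    interpreters without it, the GIL is assumed to be enabled.
    """
    probe = getattr(sys, "_is_gil_enabled", None)
    return True if probe is None else bool(probe())


def warn_if_gil_disabled(package_name):
    """Warn if running without the GIL; return True if a warning was issued."""
    if gil_is_enabled():
        return False
    warnings.warn(
        f"{package_name} has not been audited for use without the GIL",
        RuntimeWarning,
        stacklevel=2,
    )
    return True
```

Raising an exception instead of warning would give the stricter "error" behaviour; force-enabling the GIL is not something this sketch attempts.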

I don't think it makes sense to lock/unlock around every function call. As you say, that's not sufficient to provide compatibility.

Since you've developed a number of Python extensions, a few questions: Would running Python without the GIL be useful with any of your packages? Would you be interested in reviewing patches to make one (or more) of the packages thread-safe without the GIL?

bmerry Oct 23, 2021
Author

> I think it makes sense for Python to default to GIL-on (at least for a few releases).

Oh indeed. It's what happens after "a few releases" that worries me - history suggests that a lot of packages will just never be updated.

> Since you've developed a number of Python extensions, a few questions: Would running Python without the GIL be useful with any of your packages? Would you be interested in reviewing patches to make one (or more) of the packages thread-safe without the GIL?

The package I've developed where thread safety issues get the most complex is spead2. No-GIL could potentially help, but not so much in the ways one might expect (allowing computations to run in parallel), and probably only with some design changes. It has soft realtime requirements (to avoid dropping incoming UDP packets) and as a result I've had to be quite careful not to take the GIL to call into Python from the critical path, as it could block for a relatively long time waiting for another thread to give up the GIL. There are also other places where I have to defer decrementing a reference until it is safe to take the GIL without either causing performance issues or just deadlocking. Thread safety for spead2 in a GIL-less world is a bit trickier than just finding shared state (I don't think there is any), because the objects exported by the package are designed to be safe for use from multiple threads, and I have probably used the GIL to protect some per-object state, but also have C++ mutexes, and I don't recall which state is implicitly protected by the GIL.
