@rwgk rwgk commented Dec 21, 2025

Continuation of work done initially under PRs #5934 and #5933

The write-up below is Cursor-generated.

In an earlier draft of the write-up below, Cursor asked "Could this be a bug in CPython?" and offered to generate a minimal C-only reproducer. We can come back to that if we want to.

__

Investigation: Free-Threaded Python 3.14 Hang in "Move Subinterpreter" Test

To: @b-pass
From: Investigation with Cursor AI assistance
Date: December 20, 2025
Re: PR #5933 - Root cause of Py_EndInterpreter() hang on free-threaded Python 3.14.2


Executive Summary

We've isolated the exact cause of the "Move Subinterpreter" test hang on free-threaded Python 3.14.2. The issue is not in gil_safe_call_once_and_store, the internals capsule destructors, or any cleanup code.

The root cause is a single line in py::subinterpreter::create():

PyThreadState_Swap(prev_tstate);  // subinterpreter.h:124

When this is called after PyThreadState_DeleteCurrent() during subinterpreter creation, it leaves the system in a state where later calling Py_EndInterpreter() from a different thread causes a deadlock.


Background

The Failing Test

The "Move Subinterpreter" test (test_subinterpreter.cpp:94-119) does:

  1. Creates a subinterpreter on the main thread
  2. Activates it, imports modules, deactivates
  3. Spawns a worker thread that:
    • Activates the same subinterpreter
    • Imports a module
    • Destroys the subinterpreter (sub.reset())
  4. Joins the worker thread
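
The shape of this test (create on one thread, destroy on another) can be sketched with the C++ standard library alone. This is only an illustration of the ownership hand-off, not the pybind11 test itself: `ToyResource` and `destroyed_on_other_thread` are hypothetical names standing in for the subinterpreter and the test body.

```cpp
#include <cassert>
#include <memory>
#include <thread>

// Hypothetical stand-in for a subinterpreter: remembers the creating thread
// and reports the destroying thread through an out-pointer.
struct ToyResource {
    std::thread::id created_on = std::this_thread::get_id();
    std::thread::id *destroyed_on;

    explicit ToyResource(std::thread::id *out) : destroyed_on(out) {}
    ~ToyResource() { *destroyed_on = std::this_thread::get_id(); }
};

// Creates the resource on the calling thread and destroys it on a worker
// thread via unique_ptr::reset(), like sub.reset() in the test outline.
bool destroyed_on_other_thread() {
    std::thread::id destroyed_on;
    std::unique_ptr<ToyResource> res(new ToyResource(&destroyed_on));
    assert(res != nullptr);
    const std::thread::id created_on = res->created_on;

    std::thread worker([&] { res.reset(); });  // destruction happens here
    worker.join();                             // join publishes destroyed_on

    return !res && destroyed_on != created_on;
}
```

In the real test the destructor ends up in Py_EndInterpreter(), which is where the hang described below occurs.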

This test passes on:

  • Default (non-free-threaded) Python 3.14.2 ✅
  • Free-threaded Python 3.14.0t ✅ (sporadically fails on macOS)

This test hangs on:

  • Free-threaded Python 3.14.1t ❌
  • Free-threaded Python 3.14.2t ❌

Prior Findings

A pure C reproducer (move_subinterpreter_redux.c) that mimics the same pattern using only CPython C API passes on both 3.14.0t and 3.14.1t. This indicated the issue was in pybind11's internals, not CPython itself.


Investigation Methodology

We systematically created minimal test cases, each removing one aspect of pybind11's subinterpreter handling, until we found what causes the hang.

Test Matrix

| Test Case | Key Difference | Result |
| --- | --- | --- |
| move_subinterpreter_redux.c | Pure C, no pybind11 | ✅ Pass |
| debug_pure_c_with_pb11_main.cpp | pybind11 main interpreter, pure C subinterpreter | ✅ Pass |
| debug_no_swap_back.cpp | Like above, but don't swap back after creation | ✅ Pass |
| debug_with_swap_back.cpp | Add PyThreadState_Swap(prev_tstate) after creation | ❌ Hang |
| debug_no_get_internals.cpp | Skip get_internals() call | ❌ Hang |
| debug_no_num_interp.cpp | Skip get_num_interpreters_seen()++ | ❌ Hang |
| Any test using py::subinterpreter::create() | Uses full pybind11 code | ❌ Hang |

The critical finding: The only difference between passing and failing tests is PyThreadState_Swap(prev_tstate) after PyThreadState_DeleteCurrent().


Root Cause Analysis

The Problematic Code Path

In subinterpreter.h, the create() function does:

static subinterpreter create(PyInterpreterConfig const &cfg) {
    error_scope err_scope;
    subinterpreter result;
    {
        // Activate main interpreter to hold its GIL
        subinterpreter_scoped_activate main_guard(main());

        auto *prev_tstate = PyThreadState_Get();  // Save main's tstate

        // Create subinterpreter - now creation_tstate_ is current
        Py_NewInterpreterFromConfig(&result.creation_tstate_, &cfg);

        result.istate_ = result.creation_tstate_->interp;
        detail::get_num_interpreters_seen() += 1;
        detail::get_internals();  // Initialize internals for subinterpreter

        // Clean up creation tstate (3.14+ style)
        PyThreadState_Clear(result.creation_tstate_);
        PyThreadState_DeleteCurrent();  // Thread state is now NULL

        // Switch back to main interpreter
        PyThreadState_Swap(prev_tstate);  // <-- THIS CAUSES THE HANG
    }
    return result;
}

Why This Causes a Hang

On free-threaded Python 3.14.2, the sequence:

  1. PyThreadState_DeleteCurrent() - deletes the subinterpreter's creation tstate
  2. PyThreadState_Swap(prev_tstate) - swaps to main interpreter's tstate

...appears to leave some internal state inconsistent. Specifically, when later:

  1. A different thread creates a new tstate for the subinterpreter
  2. That thread calls Py_EndInterpreter()

...the Py_EndInterpreter() call deadlocks.

The Pure C Pattern That Works

The working C reproducer does this instead:

static void create_subinterpreter(void) {
    PyThreadState *creation_tstate = NULL;
    Py_NewInterpreterFromConfig(&creation_tstate, &cfg);
    sub_interp = creation_tstate->interp;

    // Clean up creation tstate
    PyThreadState_Clear(creation_tstate);
    PyThreadState_DeleteCurrent();
    // NOTE: Does NOT call PyThreadState_Swap() here
    // Thread state is left as NULL
}

The key difference: no PyThreadState_Swap() after PyThreadState_DeleteCurrent().


Minimal Reproducer

Here's the minimal code that demonstrates the issue:

// HANGS on free-threaded Python 3.14.2
#include <pybind11/embed.h>
#include <thread>

namespace py = pybind11;

static PyInterpreterState *sub_interp = nullptr;

int main() {
    py::scoped_interpreter guard{};
    
    // Create subinterpreter
    {
        PyThreadState *prev_tstate = PyThreadState_Get();  // Save main tstate
        
        PyInterpreterConfig cfg = {0};
        cfg.allow_threads = 1;
        cfg.check_multi_interp_extensions = 1;
        cfg.gil = PyInterpreterConfig_OWN_GIL;

        PyThreadState *creation_tstate = nullptr;
        Py_NewInterpreterFromConfig(&creation_tstate, &cfg);
        sub_interp = creation_tstate->interp;
        
        PyThreadState_Clear(creation_tstate);
        PyThreadState_DeleteCurrent();
        
        PyThreadState_Swap(prev_tstate);  // <-- REMOVE THIS LINE TO FIX
    }

    // Use subinterpreter on main thread
    {
        PyThreadState* tstate = PyThreadState_New(sub_interp);
        PyThreadState* old = PyThreadState_Swap(tstate);
        PyRun_SimpleString("import datetime");
        PyThreadState_Clear(tstate);
        PyThreadState_DeleteCurrent();
        PyThreadState_Swap(old);
    }

    // Destroy from worker thread - THIS HANGS
    std::thread([&]() {
        PyThreadState* tstate = PyThreadState_New(sub_interp);
        PyThreadState_Swap(tstate);
        PyThreadState_Clear(tstate);
        PyThreadState_DeleteCurrent();
        
        PyThreadState *destroy_tstate = PyThreadState_New(sub_interp);
        PyThreadState_Swap(destroy_tstate);
        Py_EndInterpreter(destroy_tstate);  // <-- HANGS HERE
    }).join();
    
    return 0;
}

To fix: Remove the PyThreadState_Swap(prev_tstate) line after creation.


Attempted Fixes

We tried several quick fixes, none of which fully worked:

Attempt 1: Skip PyThreadState_Swap(prev_tstate) on free-threaded Python

#if defined(Py_GIL_DISABLED) && PY_VERSION_HEX >= 0x030D0000
    (void) prev_tstate;  // Skip the swap
#else
    PyThreadState_Swap(prev_tstate);
#endif

Result: No longer hangs, but crashes with:

Fatal Python error: PyGILState_Release: auto-releasing thread-state, 
but no thread-state for this thread

The subinterpreter_scoped_activate main_guard(main()) destructor tries to call PyGILState_Release() but there's no current thread state after we skipped the swap.
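
Why skipping the swap trips the guard's destructor can be seen in a toy model that does not call CPython at all. Everything here is hypothetical (`ToyThreadState`, `toy_current`, `ToyScopedActivate`); it only mimics the sequence described above: a guard activates main, creation makes a new state current, deleting it leaves no current state, and the guard's exit then expects one.

```cpp
#include <cassert>

struct ToyThreadState {
    int interp_id;
};

// Stand-in for the runtime's per-thread "current thread state" slot.
thread_local ToyThreadState *toy_current = nullptr;

// Hypothetical guard: activates a state on entry and, on exit, expects that
// some state is still current so it can be released.
class ToyScopedActivate {
public:
    ToyScopedActivate(ToyThreadState *state, bool *exit_ok)
        : saved_(toy_current), exit_ok_(exit_ok) {
        assert(exit_ok_ != nullptr);
        toy_current = state;
    }
    ~ToyScopedActivate() {
        // The write-up reports a fatal error when this condition is false.
        *exit_ok_ = (toy_current != nullptr);
        toy_current = saved_;
    }
    ToyScopedActivate(const ToyScopedActivate &) = delete;
    ToyScopedActivate &operator=(const ToyScopedActivate &) = delete;

private:
    ToyThreadState *saved_;
    bool *exit_ok_;
};

// Mirrors the shape of create(): returns whether the guard's exit found a
// current thread state.
bool guard_exit_is_consistent(bool swap_back) {
    ToyThreadState main_state{0};
    bool exit_ok = false;
    {
        ToyScopedActivate main_guard(&main_state, &exit_ok);
        ToyThreadState *prev = toy_current;  // like PyThreadState_Get()
        ToyThreadState sub_state{1};
        toy_current = &sub_state;            // creation makes the new state current
        toy_current = nullptr;               // like PyThreadState_DeleteCurrent()
        if (swap_back) {
            toy_current = prev;              // like PyThreadState_Swap(prev_tstate)
        }
    }
    return exit_ok;
}
```

With `swap_back == false` the guard exits with no current state, which corresponds to the PyGILState_Release error above; the model says nothing about the hang itself.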

Attempt 2: Create a fresh thread state for main instead of reusing prev_tstate

#if defined(Py_GIL_DISABLED)
    (void) prev_tstate;
    PyThreadState *fresh_main_tstate = PyThreadState_New(PyInterpreterState_Main());
    PyThreadState_Swap(fresh_main_tstate);
#else
    PyThreadState_Swap(prev_tstate);
#endif

Result: Segfault. The fresh tstate doesn't properly integrate with the saved state in main_guard.

Conclusion

The fix is not trivial because the subinterpreter_scoped_activate main_guard(main()) at line 85 saves state (gil_state_ and/or old_tstate_) that becomes stale after PyThreadState_DeleteCurrent(). A proper fix likely requires restructuring how create() manages the main interpreter's GIL, possibly:

  1. Not using subinterpreter_scoped_activate inside create() on free-threaded Python
  2. Using a different pattern that doesn't rely on swapping back to a saved thread state
  3. Some other approach?

Files Referenced

  • include/pybind11/subinterpreter.h - lines 79-127 (create() function)
  • tests/test_with_catch/test_subinterpreter.cpp - lines 94-119 ("Move Subinterpreter" test)
  • Debug test files:

Appendix: Debug Output

Passing Test (no swap back)

[DEBUG] Starting scoped_interpreter...
[DEBUG] scoped_interpreter created
[DEBUG] Creating subinterpreter...
[DEBUG] Subinterpreter created, id=1
[DEBUG] Main thread: activating...
[DEBUG] Main thread: imported
[DEBUG] Spawning worker thread...
[DEBUG] Worker thread: started
[DEBUG] Worker thread: activated
[DEBUG] Worker thread: about to Py_EndInterpreter...
[DEBUG] Worker thread: Py_EndInterpreter done!
[DEBUG] Thread joined, test complete!
Exit code: 0

Failing Test (with swap back)

[DEBUG] Starting scoped_interpreter...
[DEBUG] scoped_interpreter created
[DEBUG] Creating subinterpreter...
[DEBUG] Called PyThreadState_Swap(prev_tstate) after DeleteCurrent
[DEBUG] Subinterpreter created, id=1
[DEBUG] Main thread: activating...
[DEBUG] Main thread: imported
[DEBUG] Spawning worker thread...
[DEBUG] Worker thread: started
[DEBUG] Worker thread: activated
[DEBUG] Worker thread: about to Py_EndInterpreter...
Exit code: 124  (timeout - hung in Py_EndInterpreter)

This investigation was conducted using local builds of Python 3.14.2 (default and free-threaded) from commit df793163d58.


📚 Documentation preview 📚: https://pybind11--5940.org.readthedocs.build/

rwgk added a commit to XuehaiPan/pybind11 that referenced this pull request Dec 21, 2025
rwgk added a commit to rwgk/pybind11 that referenced this pull request Dec 22, 2025
This commit improves the C++ test infrastructure to ensure test output
is visible in CI logs, and disables a test that hangs on free-threaded
Python 3.14+.

Changes:

## CI/test infrastructure improvements

- .github/workflows: Added `timeout-minutes: 3` to all C++ test steps
  to prevent indefinite hangs.

- tests/**/CMakeLists.txt: Added `USES_TERMINAL` to C++ test targets
  (cpptest, test_cross_module_rtti, test_pure_cpp) to ensure output is
  shown immediately rather than buffered and possibly lost on crash/timeout.

- tests/test_with_catch/catch.cpp: Added a custom Catch2 progress reporter
  with timestamps, Python version info, and a SIGTERM handler to make test
  execution and failures clearly visible in CI logs.
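
A SIGTERM handler of the kind mentioned in the last item could look roughly like the sketch below. It is not the code from catch.cpp: the names and the message are hypothetical, it assumes a POSIX system (for `write()`), and it only sets a flag instead of terminating so that it can be exercised safely.

```cpp
#include <cassert>
#include <csignal>
#include <unistd.h>

// Set by the handler; sig_atomic_t is the type that is safe to write here.
volatile std::sig_atomic_t toy_sigterm_seen = 0;

extern "C" void toy_sigterm_handler(int) {
    static const char msg[] = "\n[ SIGTERM ] test run terminated (possibly by a timeout)\n";
    // write() is async-signal-safe, unlike printf or iostreams. The result is
    // stored and then discarded to satisfy warn_unused_result.
    ssize_t ignored = write(STDERR_FILENO, msg, sizeof(msg) - 1);
    (void) ignored;
    toy_sigterm_seen = 1;
}

// Installs the handler, sends SIGTERM to the current process, and reports
// whether the handler ran.
bool install_and_trigger_sigterm_handler() {
    const auto previous = std::signal(SIGTERM, toy_sigterm_handler);
    assert(previous != SIG_ERR);
    (void) previous;
    std::raise(SIGTERM);  // the handler runs before raise() returns
    return toy_sigterm_seen == 1;
}
```

A real handler would normally end the process after printing; that part is left out here.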

## Disabled hanging test

- The "Move Subinterpreter" test is disabled on free-threaded Python 3.14+
  due to a hang in Py_EndInterpreter() when the subinterpreter is destroyed
  from a different thread than it was created on. Work on fixing the
  underlying issue will continue under PR pybind#5940.

Context: We were in the dark for months (since we started testing with
Python 3.14t) because CI logs gave no clue about the root cause of hangs.
This led to ignoring intermittent hangs (mostly on macOS). Our hand was
forced only with the Python 3.14.1 release, when hangs became predictable
on all platforms.

For the full development history of these changes, see PR pybind#5933.
rwgk added a commit to rwgk/pybind11 that referenced this pull request Dec 22, 2025
Catch2 v2 doesn't have native skip support (v3 does with SKIP()).
This macro allows tests to be skipped with a visible message while
still appearing in the test list.

Use this for the Move Subinterpreter test on free-threaded Python 3.14+
so it shows as skipped rather than being conditionally compiled out.

Example output:
  [ RUN      ] Move Subinterpreter
  [ SKIPPED ] Skipped on free-threaded Python 3.14+ (see PR pybind#5940)
  [       OK ] Move Subinterpreter
rwgk added a commit that referenced this pull request Dec 22, 2025
…p hanging Move Subinterpreter test (#5942)

* Improve C++ test infrastructure and disable hanging test

* Add test summary to progress reporter

Print the total number of test cases and assertions at the end of the
test run, making it easy to spot if tests are disabled or added.

Example output:
  [  PASSED  ] 20 test cases, 1589 assertions.

* Add PYBIND11_CATCH2_SKIP_IF macro to skip tests at runtime

* Fix clang-tidy bugprone-macro-parentheses warning in PYBIND11_CATCH2_SKIP_IF

rwgk commented Dec 22, 2025

@b-pass, it'd be great if we could connect directly. Could you please email me at the address shown by git log | grep rwgkio?

@rwgk rwgk force-pushed the move_subinterpreter_freethreaded branch from 17a8695 to 1bf6e51 Compare December 22, 2025 08:35
@rwgk rwgk changed the title Placeholder PR for work on re-enabling the Move Subinterpreter test for free-threaded Python 3.14 [WIP] Re-enable Move Subinterpreter test for free-threaded Python 3.14 Dec 22, 2025

rwgk commented Dec 22, 2025

@b-pass This is now a real PR. Could you take it from here?

@b-pass b-pass self-assigned this Dec 23, 2025

b-pass commented Dec 23, 2025

Simpler than expected: Python is doing a "stop-the-world" during Py_EndInterpreter. So before we can join the thread, we have to detach from every thread state (including those from different interpreters or, in this case, the main interpreter). That's why removing the Swap worked: it left the main thread with no active interpreter.

The fix is easy: a GIL release call detaches the thread state. It's rather unexpected though, and I think it makes sub-interpreters a little more awkward to use. I added a note in the docs about it.
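
A toy model of the stop-the-world handshake, using only the C++ standard library, may make both the deadlock and the fix easier to see. It is not CPython's implementation: `ToyRuntime`, `attach`, `detach`, and `stop_the_world` are hypothetical names, and a timeout stands in for the hang so that the sketch always terminates.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Toy runtime that only counts how many threads are currently "attached".
class ToyRuntime {
public:
    void attach() {
        std::lock_guard<std::mutex> lock(mutex_);
        ++attached_;
    }

    void detach() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            assert(attached_ > 0);
            --attached_;
        }
        changed_.notify_all();
    }

    // Called by an attached thread. Returns true once every other thread has
    // detached, or false if the timeout expires first (the modeled hang).
    bool stop_the_world(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        return changed_.wait_for(lock, timeout, [this] { return attached_ == 1; });
    }

private:
    std::mutex mutex_;
    std::condition_variable changed_;
    int attached_ = 0;
};

// The main thread is attached and joins a worker that requests a
// stop-the-world, as interpreter finalization does in the explanation above.
bool worker_can_stop_world(bool main_detaches_before_join) {
    ToyRuntime runtime;
    runtime.attach();  // main thread holds a thread state
    bool stopped = false;

    std::thread worker([&] {
        runtime.attach();
        stopped = runtime.stop_the_world(std::chrono::milliseconds(200));
        runtime.detach();
    });

    if (main_detaches_before_join) {
        runtime.detach();  // models releasing the GIL around join()
    }
    worker.join();
    if (main_detaches_before_join) {
        runtime.attach();  // re-attach afterwards
    }
    runtime.detach();
    return stopped;
}
```

In this model the worker's request succeeds only when the main thread detaches before blocking in `join()`, which is the behavior the GIL-release fix relies on.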

I had a lot of trouble getting a usable nogil environment ... seems like pyconfig.h doesn't include properly from the deadsnakes PPA install unless I include <pyconfig.h> before <Python.h>, which seems wrong.


b-pass commented Dec 23, 2025

This was changed in python/cpython#128639

@b-pass b-pass marked this pull request as ready for review December 23, 2025 04:17
@rwgk rwgk changed the title [WIP] Re-enable Move Subinterpreter test for free-threaded Python 3.14 Re-enable Move Subinterpreter test for free-threaded Python 3.14 Dec 23, 2025

@rwgk rwgk left a comment

Thanks a lot @b-pass for figuring this out. I had Cursor explain to me why this works. I'll attach some of the stuff it explained to me.

I cannot approve here because technically it's my own PR. I'll go ahead and merge this.

@rwgk rwgk merged commit 799f591 into pybind:master Dec 23, 2025
108 checks passed
@github-actions github-actions bot added the needs changelog Possibly needs a changelog entry label Dec 23, 2025
@rwgk rwgk removed the needs changelog Possibly needs a changelog entry label Dec 23, 2025

rwgk commented Dec 23, 2025

Cursor-generated:


When Was Stop-the-World Added to Py_EndInterpreter?

The Actual Breaking Commit

The commit that introduced STW in Py_EndInterpreter() is:

21914979335 gh-136003: Execute pre-finalization callbacks in a loop (GH-136004)
Author: Peter Bierma
Date:   Thu Sep 18 08:29:12 2025 -0400

Not commit 08bea299bfd (which is about out-of-memory handling).

What Changed

The commit added make_pre_finalization_calls() which is called from Py_EndInterpreter():

// From Python/pylifecycle.c

void
Py_EndInterpreter(PyThreadState *tstate)
{
    // ...
    interp->finalizing = 1;

    // This call stops the world and takes the pending calls lock.
    make_pre_finalization_calls(tstate, /*subinterpreters=*/0);

    ASSERT_WORLD_STOPPED(interp);
    // ...
}

And inside make_pre_finalization_calls():

static void
make_pre_finalization_calls(PyThreadState *tstate, int subinterpreters)
{
    for (;;) {
        // ... do cleanup work ...

        /* Stop the world to prevent other threads from creating threads or
         * atexit callbacks. */
        PyMutex_Lock(&interp->ceval.pending.mutex);
        _PyEval_StopTheWorldAll(interp->runtime);  // <-- HERE!
        
        // Check if more work is needed...
        if (!should_continue) {
            break;  // Exit with world stopped
        }
        _PyEval_StartTheWorldAll(interp->runtime);
        PyMutex_Unlock(&interp->ceval.pending.mutex);
    }
    // Returns with world STOPPED
    ASSERT_WORLD_STOPPED(interp);
}

Key Points

  1. Free-threading only: The _PyEval_StopTheWorldAll() call is inside #ifdef Py_GIL_DISABLED

  2. Added in Python 3.14 development cycle: The commit is dated September 2025

  3. Why it was added: To safely handle atexit callbacks, pending calls, and thread cleanup during interpreter finalization. Multiple threads could be creating new threads or callbacks, so STW ensures a clean shutdown.

  4. The loop: Finalization now runs in a loop because atexit callbacks can spawn threads, threads can add pending calls, etc. STW is used to check "are we done yet?" safely.
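
The "run until nothing new shows up" loop from point 4 can be illustrated with a small standard-library sketch. It is a simplification under stated assumptions: `ToyFinalizer` and `passes_for_chain` are hypothetical, there are no threads, and the place where the real code would stop the world is only marked by a comment.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

// Toy finalizer: callbacks may register further callbacks while they run.
class ToyFinalizer {
public:
    using Callback = std::function<void(ToyFinalizer &)>;

    void at_exit(Callback cb) { pending_.push_back(std::move(cb)); }

    // Drains callbacks in passes until a pass leaves nothing pending.
    // Returns the number of passes that were needed.
    int run() {
        int passes = 0;
        for (;;) {
            std::deque<Callback> batch;
            batch.swap(pending_);  // callbacks registered from now on go to pending_
            for (auto &cb : batch) {
                cb(*this);
            }
            ++passes;
            // The real code would stop the world here, so that the check
            // below cannot race with other threads.
            if (pending_.empty()) {
                break;
            }
        }
        return passes;
    }

private:
    std::deque<Callback> pending_;
};

// Registers a callback that re-registers itself `depth` more times.
int passes_for_chain(int depth) {
    assert(depth >= 0);
    ToyFinalizer fin;
    std::function<void(ToyFinalizer &, int)> chain;
    chain = [&chain](ToyFinalizer &f, int remaining) {
        if (remaining > 0) {
            f.at_exit([&chain, remaining](ToyFinalizer &g) { chain(g, remaining - 1); });
        }
    };
    fin.at_exit([&chain, depth](ToyFinalizer &f) { chain(f, depth); });
    return fin.run();
}
```

A chain of depth 3 needs four passes here, which is why a single "run the callbacks once" step is not enough.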

Why Commit 08bea299bfd Was Misleading

That commit (gh-112729: Correctly fail when the process is out of memory) was a cherry-pick to the 3.14 branch that happened around the same time. It touched Python/pylifecycle.c but didn't introduce the STW behavior. The STW change was already in the codebase from the earlier gh-136003 commit.

Timeline

  • 3.13: Free-threading introduced, but Py_EndInterpreter() didn't do STW
  • 3.14 (Sep 2025): gh-136003 added STW to interpreter finalization
  • 3.14.1+: The fix is needed for subinterpreter destruction on non-main threads


rwgk commented Dec 23, 2025

Cursor-generated:


Why Bisect Found 08bea299bfd Instead of 21914979335

Different Branches

21914979335 → on `main` branch only
08bea299bfd → on `3.14` branch only

The bisect was likely done on the 3.14 branch (the release branch you were testing against), not main.

The 3.14 Branch Has Different Code

Looking at the 3.14 branch's Py_EndInterpreter():

// 3.14 branch version (simpler)
Py_EndInterpreter(PyThreadState *tstate)
{
    // ...
    wait_for_thread_shutdown(tstate);
    _Py_FinishPendingCalls(tstate);
    _PyAtExit_Call(tstate->interp);
    
    _PyEval_StopTheWorldAll(runtime);  // STW is here directly
    // ...
}

vs. main branch:

// main branch version (with the loop)
Py_EndInterpreter(PyThreadState *tstate)
{
    // ...
    make_pre_finalization_calls(tstate, 0);  // STW is inside this function
    // ...
}

The Actual Culprit on 3.14

On the 3.14 branch, STW in Py_EndInterpreter was added by:

cec4ddf23e5 [3.14] gh-128639: Don't assume one thread in subinterpreter finalization 
             with fixed daemon thread support (GH-134606) (GH-139050)
Date: Tue Oct 7 19:30:38 2025

And 08bea299bfd was cherry-picked at the same time (Oct 7 2025 19:30:23).

Why Bisect Picked 08bea299bfd

Most likely:

  1. Both commits landed together in the same batch of cherry-picks
  2. The bisect found 08bea299bfd because it was the first in that batch (by a few seconds or commit order)
  3. The actual STW change was in cec4ddf23e5 but bisect identifies the first "bad" commit, which could be either if they landed together

Or:

  1. Your test case specifically triggered something that 08bea299bfd changed (the _PyInterpreterState_New refactor)
  2. The interaction between the new _PyInterpreterState_New calling convention AND the STW code caused the specific deadlock

Bottom Line

The bisect result is technically accurate—08bea299bfd is the first commit where your test failed on the 3.14 branch. But the root cause (STW in Py_EndInterpreter) was introduced by cec4ddf23e5 which landed in the same batch.

The fix (releasing the GIL before joining) addresses the STW issue regardless of which specific commit introduced it.

@rwgk rwgk deleted the move_subinterpreter_freethreaded branch December 23, 2025 05:37
rwgk added a commit that referenced this pull request Dec 25, 2025
* Add new argument to `gil_safe_call_once_and_store::call_once_and_store_result`

* Add per-interpreter storage for `gil_safe_call_once_and_store`

* Make `~gil_safe_call_once_and_store` a no-op

* Fix C++11 compatibility

* Improve thread-safety and add default finalizer

* Try fix thread-safety

* Try fix thread-safety

* Add a warning comment

* Simplify `PYBIND11_INTERNALS_VERSION >= 12`

* Try fix thread-safety

* Try fix thread-safety

* Revert get_pp()

* Update comments

* Move call-once storage out of internals

* Revert internal version bump

* Cleanup outdated comments

* Move atomic_bool alias into pybind11::detail namespace

The `using atomic_bool = ...` declaration was at global scope,
polluting the global namespace. Move it into pybind11::detail
to avoid potential conflicts with user code.

* Add explicit #include <unordered_map> for subinterpreter support

The subinterpreter branch uses std::unordered_map but relied on
transitive includes. Add an explicit include for robustness.

* Remove extraneous semicolon after destructor definition

Style fix: remove trailing semicolon after ~call_once_storage()
destructor body.

* Add comment explaining unused finalize parameter

Clarify why the finalize callback parameter is intentionally ignored
when subinterpreter support is disabled: the storage is process-global
and leaked to avoid destructor calls after interpreter finalization.

* Add comment explaining error_scope usage

Clarify why error_scope is used: to preserve any existing Python
error state that might be cleared or modified by dict_getitemstringref.

* Improve exception safety in get_or_create_call_once_storage_map()

Use std::unique_ptr to hold the newly allocated storage map until
the capsule is successfully created. This prevents a memory leak
if capsule creation throws an exception.

* Add timeout-minutes: 3 to cpptest workflow steps

Add a 3-minute timeout to all C++ test (cpptest) steps across all
platforms to detect hangs early. This uses GitHub Actions' built-in
timeout-minutes property which works on Linux, macOS, and Windows.

* Add progress reporter for test_with_catch Catch2 runner

Add a custom Catch2 streaming reporter that prints one line per test
case as it starts and ends, with immediate flushing to keep CI logs
current. This makes it easy to see where the embedded/interpreter
tests are spending time and to pinpoint which test case is stuck
when builds hang (e.g., free-threading issues).

The reporter:
- Prints "[ RUN      ]" when each test starts
- Prints "[       OK ]" or "[  FAILED  ]" when each test ends
- Prints the Python version once at the start via Py_GetVersion()
- Uses StreamingReporterBase for immediate output (not buffered)
- Is set as the default reporter via CATCH_CONFIG_DEFAULT_REPORTER

This approach gives visibility into all tests without changing their
behavior, turning otherwise opaque 90-minute CI timeouts into
locatable issues in the Catch output.

* clang-format auto-fix (overlooked before)

* Disable "Move Subinterpreter" test on free-threaded Python 3.14+

This test hangs in Py_EndInterpreter() when the subinterpreter is
destroyed from a different thread than it was created on.

The hang was observed:
- Intermittently on macOS with Python 3.14.0t
- Predictably on macOS, Ubuntu, and Windows with Python 3.14.1t and 3.14.2t

Root cause analysis points to an interaction between pybind11's
subinterpreter creation code and CPython's free-threaded runtime,
specifically around PyThreadState_Swap() after PyThreadState_DeleteCurrent().

See detailed analysis: #5933

* style: pre-commit fixes

* Add test for gil_safe_call_once_and_store per-interpreter isolation

This test verifies that gil_safe_call_once_and_store provides separate
storage for each interpreter when subinterpreter support is enabled.

The test caches the interpreter ID in the main interpreter, then creates
a subinterpreter and verifies it gets its own cached value (not the main
interpreter's). Without per-interpreter storage, the subinterpreter would
incorrectly see the main interpreter's cached object.

* Add STARTING/DONE timestamps to test_with_catch output

Print UTC timestamps at the beginning and end of the test run to make
it immediately clear when tests started and whether they ran to
completion. The DONE message includes the Catch session result value.

Example output:
  [ STARTING ] 2025-12-21 03:23:20.497Z
  [ PYTHON   ] 3.14.2 ...
  [ RUN      ] Threads
  [       OK ] Threads
  [ DONE     ] 2025-12-21 03:23:20.512Z (result 0)

* Disable stdout buffering in test_with_catch

Ensure test output appears immediately in CI logs by disabling stdout
buffering. Without this, output may be lost if the process is killed
by a timeout, making it difficult to diagnose which test was hanging.

* EXPERIMENT: Re-enable hanging test to verify CI log buffering fix

This is a temporary commit to verify that the unbuffered stdout fix
makes the hanging test visible in CI logs. REVERT THIS COMMIT after
confirming the output appears.

* Revert "Disable stdout buffering in test_with_catch"

This reverts commit 0f8f32a.

* Use USES_TERMINAL for cpptest to show output immediately

Ninja buffers subprocess output until completion. When a test hangs,
the output is never shown, making it impossible to diagnose which test
is hanging. USES_TERMINAL gives the command direct terminal access,
bypassing ninja's buffering.

This explains why Windows CI showed test progress but Linux/macOS did
not - Windows uses MSBuild which doesn't buffer the same way.

* Fix clang-tidy performance-avoid-endl warning

Use '\n' instead of std::endl since USES_TERMINAL now handles
output buffering at the CMake level.

* Add SIGTERM handler to show when test is killed by timeout

When a test hangs and is killed by `timeout`, Catch2 marks it as failed
but the process exits before printing [ DONE ]. This made it unclear
whether the test failed normally or was terminated.

The signal handler prints a clear message when SIGTERM is received,
making timeout-related failures obvious in CI logs.

* Fix typo: atleast -> at_least

* Fix GCC warn_unused_result error for write() in signal handler

Assign the return value to a variable to satisfy GCC's warn_unused_result
attribute, then cast to void to suppress unused variable warning.

* Add USES_TERMINAL to other C++ test targets

Apply the same ninja output buffering fix to test_cross_module_rtti
and test_pure_cpp targets. Also add explanatory comments to all
USES_TERMINAL usages.

* Revert "EXPERIMENT: Re-enable hanging test to verify CI log buffering fix"

This reverts commit a3abdee.

* Update comment to reference PR #5940 for Move Subinterpreter fix

* Add alias `interpid_t = std::int64_t`

* Add isolation and gc test for `gil_safe_call_once_and_store`

* Add thread local cache for gil_safe_call_once_and_store

* Revert "Add thread local cache for gil_safe_call_once_and_store"

This reverts commit 5d66819.

* Revert changes according to code review

* Relocate multiple-interpreters tests

* Add more tests for multiple interpreters

* Remove copy constructor

* Apply suggestions from code review

* Refactor to use per-storage capsule instead

* Update comments

* Update singleton tests

* Use interpreter id type for `get_num_interpreters_seen()`

* Suppress unused variable warning

* HACKING

* Revert "HACKING"

This reverts commit 534235e.

* Try fix concurrency

* Test even harder

* Reorg code to avoid duplicates

* Fix unique_ptr::reset -> unique_ptr::release

* Extract reusable functions

* Fix indentation

* Appease warnings for MSVC

* Appease warnings for MSVC

* Appease warnings for MSVC

* Try fix concurrency by not using `get_num_interpreters_seen() > 1`

* Try fix tests

* Make Python path handling more robust

* Update comments and assertion messages

* Revert changes according to code review

* Disable flaky tests

* Use `@pytest.mark.xfail` rather than `pytest.skip`

* Retrigger CI

* Retrigger CI

* Revert file moves

* Refactor atomic_get_or_create_in_state_dict: improve API and fix on_fetch_ bug

Three improvements to atomic_get_or_create_in_state_dict:

1. Return std::pair<Payload*, bool> instead of just Payload*
   - The bool indicates whether storage was newly created (true) or
     already existed (false), following std::map::insert convention.
   - This fixes a bug where on_fetch_ was called even for newly created
     internals, when it should only run for fetched (existing) ones.
     (Identified by @b-pass in code review)

2. Change LeakOnInterpreterShutdown from template param to runtime arg
   - Renamed to `clear_destructor` to describe what it does locally,
     rather than embedding assumptions about why it's used.
   - Reduces template instantiations (header-only library benefits).
   - The check is in the slow path (create) anyway, so negligible cost.

3. Remove unnecessary braces around the fast-path lookup
   - The braces created a nested scope but declared no local variables
     that would benefit from scoping.

* Remove unused PYBIND11_MULTIPLE_INTERPRETERS_TEST_FILES variable

This variable was defined but never used.
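
The `std::pair<Payload*, bool>` return convention from the atomic_get_or_create_in_state_dict item above can be sketched as follows. This is not pybind11's implementation: `ToyStateDict`, `ToyPayload`, and `fetch` are hypothetical, the sketch is single-threaded (no atomicity), and the "on fetch" hook is reduced to a counter.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

struct ToyPayload {
    int fetch_count = 0;  // how often the on-fetch hook ran for this payload
};

class ToyStateDict {
public:
    // Returns {payload, true} if the payload was newly created and
    // {payload, false} if it already existed, like std::map::insert.
    std::pair<ToyPayload *, bool> get_or_create(const std::string &key) {
        auto it = slots_.find(key);
        if (it != slots_.end()) {
            return {it->second.get(), false};
        }
        auto inserted = slots_.emplace(key, std::unique_ptr<ToyPayload>(new ToyPayload()));
        return {inserted.first->second.get(), true};
    }

private:
    std::unordered_map<std::string, std::unique_ptr<ToyPayload>> slots_;
};

// Runs the on-fetch hook only for payloads that already existed.
ToyPayload *fetch(ToyStateDict &dict, const std::string &key) {
    auto result = dict.get_or_create(key);
    assert(result.first != nullptr);
    if (!result.second) {
        ++result.first->fetch_count;
    }
    return result.first;
}
```

The bool lets the caller tell "created" from "fetched" without a second lookup, which is the distinction the bug fix in item 1 depends on.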

---------

Co-authored-by: Ralf W. Grosse-Kunstleve
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>