-
Notifications
You must be signed in to change notification settings - Fork 1
fix: Segfault/GIL state violation upon Python shutdown #14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
8725525 to
9a118f8
Compare
31a9926 to
608d1f9
Compare
2 tasks
Member
Author
|
The test in GitHub Action passes, but it fails with SIGABRT or SIGSEGV in local environments. |
…r librarise sharing the same runtime
Register _cleanup_runtime() with atexit in all subprocess test scripts to ensure the tokio runtime is properly shut down before Python interpreter exits. This prevents race conditions between background tasks and GIL cleanup during interpreter shutdown. The tests affected: - test_shutdown_stress.py: All subprocess stress tests - test_cross_library_shutdown.py: Cross-library and baseline tests This change makes the tests explicitly wait for runtime shutdown, matching the behavior of the production atexit handler.
- Added pyo3-async-runtimes as submodule with shutdown API patches - Patched tokio.rs to add task tracking and shutdown coordination - Updated Cargo.toml to use patched local submodule - Rewrote src/runtime.rs to use new shutdown APIs - Added Python wrapper to export _cleanup_runtime - Configured maturin for mixed Python/Rust builds - All tests pass (18 passed, 2 skipped) This provides explicit control over tokio runtime shutdown, preventing GIL state violations during Python interpreter exit. Related: BA-1976, pyo3-async-runtimes#40
The pyo3-async-runtimes submodule was not being checked out during CI runs, causing build failures. Added submodules: recursive to the checkout action.
- Forked pyo3-async-runtimes to lablup/pyo3-async-runtimes - Created branch 'add-explicit-shutdown-api' with shutdown patches - Updated submodule to point to lablup fork - Submodule now at commit b8eb152 with explicit shutdown APIs This allows us to maintain the patches in our own fork while staying synchronized with upstream pyo3-async-runtimes.
Changed test scripts to call _cleanup_runtime() after asyncio.run() instead of using atexit handlers. This tests whether explicit cleanup after event loop shutdown but before process exit resolves the ARM64 Python 3.11-3.12 race condition (currently 1/20 iterations with SIGSEGV). Changes: - test_shutdown_stress.py: Updated _make_test_script() and all test scripts - test_cross_library_shutdown.py: Updated script generation functions - Removed unused asyncio import from test_shutdown_stress.py Pattern: asyncio.run(main()) followed by _cleanup_runtime()
…side async functions
This corrects the timing of cleanup to happen BEFORE the event loop shuts down,
not after. The cleanup_runtime() function should be called at the end of the
main async function, while the event loop is still running.
Changes:
- Renamed _cleanup_runtime() to cleanup_runtime() as it's a public API
- Updated all test scripts to call cleanup_runtime() inside async functions
- Updated docstrings to reflect correct usage pattern
- Updated type stub with new function signature and documentation
Correct pattern:
```python
async def main():
# Your etcd operations
...
# Cleanup before returning
cleanup_runtime()
asyncio.run(main()) # Event loop shutdown happens after cleanup
```
Updated pyo3-async-runtimes submodule to use tokio's built-in Runtime::shutdown_timeout() for proper shutdown, removing manual task counting. Changes in submodule (vendor/pyo3-async-runtimes): - Removed manual ACTIVE_TASKS tracking - Store runtime in Arc<Runtime> to allow taking ownership - Use Arc::try_unwrap() + shutdown_timeout() for graceful shutdown - Removed obsolete functions: task_started(), task_completed(), get_active_task_count() Changes in parent repo (src/runtime.rs): - Removed task tracking wrapper from spawn() - Simplified Drop implementation - Updated cleanup_runtime() to reflect tokio shutdown usage - Updated documentation This uses tokio's native task management instead of manual tracking.
Updates submodule to implement valkey-glide-style shutdown pattern with: - TokioRuntimeWrapper using dedicated thread and Handle-based access - AsyncStdRuntimeWrapper for API consistency - New spawn/spawn_blocking/request_shutdown APIs - Graceful shutdown via Drop-based cleanup This enables clean runtime shutdown and should reduce segfaults during Python finalization.
Updates pyo3-async-runtimes submodule to fix race condition where request_shutdown() returned before the runtime thread completed. ## Key Changes in Submodule - Runtime thread now calls shutdown_timeout() explicitly - request_shutdown() blocks until thread.join() completes - Uses channel to pass timeout value to runtime thread - Interior mutability via Mutex for thread handle ## Result - No more "shutdown completed" message before actual shutdown - No more GIL state violations from async shutdown - Tests pass reliably (18 passed, 2 skipped, 0 failed) - Proper synchronous blocking shutdown behavior
Refactored code for upstream PR review: - Add section comments for better code organization - Remove debug eprintln! statements - Add get_handle() as the recommended public API - Improve documentation for all public functions - Mark get_runtime() as deprecated with migration guidance PR: PyO3/pyo3-async-runtimes#71
Since pyo3-async-runtimes now has built-in shutdown support, we no longer need the EtcdRt wrapper struct. This commit: - Removes EtcdRt struct, keeping only cleanup_runtime() function - Calls pyo3_async_runtimes::tokio::future_into_py directly - Removes unnecessary cross-library shutdown test (valkey-glide uses different architecture - process isolation, not embedded runtime) - Simplifies stress tests to essential shutdown scenarios
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Summary
Fixes segfaults/SIGABRT during Python interpreter shutdown caused by race conditions with orphaned tokio tasks.
Results:
The Problem
From pyo3-async-runtimes#40:
OnceCell<Runtime>with no cleanup mechanismPyGILState_Releaseon finalizing interpreter → SIGABRTSolution
We forked pyo3-async-runtimes to implement a dedicated runtime thread pattern (inspired by valkey-glide) that enables explicit shutdown coordination.
How It Works
The tokio runtime now lives in a dedicated thread. When
cleanup_runtime()is called:NotifyRuntime::shutdown_timeout(5s)cleanup_runtime()blocks until the runtime fully terminatesUsage
Changes
pyo3-async-runtimes (forked)
RuntimeWrapperwith dedicated runtime threadrequest_shutdown(timeout_ms)for graceful shutdownget_handle()as recommended API (deprecatesget_runtime())etcd-client-py
src/runtime.rs: Simplified to justcleanup_runtime()functionsrc/*.rs: Usepyo3_async_runtimes::tokio::future_into_pydirectlytests/: Simplified stress tests, removed cross-library testsTest Results
Related