
mdmosby (Contributor) commented Oct 30, 2025

Through experimentation, it appears that SIGINT is often insufficient to kill timed-out processes. This PR makes releasing resources back to the pool more reliable by always sending SIGKILL/SIGTERM to the test's process tree. The signal is sent recursively, so every child process spawned by the test receives it as well.
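The escalation described above can be sketched in Python. This sketch assumes a POSIX system and assumes the test was launched with `start_new_session=True`, so the test process leads its own process group. It uses a process-group variant, which is not necessarily how this PR walks the tree. `os.killpg` reaches every process still in the group, but a descendant that calls `setsid()` leaves the group, which is why an explicit recursive walk (for example with `psutil.Process.children(recursive=True)`) can be more thorough. `kill_process_tree` and `grace` are hypothetical names.

```python
import os
import signal
import subprocess
import sys


def kill_process_tree(proc: subprocess.Popen, grace: float = 5.0) -> int:
    """Hypothetical helper: SIGTERM the test's process group, then SIGKILL it.

    Assumes `proc` was started with start_new_session=True, so its PID is
    also the ID of the process group its descendants inherit.
    Returns the leader's exit status (negative signal number if killed).
    """
    pgid = proc.pid
    try:
        # Polite request first: every process in the group gets SIGTERM.
        os.killpg(pgid, signal.SIGTERM)
    except (ProcessLookupError, PermissionError):
        pass  # group already gone (macOS may report EPERM for zombies only)
    try:
        proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        pass  # leader ignored SIGTERM or is still shutting down
    try:
        # Always follow up with SIGKILL so stragglers that trap SIGTERM,
        # or outlive an already-exited leader, cannot hold on to resources.
        os.killpg(pgid, signal.SIGKILL)
    except (ProcessLookupError, PermissionError):
        pass  # nothing left in the group
    return proc.wait()


if __name__ == "__main__":
    # Example: a test that ignores SIGTERM is still reaped via SIGKILL.
    stubborn = subprocess.Popen(
        [sys.executable, "-c",
         "import signal, time\n"
         "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
         "print('ready', flush=True)\n"
         "time.sleep(60)"],
        stdout=subprocess.PIPE, text=True, start_new_session=True,
    )
    stubborn.stdout.readline()  # wait until the SIGTERM handler is installed
    print(kill_process_tree(stubborn, grace=0.5))  # -9 on POSIX
    stubborn.stdout.close()
```

Sending SIGTERM first gives well-behaved tests a chance to clean up; the unconditional SIGKILL afterwards is what guarantees the group is gone even if the test traps SIGTERM.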

In experiments, this change successfully kills a large MPI code that catches and handles signals and, as a result, was not releasing its resources.
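To illustrate why SIGINT alone can leave such a code running, here is a small Python sketch assuming a POSIX system. The child program is a hypothetical stand-in for a code that installs its own SIGINT handler: it survives SIGINT and only exits once it receives SIGKILL, which cannot be caught or ignored. `demo_sigint_vs_sigkill` is a hypothetical name.

```python
import signal
import subprocess
import sys
import time

# Hypothetical stand-in for a long-running code that handles SIGINT itself.
TRAPS_SIGINT = (
    "import signal, time\n"
    "signal.signal(signal.SIGINT, lambda signum, frame: None)\n"
    "print('ready', flush=True)\n"
    "while True:\n"
    "    time.sleep(1)\n"
)


def demo_sigint_vs_sigkill(wait: float = 0.5):
    """Return (alive_after_sigint, exit_status_after_sigkill)."""
    proc = subprocess.Popen(
        [sys.executable, "-c", TRAPS_SIGINT], stdout=subprocess.PIPE, text=True
    )
    proc.stdout.readline()  # wait until the SIGINT handler is installed
    proc.send_signal(signal.SIGINT)
    time.sleep(wait)
    alive_after_sigint = proc.poll() is None  # handler swallowed the signal
    proc.kill()  # SIGKILL on POSIX: cannot be caught or ignored
    status = proc.wait()
    proc.stdout.close()
    return alive_after_sigint, status
```

Under these assumptions, `demo_sigint_vs_sigkill()` returns `(True, -9)`: the process is still alive after SIGINT and is terminated by signal 9.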

mdmosby (Contributor, Author) commented Oct 30, 2025

This needs some more work; it failed the examples.

mdmosby marked this pull request as draft on October 30, 2025, at 21:35.