Skip to content

Fix orphaned simulator processes on Ctrl+C or task failure#1427

Open
Jjateen wants to merge 7 commits intoriscv:act4from
Jjateen:fix/orphan-procs-1238
Open

Fix orphaned simulator processes on Ctrl+C or task failure#1427
Jjateen wants to merge 7 commits intoriscv:act4from
Jjateen:fix/orphan-procs-1238

Conversation

@Jjateen
Copy link
Copy Markdown
Contributor

@Jjateen Jjateen commented May 4, 2026

When a task failed early (keep_going=False) or the user pressed Ctrl+C, any sail_riscv_sim processes already launched by ThreadPoolExecutor worker threads were left alive. Workers blocked in proc.communicate() kept the executor alive and the simulator ran indefinitely.

Three fixes in build.py:

  • Switch SubprocessAction execution from subprocess.run() to Popen with start_new_session=True so each simulator gets its own process group, allowing os.killpg() to reach any children it spawns.
  • Add _ProcTracker, a thread-safe registry of running process group IDs. Workers register their pgid on launch and deregister on completion. A shutting_down flag ensures late-registrants (processes that start after kill_all() is called) are killed immediately on add().
  • Call tracker.kill_all() on early task failure (before executor shutdown) and in a try/finally so cleanup runs on Ctrl+C or any other exception. A SIGINT handler also calls kill_all() to terminate simulators before ThreadPoolExecutor waits for worker threads to join.

Fixes #1238.

When a task failed early (keep_going=False) or the user pressed Ctrl+C,
any sail_riscv_sim processes already launched by ThreadPoolExecutor worker
threads were left alive. Workers blocked in proc.communicate() kept the
executor alive and the simulator ran indefinitely.

Three fixes in build.py:
- Switch SubprocessAction execution from subprocess.run() to Popen with
  start_new_session=True so each simulator gets its own process group,
  allowing os.killpg() to reach any children it spawns.
- Add _ProcTracker, a thread-safe registry of running process group IDs.
  Workers register their pgid on launch and deregister on completion.
  A shutting_down flag ensures late-registrants (processes that start
  after kill_all() is called) are killed immediately on add().
- Call tracker.kill_all() on early task failure (before executor shutdown)
  and in a try/finally so cleanup runs on Ctrl+C or any other exception.
  A SIGINT handler also calls kill_all() to terminate simulators before
  ThreadPoolExecutor waits for worker threads to join.

Fixes riscv#1238.
@Jjateen Jjateen force-pushed the fix/orphan-procs-1238 branch from 9e07973 to a04e0dc Compare May 4, 2026 08:50
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates ACT’s Python DAG build executor to prevent orphaned sail_riscv_sim (and child) processes when the run is interrupted (Ctrl+C) or stops early due to a task failure with keep_going=False.

Changes:

  • Run SubprocessAction commands via subprocess.Popen(..., start_new_session=True) and track process-group IDs for group termination.
  • Add a thread-safe _ProcTracker to register/deregister running subprocess PGIDs and kill them on shutdown.
  • Ensure cleanup runs on early failure and via a SIGINT handler / try/finally kill path.

Comment thread framework/src/act/build.py Outdated
@Jjateen Jjateen force-pushed the fix/orphan-procs-1238 branch from d032af3 to c627e38 Compare May 4, 2026 09:37
@Jjateen Jjateen force-pushed the fix/orphan-procs-1238 branch from 0d9151a to 151eb9c Compare May 4, 2026 09:42
@Jjateen Jjateen force-pushed the fix/orphan-procs-1238 branch from 90f6b7b to 9bd0f3e Compare May 5, 2026 15:12
@Jjateen
Copy link
Copy Markdown
Contributor Author

Jjateen commented May 5, 2026

@jordancarlin can you review/approve this further?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Orphaned sail_riscv_sim processes on Ctrl+C or task failure

2 participants