fix: prevent unbounded Claude subprocess spawning in worker daemon #1008
jayvenn21 wants to merge 2 commits into thedotmack:main
Greptile Overview
Greptile Summary
This PR adds defensive lifecycle controls around Claude CLI subprocesses in the worker daemon:
Overall this fits the existing approach of process management/cleanup in
Confidence Score: 3/5
Important Files Changed
Sequence Diagram
sequenceDiagram
autonumber
participant WS as WorkerService
participant PR as ProcessRegistry
participant OS as OS/ps
participant CP as Claude subprocess
WS->>WS: initializeBackground()
WS->>WS: initializationCompleteFlag=true
WS->>WS: resolveInitialization()
WS->>PR: reapOrphanedProcesses(activeSessionIds=∅)
PR->>PR: iterate processRegistry
PR-->>CP: SIGKILL tracked children not in activeSessionIds
PR->>OS: ps | grep claude... (killSystemOrphans)
OS-->>PR: orphan list (ppid=1)
PR-->>CP: SIGKILL system orphans
WS->>PR: startOrphanReaper(getActiveSessionIds)
loop every 5 minutes
PR->>PR: reapOrphanedProcesses(activeSessionIds)
PR->>OS: killSystemOrphans()
end
Note over PR,CP: createPidCapturingSpawn enforces MAX_CONCURRENT_CLAUDE_SUBPROCESSES
WS->>PR: createPidCapturingSpawn(sessionDbId)
PR-->>CP: spawn(command,args)
PR->>PR: registerProcess(pid, sessionDbId)
CP-->>PR: 'exit' event
PR->>PR: unregisterProcess(pid)
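The spawn-side half of the diagram (cap check, register on spawn, unregister on exit, reap children whose session is gone) can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the cap value, the `ChildLike` interface, the injected `spawnFn` and `kill` callbacks, and the name `guardedSpawn` are all assumptions made so the logic runs without starting real processes. In the daemon, `spawnFn` would presumably wrap Node's `child_process.spawn` and `kill` would send SIGKILL.

```typescript
// Hypothetical cap value; the PR's actual limit is not shown on this page.
const MAX_CONCURRENT_CLAUDE_SUBPROCESSES = 3;

// pid -> sessionDbId for every child that is currently tracked.
const processRegistry = new Map<number, number>();

function registerProcess(pid: number, sessionDbId: number): void {
  processRegistry.set(pid, sessionDbId);
}

function unregisterProcess(pid: number): void {
  processRegistry.delete(pid);
}

// Stand-in for a real child process (assumption): a pid plus an exit hook.
interface ChildLike {
  pid: number;
  onExit(callback: () => void): void;
}

// Refuse to spawn once the cap is reached; otherwise track the child
// until it reports its own exit.
function guardedSpawn(sessionDbId: number, spawnFn: () => ChildLike): ChildLike {
  if (processRegistry.size >= MAX_CONCURRENT_CLAUDE_SUBPROCESSES) {
    throw new Error(
      "refusing to spawn: " + processRegistry.size + " Claude subprocesses already tracked"
    );
  }
  const child = spawnFn();
  registerProcess(child.pid, sessionDbId);
  child.onExit(() => unregisterProcess(child.pid));
  return child;
}

// Kill and forget every tracked child whose session is no longer active.
// Returns the pids that were reaped.
function reapOrphanedProcesses(
  activeSessionIds: Set<number>,
  kill: (pid: number) => void
): number[] {
  const reaped: number[] = [];
  processRegistry.forEach((sessionDbId, pid) => {
    if (!activeSessionIds.has(sessionDbId)) {
      kill(pid);
      unregisterProcess(pid);
      reaped.push(pid);
    }
  });
  return reaped;
}
```

Calling `reapOrphanedProcesses(new Set(), kill)` corresponds to the startup pass in the diagram, where no session is active yet and every tracked child is treated as an orphan.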
It looks like the failing The error indicates the GitHub Action cannot fetch an ID token permissions: Since this is a contributor PR from a fork, the action may need to be Happy to rebase or rerun once that’s resolved.
See here also - #1010
Sounds good. I'll take a look at it.
Closing in favor of #1085 (rodboev) which provides a more comprehensive solution to the same problem. Thank you for the contribution!
Summary
This PR introduces defensive process lifecycle controls to prevent
worker-service.cjs --daemon from spawning an unbounded number of Claude CLI
subprocesses that are never cleaned up, which can lead to catastrophic memory,
swap, and disk exhaustion over time.
This change does not alter core functionality or user-facing behavior under
normal workloads. It adds guardrails and cleanup to ensure long-running daemon
sessions remain stable.
Problem
On macOS (and potentially other platforms), the daemon currently:
Over multi-day usage, this results in:
claude processes
Multiple related issues report the same failure mode.
Changes
This PR introduces minimal but critical safety mechanisms:
These changes are intentionally conservative and avoid architectural rewrites.
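As an illustration of the cleanup side, the review's sequence diagram describes a reaper that runs every 5 minutes and also looks for system orphans (ppid=1) in `ps` output. The sketch below shows one way that could look, under stated assumptions: the input format (`<pid> <ppid> <command>` per line, as something like `ps -eo pid,ppid,command` would print), the substring match on "claude", and the function name `findSystemOrphans` and the `reapOnce` callback are hypothetical and not taken from the PR.

```typescript
// "every 5 minutes", per the sequence diagram.
const REAP_INTERVAL_MS = 5 * 60 * 1000;

// Assumed input: one process per line as "<pid> <ppid> <command...>".
// Returns pids whose parent is pid 1 and whose command mentions `needle`.
// Note: a plain substring match is loose and could also hit unrelated
// processes whose command line happens to contain the word.
function findSystemOrphans(psOutput: string, needle: string = "claude"): number[] {
  const orphans: number[] = [];
  for (const line of psOutput.split("\n")) {
    const match = /^\s*(\d+)\s+(\d+)\s+(.*)$/.exec(line);
    if (!match) continue; // header row or blank line
    const pid = Number(match[1]);
    const ppid = Number(match[2]);
    const command = match[3] || "";
    // ppid 1 means the original parent exited and init/launchd adopted the process.
    if (ppid === 1 && command.indexOf(needle) !== -1) {
      orphans.push(pid);
    }
  }
  return orphans;
}

// Run one reap pass per interval; the returned function stops the timer.
function startOrphanReaper(
  getActiveSessionIds: () => Set<number>,
  reapOnce: (activeSessionIds: Set<number>) => void,
  intervalMs: number = REAP_INTERVAL_MS
): () => void {
  const timer = setInterval(() => reapOnce(getActiveSessionIds()), intervalMs);
  return () => clearInterval(timer);
}
```

Keeping the parsing separate from the `ps` invocation and from the kill call makes the orphan detection testable against captured output, which matters here because a false positive means killing a process the user still wants.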
Why this approach
This is a production-safety fix:
The goal is to fail safely rather than catastrophically.
Testing
Related issues