fix: detect stale PID files via health endpoint cross-check (#1231) #1357
ousamabenyounes wants to merge 2 commits into thedotmack:main
…otmack#1231)

After cleanStalePidFile() determines a PID is alive (kill -0), cross-check the health endpoint's reported PID against the PID file. Removes stale PID files when the health endpoint reports a different PID (PID reuse) or when the process is alive but not responding to health checks (zombie worker).

Adds getHealthPid() utility and 4 tests covering the new function.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
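The decision described in this commit message can be sketched as a small pure function. This is only an illustration under stated assumptions: the names `PidFileVerdict`, `classifyPidFile`, and `shouldRemovePidFile` are hypothetical and do not come from the PR, and the liveness result and health PID are passed in as plain values rather than being probed here.

```typescript
// Hypothetical names for illustration only; the PR's actual code is not shown on this page.
type PidFileVerdict =
  | "healthy"            // process alive and health endpoint reports the same PID
  | "stale-dead"         // liveness probe (kill -0) failed
  | "stale-pid-reuse"    // health endpoint reports a different PID
  | "stale-unresponsive"; // process alive but health endpoint gave no PID

function classifyPidFile(
  pidFilePid: number,        // PID read from the PID file
  processAlive: boolean,     // result of a kill -0 style liveness probe
  healthPid: number | null   // PID reported by the health endpoint, or null if unavailable
): PidFileVerdict {
  if (!processAlive) return "stale-dead";
  if (healthPid === null) return "stale-unresponsive";
  if (healthPid !== pidFilePid) return "stale-pid-reuse";
  return "healthy";
}

// Every verdict other than "healthy" means the PID file should be removed.
function shouldRemovePidFile(verdict: PidFileVerdict): boolean {
  return verdict !== "healthy";
}
```

Keeping the decision separate from the I/O (the liveness probe and the HTTP call) makes each of the scenarios named above easy to exercise in isolation.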
Code Review

Overview
This PR addresses issue #1231 by adding a health endpoint cross-check to detect stale PID files. The fix prevents PID reuse issues and zombie PID files after OOM/sleep/wake scenarios.

Strengths

Issues & Concerns
None identified. The implementation is sound.

Security Concerns
None identified.

Potential Bugs
None identified. The error handling is comprehensive.

Test Coverage
Excellent test coverage for the new getHealthPid function:

Consider adding integration tests for the worker-service.ts changes to verify the end-to-end behavior of stale PID file detection.

Recommendations

Summary
This is a well-implemented fix that addresses a real-world issue with PID file management. The code is clean, well-tested, and includes good error handling and logging.

Overall Assessment: Approve
Simulates three scenarios: health reports different PID (PID reuse), health unreachable (zombie worker), and PIDs match (healthy worker).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Thanks for the review! Added the recommended integration tests in commit 2df3e77:
All 1113 tests pass, 0 failures.
Superseded by the embedded Process Supervisor (PR #1370, v10.5.6). Stale PID detection is now handled by
Summary
Adds getHealthPid() to fetch the worker's actual PID from /api/health
After cleanStalePidFile() (which only checks kill -0), cross-checks the health endpoint PID against the PID file

Fixes #1231
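A minimal sketch of how a PID could be fetched from /api/health, covering the same outcomes the test plan lists (PID returned, connection refused, non-ok response, missing pid field). The PR's real getHealthPid() is not shown on this page, so everything here is an assumption: the names `parseHealthPid` and `fetchHealthPid`, the `baseUrl` and `timeoutMs` parameters, and a JSON response body with a numeric `pid` field.

```typescript
// Assumption: the health endpoint answers with JSON that contains a numeric `pid` field.
function parseHealthPid(body: unknown): number | null {
  if (typeof body !== "object" || body === null) return null;
  const pid = (body as { pid?: unknown }).pid;
  // Accept only positive integers; anything else counts as "no PID reported".
  return typeof pid === "number" && Number.isInteger(pid) && pid > 0 ? pid : null;
}

// Hypothetical stand-in for getHealthPid(); the signature is a guess, not the PR's.
async function fetchHealthPid(baseUrl: string, timeoutMs = 2000): Promise<number | null> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetch(`${baseUrl}/api/health`, { signal: controller.signal });
    if (!res.ok) return null; // non-ok response
    return parseHealthPid(await res.json());
  } catch {
    return null; // connection refused, timeout, or invalid JSON
  } finally {
    clearTimeout(timer);
  }
}
```

Returning null for every failure mode keeps the caller's check simple: a null result or a PID that differs from the PID file both indicate a stale file.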
Test plan
Tests for getHealthPid() (PID returned, connection refused, non-ok response, missing pid field)

🤖 Coded by Claude, vibe-coded by Ousama Ben Younes
🤖 Generated with Claude Code