fix: prevent cold start hook errors from aggressiveStartupCleanup killing parent process#1427
fix: prevent cold start hook errors from aggressiveStartupCleanup killing parent process#1427sangfansh wants to merge 1 commit intothedotmack:mainfrom
Conversation
…cold start Fixes multiple cold start issues that cause "SessionStart:startup hook error" and missing context banner on first launch after reboot. Root cause: aggressiveStartupCleanup() SIGKILLs all processes matching worker-service.cjs, including the hook process that spawned the daemon. The daemon kills its own parent before the hook can return output. Changes: - Add process.ppid exclusion to aggressiveStartupCleanup (ProcessManager.ts) - Remove stale Setup hook referencing deleted setup.sh (hooks.json) - Remove redundant start hook from SessionStart to avoid parallel race (hooks.json) - Reduce collectStdin timeout from 5s to 500ms (bun-runner.js) - Add waitForReadiness in hook handler to ensure DB is ready (worker-service.ts) - Downgrade MCP server cold-start log from error to info (mcp-server.ts) Fixes thedotmack#1426 Related: thedotmack#1423, thedotmack#1419, thedotmack#1410, thedotmack#1395 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary by CodeRabbit
WalkthroughFixes multiple cold start issues preventing worker daemon initialization: removes redundant Setup hook and SessionStart startup command, shortens stdin wait timeout, prevents parent process termination in cleanup, adds explicit readiness verification in hook startup, and downgrades worker unavailability log severity. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~8 minutes Possibly related issues
Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment Tip CodeRabbit can scan for known vulnerabilities in your dependencies using OSV Scanner.OSV Scanner will automatically detect and report security vulnerabilities in your project's dependencies. No additional configuration is required. |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
plugin/scripts/bun-runner.js (1)
142-147: Verify 500ms timeout is adequate for hook payloads in production, or add diagnostic logging.This timeout was intentionally reduced from 5000ms to address PostToolUse hook latency issues (
#1220). However, if Claude's hook system sends payloads larger than what arrives within 500ms, this could silently truncate partial data. The buffering strategy means the timeout acts as a hard cutoff regardless of whether data is still arriving.Consider adding optional logging when the timeout fires with partial data collected, so truncation issues become observable:
Optional: Add diagnostic logging for timeout with partial data
// Safety: if no data arrives within 500ms, proceed without stdin setTimeout(() => { + const hadPartialData = chunks.length > 0; process.stdin.removeAllListeners(); process.stdin.pause(); + if (hadPartialData) { + console.error(`[bun-runner] stdin timeout with ${Buffer.concat(chunks).length} bytes collected`); + } resolve(chunks.length > 0 ? Buffer.concat(chunks) : null); }, 500);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/scripts/bun-runner.js` around lines 142 - 147, The 500ms hard cutoff in the setTimeout block (the closure that calls process.stdin.removeAllListeners(), process.stdin.pause(), and resolve(chunks.length > 0 ? Buffer.concat(chunks) : null)) can silently truncate incoming hook payloads; make the timeout configurable (e.g., read a BUN_RUNNER_STDIN_TIMEOUT_MS env var or a constant with a sensible default of 500) and, when the timer fires and chunks.length > 0, emit a diagnostic warning including the number of bytes collected and the configured timeout to help debug truncation; keep the existing behavior if the env var isn’t set.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@plugin/scripts/bun-runner.js`:
- Around line 142-147: The 500ms hard cutoff in the setTimeout block (the
closure that calls process.stdin.removeAllListeners(), process.stdin.pause(),
and resolve(chunks.length > 0 ? Buffer.concat(chunks) : null)) can silently
truncate incoming hook payloads; make the timeout configurable (e.g., read a
BUN_RUNNER_STDIN_TIMEOUT_MS env var or a constant with a sensible default of
500) and, when the timer fires and chunks.length > 0, emit a diagnostic warning
including the number of bytes collected and the configured timeout to help debug
truncation; keep the existing behavior if the env var isn’t set.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 17532823-a3f8-4612-996d-4ab944cf9420
📒 Files selected for processing (5)
plugin/hooks/hooks.jsonplugin/scripts/bun-runner.jssrc/servers/mcp-server.tssrc/services/infrastructure/ProcessManager.tssrc/services/worker-service.ts
💤 Files with no reviewable changes (1)
- plugin/hooks/hooks.json
Summary
Fixes multiple cold start issues that cause "SessionStart:startup hook error" and missing context banner on first launch after reboot.
Root Cause
aggressiveStartupCleanup()inProcessManager.tsSIGKILLs all processes whose command line matchesworker-service.cjs, including the hook process that spawned the daemon. The daemon kills its own parent before the hook can return output.Introduced in commit
d333c7d(v9.1.0). Was masked by the in-process worker fallback until v10.5.6 removed it (PR #1370).Changes
ProcessManager.tsprocess.ppidto PID exclusion listhooks.jsonSetuphooksetup.shwas deleted in v10.5.0 but hook entry remainedhooks.jsonstarthook from SessionStartcontexthook already callsensureWorkerStarted()internally; parallel execution causes "port in use" racebun-runner.jscollectStdintimeout from 5s to 500msworker-service.tswaitForReadiness()in hook handlermcp-server.tserrortoinfoTesting
Verified on macOS ARM64 (Apple Silicon) with Claude Code 2.1.80 and claude-mem v10.6.1:
Note:
plugin/scripts/worker-service.cjsandplugin/scripts/mcp-server.cjsare bundled outputs and not modified here. They will need to be rebuilt from the updated source.Related Issues
Fixes #1426
Related: #1423 #1419 #1410 #1395
🤖 Generated with Claude Code