
fix: prevent cold start hook errors from aggressiveStartupCleanup killing parent process #1427

Open
sangfansh wants to merge 1 commit into thedotmack:main from sangfansh:fix/cold-start-hook-errors

Conversation

@sangfansh

Summary

Fixes multiple cold start issues that cause "SessionStart:startup hook error" and missing context banner on first launch after reboot.

Root Cause

aggressiveStartupCleanup() in ProcessManager.ts SIGKILLs all processes whose command line matches worker-service.cjs, including the hook process that spawned the daemon. The daemon kills its own parent before the hook can return output.

Introduced in commit d333c7d (v9.1.0). Was masked by the in-process worker fallback until v10.5.6 removed it (PR #1370).
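
The parent-exclusion idea can be sketched as a pure filter. This is a minimal illustration and not the code in ProcessManager.ts: `selectCleanupTargets`, the `ProcessInfo` shape, and the sample command lines are hypothetical. Only the `worker-service.cjs` match and the use of the daemon's own PID and parent PID come from the description above.

```typescript
// Hypothetical shape for an enumerated OS process; not taken from ProcessManager.ts.
interface ProcessInfo {
  pid: number;
  command: string;
}

// Returns the PIDs a startup cleanup may kill: every process whose command
// line matches the worker script, except the daemon itself and its parent
// (the hook process that spawned it).
function selectCleanupTargets(
  processes: ProcessInfo[],
  selfPid: number,
  parentPid: number,
  pattern: string = "worker-service.cjs",
): number[] {
  const excluded = new Set<number>([selfPid, parentPid]);
  return processes
    .filter((p) => p.command.includes(pattern))
    .filter((p) => !excluded.has(p.pid))
    .map((p) => p.pid);
}

// Example with made-up PIDs: 200 is the daemon, 100 is the hook that spawned
// it, 50 is a stale worker from an earlier session, 300 is unrelated.
const targets = selectCleanupTargets(
  [
    { pid: 50, command: "bun worker-service.cjs --daemon" },
    { pid: 100, command: "bun worker-service.cjs hook" },
    { pid: 200, command: "bun worker-service.cjs --daemon" },
    { pid: 300, command: "node unrelated.js" },
  ],
  200, // in the daemon this would be process.pid
  100, // in the daemon this would be process.ppid
);
console.log(targets); // only the stale worker remains: PID 50
```

Without the parent PID in the exclusion set, PID 100 would also match the pattern and be killed, which is the failure mode described in the root cause.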

Changes

| File | Change | Why |
| --- | --- | --- |
| ProcessManager.ts | Add process.ppid to PID exclusion list | Prevents daemon from killing the hook that spawned it |
| hooks.json | Remove stale Setup hook | setup.sh was deleted in v10.5.0 but hook entry remained |
| hooks.json | Remove redundant start hook from SessionStart | context hook already calls ensureWorkerStarted() internally; parallel execution causes "port in use" race |
| bun-runner.js | Reduce collectStdin timeout from 5s to 500ms | SessionStart hooks don't receive stdin; 5s delay wastes startup time |
| worker-service.ts | Add waitForReadiness() in hook handler | Ensures DB is initialized before fetching context, preventing empty banner |
| mcp-server.ts | Downgrade cold-start worker check from error to info | Prevents stderr output from being surfaced as hook error by Claude Code |
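
The implementation of waitForReadiness() is not shown in this description. As a rough sketch of the general technique (poll a readiness check until it passes or an attempt budget runs out), something like the following could apply. The name `pollUntilReady`, its parameters, and its defaults are assumptions made for illustration, not the project's API.

```typescript
// Hypothetical helper: polls `isReady` until it returns true or the attempt
// budget is exhausted. Names and defaults are illustrative only.
async function pollUntilReady(
  isReady: () => Promise<boolean>,
  maxAttempts: number = 20,
  intervalMs: number = 100,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<boolean> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    if (await isReady()) return true;
    // Skip the final sleep: there is no further attempt to wait for.
    if (attempt < maxAttempts) await sleep(intervalMs);
  }
  return false;
}

// Usage sketch: a stand-in check that becomes true on the third poll.
// The injected no-op sleep keeps the example instantaneous.
let polls = 0;
const ready = pollUntilReady(async () => ++polls >= 3, 5, 0, async () => {});
ready.then((ok) => console.log(ok, polls)); // true 3
```

A hook handler following this pattern would fetch context only after the returned promise resolves to true, and could fall back to an empty banner or a logged notice when it resolves to false.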

Testing

Verified on macOS ARM64 (Apple Silicon) with Claude Code 2.1.80 and claude-mem v10.6.1:

  • Before: "SessionStart:startup hook error" on every cold boot, no context banner, requires restart
  • After: Clean startup, context banner shows immediately, no errors

Note: plugin/scripts/worker-service.cjs and plugin/scripts/mcp-server.cjs are bundled outputs and not modified here. They will need to be rebuilt from the updated source.

Related Issues

Fixes #1426
Related: #1423 #1419 #1410 #1395

🤖 Generated with Claude Code

…cold start

Fixes multiple cold start issues that cause "SessionStart:startup hook error"
and missing context banner on first launch after reboot.

Root cause: aggressiveStartupCleanup() SIGKILLs all processes matching
worker-service.cjs, including the hook process that spawned the daemon.
The daemon kills its own parent before the hook can return output.

Changes:
- Add process.ppid exclusion to aggressiveStartupCleanup (ProcessManager.ts)
- Remove stale Setup hook referencing deleted setup.sh (hooks.json)
- Remove redundant start hook from SessionStart to avoid parallel race (hooks.json)
- Reduce collectStdin timeout from 5s to 500ms (bun-runner.js)
- Add waitForReadiness in hook handler to ensure DB is ready (worker-service.ts)
- Downgrade MCP server cold-start log from error to info (mcp-server.ts)

Fixes thedotmack#1426
Related: thedotmack#1423, thedotmack#1419, thedotmack#1410, thedotmack#1395

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai bot commented Mar 21, 2026

Summary by CodeRabbit

  • Bug Fixes

    • Improved worker startup reliability with enhanced readiness verification.
    • Prevented accidental termination of parent process during cleanup operations.
  • Performance

    • Optimized input handling timeout for faster response times.
  • Improvements

    • Reduced unnecessary warning messages for better logging clarity.

Walkthrough

Fixes multiple cold start issues preventing worker daemon initialization: removes redundant Setup hook and SessionStart startup command, shortens stdin wait timeout, prevents parent process termination in cleanup, adds explicit readiness verification in hook startup, and downgrades worker unavailability log severity.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Hook Configuration & Stdin Optimization: plugin/hooks/hooks.json, plugin/scripts/bun-runner.js | Removed stale Setup hook and the redundant start command from SessionStart hook; reduced collectStdin() timeout from 5000ms to 500ms to eliminate wasted startup time on stdin-less hook invocations. |
| Worker Startup Readiness: src/services/worker-service.ts | Added explicit waitForReadiness() call in the hook command handler to ensure database initialization completes before context fetching, preventing empty context output on cold start. |
| Process Cleanup Safety: src/services/infrastructure/ProcessManager.ts | Extended PID exclusion logic in aggressiveStartupCleanup() to skip parent process (process.ppid), preventing accidental termination of the hook process that spawned the daemon. |
| MCP Server Logging: src/servers/mcp-server.ts | Changed worker unavailability log from error-level (stderr) to info-level with softer messaging, preventing false hook errors reported through MCP stdio transport on cold start. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Possibly related issues

Possibly related PRs

  • PR #16 — Modifies plugin/hooks/hooks.json and SessionStart hook configuration; this PR removes the worker start step while PR #16 replaces it with cross-platform npm setup, making them related refactoring efforts on the same hook structure.

Suggested reviewers

  • thedotmack

Poem

🐰 A daemon born from startup's tangled dance,
Parent processes given safe reprieve,
Stdin waits no more—just 500ms chance,
Readiness whispers what hooks must believe,
Cold starts bloom now, no errors to grieve! 🌱

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title precisely identifies the main fix: preventing aggressiveStartupCleanup from killing the parent hook process, which is the core issue described in the PR. |
| Description check | ✅ Passed | The description comprehensively explains the root cause, all six changes across five files, and their rationales, directly addressing the linked issue #1426 and related problems. |
| Linked Issues check | ✅ Passed | The PR successfully implements all six coding requirements from issue #1426: (1) adds process.ppid exclusion in ProcessManager.ts [#1426], (2) removes stale Setup hook from hooks.json [#1426], (3) removes redundant SessionStart start hook [#1426], (4) reduces collectStdin timeout [#1426], (5) adds waitForReadiness in worker-service.ts [#1426], and (6) downgrades mcp-server log level [#1426]. |
| Out of Scope Changes check | ✅ Passed | All changes directly address the cold-start issues documented in issue #1426; no unrelated modifications are present. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |


@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
plugin/scripts/bun-runner.js (1)

142-147: Verify 500ms timeout is adequate for hook payloads in production, or add diagnostic logging.

This timeout was intentionally reduced from 5000ms to address PostToolUse hook latency issues (#1220). However, if Claude's hook system sends payloads larger than what arrives within 500ms, this could silently truncate partial data. The buffering strategy means the timeout acts as a hard cutoff regardless of whether data is still arriving.

Consider adding optional logging when the timeout fires with partial data collected, so truncation issues become observable:

Optional: Add diagnostic logging for timeout with partial data
     // Safety: if no data arrives within 500ms, proceed without stdin
     setTimeout(() => {
+      const hadPartialData = chunks.length > 0;
       process.stdin.removeAllListeners();
       process.stdin.pause();
+      if (hadPartialData) {
+        console.error(`[bun-runner] stdin timeout with ${Buffer.concat(chunks).length} bytes collected`);
+      }
       resolve(chunks.length > 0 ? Buffer.concat(chunks) : null);
     }, 500);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugin/scripts/bun-runner.js` around lines 142 - 147, The 500ms hard cutoff
in the setTimeout block (the closure that calls
process.stdin.removeAllListeners(), process.stdin.pause(), and
resolve(chunks.length > 0 ? Buffer.concat(chunks) : null)) can silently truncate
incoming hook payloads; make the timeout configurable (e.g., read a
BUN_RUNNER_STDIN_TIMEOUT_MS env var or a constant with a sensible default of
500) and, when the timer fires and chunks.length > 0, emit a diagnostic warning
including the number of bytes collected and the configured timeout to help debug
truncation; keep the existing behavior if the env var isn’t set.
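
One way the configurable timeout and the diagnostic message suggested above might look is sketched here as two pure helpers. The function names `resolveStdinTimeoutMs` and `describeTruncation` are hypothetical, and `BUN_RUNNER_STDIN_TIMEOUT_MS` is only the variable name proposed in this review comment, not an existing setting.

```typescript
const DEFAULT_STDIN_TIMEOUT_MS = 500;

// Reads the stdin timeout from an environment-like map, falling back to the
// default when the variable is unset, blank, non-numeric, or not positive.
function resolveStdinTimeoutMs(
  env: Record<string, string | undefined>,
  fallback: number = DEFAULT_STDIN_TIMEOUT_MS,
): number {
  const raw = env["BUN_RUNNER_STDIN_TIMEOUT_MS"];
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = Math.floor(Number(raw));
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallback;
}

// Builds the warning to emit when the timer fires after some data arrived.
// Returns null when nothing was collected, so the caller stays silent.
function describeTruncation(bytesCollected: number, timeoutMs: number): string | null {
  if (bytesCollected <= 0) return null;
  return `[bun-runner] stdin timeout after ${timeoutMs}ms with ${bytesCollected} bytes collected`;
}

// Usage sketch with literal maps standing in for process.env.
console.log(resolveStdinTimeoutMs({})); // 500
console.log(resolveStdinTimeoutMs({ BUN_RUNNER_STDIN_TIMEOUT_MS: "2000" })); // 2000
console.log(describeTruncation(12, 500));
```

In bun-runner.js the first helper would presumably be called with `process.env`, and the second inside the timeout callback with the byte length of the buffered chunks, keeping the 500ms behavior when the variable is unset.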

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 17532823-a3f8-4612-996d-4ab944cf9420

📥 Commits

Reviewing files that changed from the base of the PR and between 9f529a3 and f5e0465.

📒 Files selected for processing (5)
  • plugin/hooks/hooks.json
  • plugin/scripts/bun-runner.js
  • src/servers/mcp-server.ts
  • src/services/infrastructure/ProcessManager.ts
  • src/services/worker-service.ts
💤 Files with no reviewable changes (1)
  • plugin/hooks/hooks.json



Development

Successfully merging this pull request may close these issues.

aggressiveStartupCleanup SIGKILLs hook process on cold start, causing SessionStart hook error and empty context

1 participant