fix: use spawn+detach in tryStartDaemon to prevent daemon being killed on timeout#153
Conversation
|
PR author is not in the allowed authors list. |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughReplace blocking spawnSync startup with detached, asynchronous child processes; add guarded local-dev startup that checks for Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant FS as FileSystem (tldr-code)
participant UV as uv CLI
participant TLDR as tldr Daemon
participant Lock as Lock/Wait logic
Client->>FS: check for `tldr-code` path
alt local-dev path exists and TLDR_DEV not forcing global
Client->>UV: exec `uv --version` (check)
alt uv available
Client->>UV: spawn `uv run tldr daemon start` (detached, unref)
UV-->>Client: returns PID or spawn error
alt spawn OK
Client->>Lock: use lock & 10s wait loop to verify readiness
else spawn error
Client->>TLDR: spawn global `tldr daemon start` (detached, unref)
TLDR-->>Client: returns PID
Client->>Lock: wait/verify readiness
end
else uv unavailable
Client->>TLDR: spawn global `tldr daemon start` (detached, unref)
TLDR-->>Client: returns PID
Client->>Lock: wait/verify readiness
end
else no local-dev path or TLDR_DEV forces global
Client->>TLDR: spawn global `tldr daemon start` (detached, unref)
TLDR-->>Client: returns PID
Client->>Lock: wait/verify readiness
end
note right of Client: If local start attempted but daemon unreachable, try global detached start as last-resort
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
.claude/hooks/src/daemon-client.ts
Outdated
| }); | ||
| started = result.status === 0; | ||
| child.unref(); | ||
| started = true; |
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
d77258c to
d26735d
Compare
marcuspuchalla
left a comment
There was a problem hiding this comment.
Valid bug — good catch. You're right that spawn() doesn't throw when the binary isn't found; it emits 'error' asynchronously. Since we call child.unref() immediately and this is a synchronous function, we can't attach an error listener that would influence started. The result: started = true even when uv isn't installed, so the tldr fallback never ran.
Fixed in the latest commit by checking uv availability synchronously before spawning:
try {
execSync('uv --version', { stdio: 'ignore', timeout: 2000 }); // exits immediately
const child = spawn('uv', ['run', 'tldr', 'daemon', 'start', ...], { detached: true, stdio: 'ignore' });
child.unref();
started = true; // only set if uv is actually available
} catch {
// uv not found — falls through to the global tldr fallback below
}uv --version exits in milliseconds so the 2s timeout is purely a safeguard.
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
.claude/hooks/src/daemon-client.ts (1)
14-14: Replace dynamicrequirewith a static import addition.
child_processis already imported at line 14. Obtainingspawnvia a second, dynamicrequireinside the function body is redundant — both resolve to the same cached module.♻️ Proposed refactor
-import { execSync } from 'child_process'; +import { execSync, spawn } from 'child_process';Then remove the inline dynamic require:
- const { spawn } = require('child_process') as typeof import('child_process'); -Also applies to: 391-391
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.claude/hooks/src/daemon-client.ts at line 14, The code currently does a dynamic require('child_process').spawn inside functions even though child_process is statically imported at the top as import { execSync } from 'child_process'; — update the top-level import to pull spawn as well (e.g., import { execSync, spawn } from 'child_process') and remove the inline require(...) calls inside functions (references where spawn is obtained dynamically, e.g., the call site that previously did require('child_process').spawn around the spawn usage and the similar spot at line ~391); keep using the top-level spawn identifier wherever it's used.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.claude/hooks/src/daemon-client.ts:
- Around line 395-412: The code sets started = true immediately after spawning
the local 'uv run tldr' child (when tldrPath exists), which no longer reflects
whether the daemon actually launched and thus suppresses the global fallback;
change the logic in the block that spawns with spawn('uv', ...) so you only mark
started = true after verifying the daemon is running (e.g., call
isDaemonProcessRunning(projectDir) after a short delay or check that 'uv' is in
PATH before preferring the local spawn), and keep the TLDR_DEV guard behavior;
update the code around the spawn of 'uv' and the started flag so fallback to
spawn('tldr', ...) occurs if the verification fails.
- Around line 394-412: The spawned child processes started with spawn('uv',
['run', 'tldr', 'daemon', 'start', '--project', projectDir], { ... }) and
spawn('tldr', ['daemon', 'start', '--project', projectDir], { ... }) need
.on('error', ...) listeners to avoid an unhandled ENOENT crash; add a no-op or
logged error handler on each child before calling child.unref() (for the local
branch that checks existsSync(tldrPath) and for the fallback branch gated by
!process.env.TLDR_DEV) so missing executables don't terminate the hook.
---
Nitpick comments:
In @.claude/hooks/src/daemon-client.ts:
- Line 14: The code currently does a dynamic require('child_process').spawn
inside functions even though child_process is statically imported at the top as
import { execSync } from 'child_process'; — update the top-level import to pull
spawn as well (e.g., import { execSync, spawn } from 'child_process') and remove
the inline require(...) calls inside functions (references where spawn is
obtained dynamically, e.g., the call site that previously did
require('child_process').spawn around the spawn usage and the similar spot at
line ~391); keep using the top-level spawn identifier wherever it's used.
d26735d to
282159a
Compare
marcuspuchalla
left a comment
There was a problem hiding this comment.
All three points addressed in the latest commit:
-
Static import:
spawnis now imported at the top level alongsideexecSync— the inlinerequire('child_process')inside the function has been removed. -
startedflag: already fixed in the previous commit — we useexecSync('uv --version')to check uv availability synchronously before spawning, sostarted = trueis only set when uv is confirmed present. -
Unhandled ENOENT: added
.on('error', () => {})handlers on both the uv and tldr child processes beforechild.unref(). The error comment notes what each handler suppresses — callers degrade gracefully via the existing socket-reachability checks.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.claude/hooks/src/daemon-client.ts:
- Around line 396-408: The block that spawns the local uv/tldr process sets
started = true even when the process fails immediately (so global fallback is
suppressed); fix this by checking child.pid synchronously after spawn and only
set started = true when child.pid is defined (otherwise leave started false and
don't call child.unref()), and also attach an async handler on child.on('error',
(err) => { started = false; }) to flip the flag if spawn-level errors occur
later; refer to execSync, spawn, child, started, tldrPath and projectDir to
locate and update the code.
282159a to
22fc56f
Compare
marcuspuchalla
left a comment
There was a problem hiding this comment.
Good catch — this is a real bug and the reviewer's description is accurate. Fixed in the latest commit with two layers of defense:
1. child.pid guard + async error handler (covers spawn-level ENOENT):
if (child.pid \!== undefined) {
child.on('error', () => { started = false; }); // flip back on late ENOENT
child.unref();
started = true;
}
// If child.pid is undefined, OS rejected the spawn — falls through to fallback2. Post-wait-loop last-resort fallback (covers uv run tldr exiting non-zero):
When uv is in PATH but tldr is not installed in the uv environment, uv run tldr starts (child.pid is defined) but exits immediately with non-zero. No error event fires, no PID file is written. After the 10s wait loop times out with the daemon unreachable, we now try the global tldr as a last resort:
if (started && \!isDaemonReachable(projectDir) && \!process.env.TLDR_DEV) {
const child = spawn('tldr', ['daemon', 'start', '--project', projectDir], { detached: true, stdio: 'ignore' });
child.on('error', () => {});
child.unref();
}This breaks the repeated-10s-spin loop: if uv fails silently, the global tldr gets a chance to start the daemon, and the next invocation will find it running.
|
Re the remaining inline comments from Sentry and CodeRabbit — all addressed in the latest commit: Sentry / CodeRabbit (lines 412, 423):
|
There was a problem hiding this comment.
🧹 Nitpick comments (2)
.claude/hooks/src/daemon-client.ts (2)
444-453: Last-resort spawn returns immediately with no wait — always fails the current request.After spawning the global
tldrat line 445,isDaemonReachable(projectDir)is called at line 453 with zero delay. The freshly-spawned daemon will virtually never be ready in time, so this path always returnsfalse, making callers report{ status: 'unavailable' }for the current invocation. The daemon will be available for the next request, but the current one is wasted.Contrast with the first global fallback (line 417), which is immediately followed by the full 10-second wait loop (lines 426–437). The last-resort gets 0 seconds.
♻️ Proposed fix — add a short wait loop after the last-resort spawn
if (started && !isDaemonReachable(projectDir) && !process.env.TLDR_DEV) { const child = spawn('tldr', ['daemon', 'start', '--project', projectDir], { detached: true, stdio: 'ignore', }); child.on('error', () => {}); child.unref(); + + // Give the global daemon a brief window to become reachable + const lrStart = Date.now(); + while (Date.now() - lrStart < 3000) { + if (isDaemonReachable(projectDir)) break; + const wait = Date.now() + 100; + while (Date.now() < wait) { /* spin */ } + } } return isDaemonReachable(projectDir);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.claude/hooks/src/daemon-client.ts around lines 444 - 453, The spawn fallback currently starts a detached tldr daemon (spawn(...)) then immediately returns isDaemonReachable(projectDir) which will almost always be false; after creating the child process in the last-resort block (the code using spawn('tldr', ['daemon', 'start', '--project', projectDir]) with child.unref()), add a short wait/retry loop (similar to the earlier 10s loop used for the first global fallback) that polls isDaemonReachable(projectDir) with small delays (e.g., 100–500ms) for a few seconds before returning; keep the child.on('error', () => {}) and child.unref() behavior and respect the existing process.env.TLDR_DEV guard.
403-408:started = falsein the async error handler is unreachable dead code — comment misleads future maintainers.The spin-loop wait below (lines 427–437) never yields the event loop, so the
'error'callback at line 405 cannot fire beforetryStartDaemonreturns. Once the function returns,startedis out of scope and setting it tofalsehas no observable effect on any code path. Thechild.pidguard (line 403) already handles the only synchronous spawn-failure case this handler was meant to cover.♻️ Proposed cleanup — replace with a no-op handler and clarify the comment
if (child.pid !== undefined) { - // Flip started back if uv fails at the spawn level after fork (e.g. ENOENT race) - child.on('error', () => { started = false; }); + // Suppress unhandled 'error' event if uv disappears between version-check and spawn. + // Note: spin-loop below prevents this from firing before the function returns, + // so this is purely an unhandled-event guard, not a control-flow signal. + child.on('error', () => { /* uv disappeared between check and spawn */ }); child.unref(); started = true; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.claude/hooks/src/daemon-client.ts around lines 403 - 408, The 'error' handler on the ChildProcess sets started = false but that assignment is unreachable because the spin-loop in tryStartDaemon prevents the event loop from running; replace the handler body with a no-op (e.g., child.on('error', () => {})) or a simple log so we don't reference the out-of-scope started variable, and update the comment above the handler to explain that the spin-loop blocks async event callbacks and that child.pid guard already handles synchronous spawn failures; touch the child.on('error', ...), started variable, and tryStartDaemon/spin-loop locations to implement this cleanup.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In @.claude/hooks/src/daemon-client.ts:
- Around line 444-453: The spawn fallback currently starts a detached tldr
daemon (spawn(...)) then immediately returns isDaemonReachable(projectDir) which
will almost always be false; after creating the child process in the last-resort
block (the code using spawn('tldr', ['daemon', 'start', '--project',
projectDir]) with child.unref()), add a short wait/retry loop (similar to the
earlier 10s loop used for the first global fallback) that polls
isDaemonReachable(projectDir) with small delays (e.g., 100–500ms) for a few
seconds before returning; keep the child.on('error', () => {}) and child.unref()
behavior and respect the existing process.env.TLDR_DEV guard.
- Around line 403-408: The 'error' handler on the ChildProcess sets started =
false but that assignment is unreachable because the spin-loop in tryStartDaemon
prevents the event loop from running; replace the handler body with a no-op
(e.g., child.on('error', () => {})) or a simple log so we don't reference the
out-of-scope started variable, and update the comment above the handler to
explain that the spin-loop blocks async event callbacks and that child.pid guard
already handles synchronous spawn failures; touch the child.on('error', ...),
started variable, and tryStartDaemon/spin-loop locations to implement this
cleanup.
22fc56f to
b96014b
Compare
|
Both points in this review are correct and fixed in the latest commit: 1. Last-resort spawn had zero wait (lines 444–453) 2. |
.claude/hooks/src/daemon-client.ts
Outdated
| if (started && !isDaemonReachable(projectDir) && !process.env.TLDR_DEV) { | ||
| const child = spawn('tldr', ['daemon', 'start', '--project', projectDir], { | ||
| detached: true, | ||
| stdio: 'ignore', | ||
| }); | ||
| child.on('error', () => {}); | ||
| child.unref(); | ||
|
|
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
b96014b to
cb95197
Compare
|
Re Sentry comment on lines 446-453: valid race condition, fixed. The last-resort condition now guards with |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
.claude/hooks/src/daemon-client.ts (1)
441-462: Last-resort logic is sound; considerwindowsHide: truefor Windows targets.The condition correctly relies on the fact that
isDaemonReachableinternally callsisDaemonProcessRunning, so a slowly-starting daemon that has already written its PID file returnstruefromisDaemonReachableand skips the last-resort. The 3 s post-start wait keeps the lock held for at most 13 s, well within the 30 s stale-lock threshold.One optional improvement: all three
spawncalls in this function (uvat line 398, globaltldrat line 419, and the last-resort at line 447) are missingwindowsHide: true. SincegetConnectionInfoincludes a Windows TCP path this codebase does target Windows, and withoutwindowsHide: true,detached: truewill briefly flash a console window when the hook fires.♻️ Proposed change (apply to all three spawn option objects)
const child = spawn('tldr', ['daemon', 'start', '--project', projectDir], { detached: true, stdio: 'ignore', + windowsHide: true, });🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.claude/hooks/src/daemon-client.ts around lines 441 - 462, The three child process spawns that start the daemon (the spawn(...) that runs 'uv' and the two spawn(...) calls that run 'tldr' — the global start and this last-resort start) currently set detached: true and stdio: 'ignore' but omit windowsHide: true; add windowsHide: true to each of those spawn option objects so detached background launches won't flash a Windows console window (preserve detached and stdio settings).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.claude/hooks/src/daemon-client.ts:
- Around line 403-410: The branch in tryStartDaemon registers child.on('error',
...) only when child.pid !== undefined, which leaves an unhandled 'error' event
if spawn fails and pid is undefined; change the logic to always attach the
'error' listener to the spawned ChildProcess (the variable child) before
checking child.pid so spawn failures are caught, then keep the existing
conditional child.unref()/started assignment when pid exists; ensure the same
error handler used by the global fallback is applied to child in the
tryStartDaemon code path.
---
Nitpick comments:
In @.claude/hooks/src/daemon-client.ts:
- Around line 441-462: The three child process spawns that start the daemon (the
spawn(...) that runs 'uv' and the two spawn(...) calls that run 'tldr' — the
global start and this last-resort start) currently set detached: true and stdio:
'ignore' but omit windowsHide: true; add windowsHide: true to each of those
spawn option objects so detached background launches won't flash a Windows
console window (preserve detached and stdio settings).
| if (child.pid !== undefined) { | ||
| // Suppress unhandled 'error' event if uv disappears between version-check and spawn. | ||
| // Note: the spin-loop below blocks the event loop, so this fires only after | ||
| // tryStartDaemon returns — child.pid guard already covers sync spawn failures. | ||
| child.on('error', () => { /* uv disappeared between check and spawn */ }); | ||
| child.unref(); | ||
| started = true; | ||
| } |
There was a problem hiding this comment.
Unhandled error event when child.pid is undefined — hook process will crash.
subprocess.pid is undefined when the child fails to spawn, and the error event is emitted asynchronously in that case. Because the child.on('error', ...) handler is only registered inside the if (child.pid !== undefined) branch, a spawn failure (e.g., the TOCTOU window where uv disappears between execSync('uv --version') and spawn) will emit an error event with no listener. A synchronous try/catch cannot intercept this: the error propagates as an unhandled EventEmitter event and terminates the process. The blocking spin-loop below means the event fires after tryStartDaemon returns, but before Node.js exits — so the crash still lands in the hook's process lifetime.
The global fallback at line 423 and the last-resort at line 451 both attach the error handler unconditionally — apply the same pattern here:
🐛 Proposed fix
const child = spawn('uv', ['run', 'tldr', 'daemon', 'start', '--project', projectDir], {
detached: true,
stdio: 'ignore',
cwd: tldrPath,
});
+ // Must be registered unconditionally; pid guard below doesn't cover the
+ // TOCTOU case where uv disappears between the version check and spawn.
+ child.on('error', () => { /* uv disappeared between check and spawn */ });
if (child.pid !== undefined) {
- // Suppress unhandled 'error' event if uv disappears between version-check and spawn.
- // Note: the spin-loop below blocks the event loop, so this fires only after
- // tryStartDaemon returns — child.pid guard already covers sync spawn failures.
- child.on('error', () => { /* uv disappeared between check and spawn */ });
child.unref();
started = true;
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.claude/hooks/src/daemon-client.ts around lines 403 - 410, The branch in
tryStartDaemon registers child.on('error', ...) only when child.pid !==
undefined, which leaves an unhandled 'error' event if spawn fails and pid is
undefined; change the logic to always attach the 'error' listener to the spawned
ChildProcess (the variable child) before checking child.pid so spawn failures
are caught, then keep the existing conditional child.unref()/started assignment
when pid exists; ensure the same error handler used by the global fallback is
applied to child in the tryStartDaemon code path.
…ng killed
spawnSync kills the child process when the timeout expires. The TLDR daemon
takes 30-60s to index a large project, so on timeout spawnSync sends SIGTERM,
killing the daemon. The next Claude session start finds no running daemon and
spawns another — repeat across sessions = dozens of orphaned 3GB processes.
Replace spawnSync with spawn({ detached: true }) + child.unref() so the daemon
runs as a true independent background process and survives after the hook exits.
The existing PID guard (isDaemonProcessRunning) already prevents duplicate
spawns when the PID file exists and the process is alive. This fix ensures the
daemon isn't killed immediately after being spawned, so the PID file stays valid.
cb95197 to
15971f4
Compare
|
Re CodeRabbit review 3838094812 — both issues fixed in the latest commit: Inline (line 410): unhandled Nitpick: |
Problem
tryStartDaemonusesspawnSyncto start the TLDR daemon:spawnSyncwith a timeout kills the child process when it expires. The TLDR daemon takes 30–60s to index a large project — much longer than the 5–10s timeout. So the daemon is killed before it finishes starting, and the next Claude session finds no running daemon and spawns another one. With Continuous Claude running multiple auto-restarting sessions, this compounds into dozens of orphaned 3GB processes → system freeze.The existing PID guard (
isDaemonProcessRunning) correctly prevents duplicate spawns when the daemon is alive — but can't help ifspawnSyncalready killed it.Fix
Replace
spawnSyncwithspawn({ detached: true })+child.unref()so the daemon runs as a true independent background process and is not killed when the hook process exits or times out:Impact
Summary by CodeRabbit