Skip to content

perf: process kill wait loop and PID polling jitter#627

Merged
Wirasm merged 4 commits intomainfrom
kild/perf-optimizations
Mar 17, 2026
Merged

perf: process kill wait loop and PID polling jitter#627
Wirasm merged 4 commits intomainfrom
kild/perf-optimizations

Conversation

@Wirasm
Copy link
Copy Markdown
Owner

@Wirasm Wirasm commented Mar 4, 2026

Summary

  • Add best-effort wait after kill() — polls for up to 500ms to verify process termination, logs timeout if process survives SIGKILL
  • Add +/-20% PID-based jitter to PID file polling interval to decorrelate simultaneous kild create launches

Items 1 (Cow newtypes), 2 (git caching — each call opens a different worktree path), 3 (current_thread runtime — reverted, daemon runs fine on multi-threaded), and 6 (HashMap keys are tiny pane IDs) from #478 were evaluated and skipped as YAGNI for this codebase.

Closes #478

Test plan

  • cargo fmt --check passes
  • cargo clippy --all -- -D warnings passes
  • cargo test --all passes (2 pre-existing failures in resolve_self_branch unrelated to this PR)
  • Manual test: kild create + kild stop to verify process cleanup
  • Manual test: concurrent kild create commands to verify jitter prevents lock contention

Wirasm added 2 commits March 4, 2026 17:24
- Add process.wait() after kill() to verify termination and prevent
  zombie processes
- Add +/-20% random jitter to PID file polling interval to prevent
  thundering herd when multiple kild create commands run simultaneously
- Switch daemon from multi-threaded tokio runtime to current_thread
  since it's I/O-bound with low concurrency (PTY reads + IPC)

Closes #478
- Replace process.wait() with bounded 500ms poll loop to avoid blocking
  indefinitely on uninterruptible sleep
- Use PID-based jitter instead of SystemTime nanos which are correlated
  across simultaneous launches
- Revert current_thread runtime: daemon multiplexes PTYs + IPC across
  multiple sessions, single-threaded would stall all sessions on any
  slow operation
@Wirasm
Copy link
Copy Markdown
Owner Author

Wirasm commented Mar 17, 2026

PR Review Summary

Critical Issues (1 found)

Agent Issue Location
code-reviewer PR title/description claims "daemon runtime" change that isn't in the diff. The current_thread tokio change was added then reverted within the branch, but the PR title and body still advertise it. Anyone reading git history will believe the daemon runtime was changed when it wasn't. PR metadata

Fix: Remove the third summary bullet ("Switch daemon from multi-threaded tokio runtime…") and retitle to e.g. perf: process kill wait loop and PID polling jitter.


Important Issues (2 found)

Agent Issue Location
code-reviewer is_process_running allocates a new System::new() on every poll iteration — up to 50 times in the kill wait loop (500ms / 10ms). The system variable is already live in kill_process scope; reuse it with system.refresh_processes() directly. operations.rs:119-126
silent-failure-hunter Kill wait loop has no logging for timeout or error paths. unwrap_or(false) silently swallows errors (treats them as "still running", spins for full 500ms). If the process doesn't die within 500ms, Ok(()) is returned with no diagnostic trace — callers log "kill completed" while the process may still be alive. operations.rs:117-130

Suggested fix for both (combined):

// Reuse the already-populated `system` to poll for termination
let mut process_exited = false;
let deadline = std::time::Instant::now() + std::time::Duration::from_millis(500);
while std::time::Instant::now() < deadline {
    system.refresh_processes(ProcessesToUpdate::Some(&[pid_obj]), true);
    match system.process(pid_obj) {
        None => { process_exited = true; break; }
        Some(_) => {}
    }
    std::thread::sleep(std::time::Duration::from_millis(10));
}

if !process_exited {
    debug!(
        event = "core.process.kill_wait_timeout",
        pid = pid,
        message = "process did not exit within 500ms after SIGKILL"
    );
}

This (a) avoids up to 50 redundant System::new() allocations, (b) eliminates the unwrap_or(false) silent swallow, and (c) logs the timeout path.


Suggestions (2 found)

Agent Suggestion Location
comment-analyzer Docstring uses "thundering herd" but inline comment uses "decorrelate" — align terminology. "Decorrelate" is more precise for PID-based jitter. pid_file.rs:43-45
silent-failure-hunter Jitter is computed inside the loop but never varies (PID is constant). Hoist poll_interval_ms above the loop to reduce cognitive overhead. pid_file.rs:87-91

Strengths

  • kill() failure is properly surfaced as ProcessError::KillFailed before the wait loop runs
  • Both callers (stop.rs, destroy.rs) handle all three result cases explicitly — no silent swallowing at callsites
  • read_pid_file_with_retry properly surfaces the last polling error via warn! at timeout
  • Inline jitter comment correctly identifies the mechanism as PID-based decorrelation

Verdict: NEEDS FIXES

  1. Fix PR title/description — remove the phantom daemon runtime claim
  2. Reuse system in wait loop and add timeout logging — both are simple, address real issues
  3. Consider terminology and loop hoisting suggestions

- Reuse existing `system` in kill wait loop instead of allocating
  a new System::new() on each of up to 50 poll iterations
- Add debug log when kill wait times out (process didn't exit
  within 500ms after SIGKILL)
- Hoist constant jitter calculation above the polling loop
- Align docstring to use "decorrelate" instead of "thundering herd"
@Wirasm Wirasm changed the title perf: process kill retry, PID jitter, daemon runtime perf: process kill wait loop and PID polling jitter Mar 17, 2026
- Replace exited flag with early return on process exit
- Use elapsed() pattern consistent with read_pid_file_with_retry
- Remove unnecessary signed-integer casts in jitter computation —
  stays in u64 since BASE_INTERVAL_MS (100) > JITTER_RANGE_MS (20)
@Wirasm Wirasm merged commit 4db9aaa into main Mar 17, 2026
6 checks passed
@Wirasm Wirasm deleted the kild/perf-optimizations branch March 17, 2026 11:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

perf: minor optimizations — string cloning, git caching, process kill retry, PID jitter

1 participant