fix: reduce detach latency and stabilize detach path #324

Merged
ethanpailes merged 8 commits into shell-pool:master from tobwen:fix/detach-latency
Mar 10, 2026

@tobwen tobwen commented Mar 2, 2026

Summary

Fixes #323: reduces detach latency by making the detach path more responsive, using faster polling and adaptive backoff in the client protocol loop.

Changes

  • Reduced JOIN_POLL_DURATION from 100ms to 10ms in libshpool/src/consts.rs for faster thread join polling
  • Reduced SHELL_TO_CLIENT_POLL_MS from 100 to 10 in libshpool/src/daemon/shell.rs for faster detach/reattach detection
  • Added adaptive backoff with exponential step (1ms initial, capped at 25ms) in client protocol loop
  • Added fast path for TTY detach (10ms) vs. slow path fallback (300ms max)
  • Exit client->shell thread immediately after detach action in libshpool/src/daemon/shell.rs
  • Refactored polling loop in libshpool/src/protocol.rs to check stop flag during sleep intervals
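
As an illustration of the adaptive backoff bullet, here is a minimal sketch of a capped exponential schedule. The description only gives the 1ms initial step and the 25ms cap, so the doubling factor and the names `next_backoff` and `backoff_schedule` are assumptions made for this example, not code from the PR.

```rust
use std::time::Duration;

/// Hypothetical helper: the next poll interval after `current`.
/// Doubling is an assumption; the PR only states "exponential step".
fn next_backoff(current: Duration, max: Duration) -> Duration {
    // saturating_mul avoids a panic on overflow; min applies the cap.
    current.saturating_mul(2).min(max)
}

/// Collects the first `n` intervals of the schedule, starting at `initial`.
fn backoff_schedule(initial: Duration, max: Duration, n: usize) -> Vec<Duration> {
    let mut out = Vec::with_capacity(n);
    let mut step = initial.min(max);
    for _ in 0..n {
        out.push(step);
        step = next_backoff(step, max);
    }
    out
}
```

With a 1ms initial step and a 25ms cap, the first seven intervals under this assumption are 1, 2, 4, 8, 16, 25 and 25 ms, so the loop reacts quickly at first and then settles at the bounded step.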

Behavior

Before this fix

  • Detach was slow (100 to 300ms polling intervals)
  • Threads could stay blocked after detach waiting on poll cycles
  • Users experienced noticeable delay when detaching

After this fix

  • Responsive detach with 10ms fast-path for TTY sessions
  • Immediate exit after detach (no more waiting for poll cycles)
  • Adaptive backoff ensures compatibility with existing shutdown paths
  • CPU usage controlled through bounded backoff (max 25ms step)

Rationale

The previous polling intervals (100ms) were too conservative, causing noticeable latency in the detach operation. By reducing polling to 10ms and implementing adaptive backoff, detach becomes responsive while maintaining compatibility with existing shutdown flows and avoiding busy waits.

Tests

No tests have been added or modified.

tobwen commented Mar 3, 2026

Oh come on - formatting of a comment? Really? 😄 I'll look up the correct formatting rules later.

@tobwen tobwen force-pushed the fix/detach-latency branch 2 times, most recently from 1013438 to b689d04 (March 3, 2026 19:54)
@tobwen tobwen force-pushed the fix/detach-latency branch from b689d04 to af4a3cd (March 3, 2026 21:14)

@ethanpailes ethanpailes left a comment

A good start. I left some feedback on things we can do to reuse code a bit.

}

thread::sleep(consts::HEARTBEAT_DURATION);
let mut slept = time::Duration::ZERO;

Let's bundle this logic up into a helper function that takes a stop predicate function (in this case it would be parameterized with a closure that checks the stop atomic), a poll strategy enum, and a sleep duration. It seems like a generically useful pattern. We could put it in libshpool/src/common.rs.

The poll strategy enum can have one option for uniform polling with a given duration (which we would use here) and another for exponential backoff with the usual exponential backoff params (initial poll dur, backoff factor, max poll dur).

We could call it sleep_unless or something like that.
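
A rough sketch of what such a helper could look like follows. The name `sleep_unless` and the two strategy variants come from this review; the exact signature, the `bool` return value, and the argument order are assumptions for illustration and may differ from what was merged.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Assumed shape of the poll strategy enum described in the review.
#[derive(Debug, Clone, Copy)]
pub enum PollStrategy {
    /// Poll at a fixed interval.
    Uniform { interval: Duration },
    /// Poll with exponential backoff up to a maximum interval.
    Backoff { initial_interval: Duration, factor: f64, max_interval: Duration },
}

/// Sleeps for up to `total`, checking `stop` between naps.
/// Returns true if `stop` fired before the time ran out (assumed convention).
pub fn sleep_unless<F>(total: Duration, strategy: PollStrategy, mut stop: F) -> bool
where
    F: FnMut() -> bool,
{
    let deadline = Instant::now() + total;
    let mut interval = match strategy {
        PollStrategy::Uniform { interval } => interval,
        PollStrategy::Backoff { initial_interval, .. } => initial_interval,
    };
    loop {
        if stop() {
            return true;
        }
        let remaining = deadline.saturating_duration_since(Instant::now());
        if remaining.is_zero() {
            return false;
        }
        // Never nap past the deadline.
        thread::sleep(interval.min(remaining));
        if let PollStrategy::Backoff { factor, max_interval, .. } = strategy {
            // Grow the interval, but keep it under the cap.
            interval = interval.mul_f64(factor).min(max_interval);
        }
    }
}
```

A caller in the heartbeat loop might pass a closure such as `|| stop.load(Ordering::Relaxed)` together with `PollStrategy::Uniform`, assuming `stop` is the existing atomic flag.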

if sock_to_stdout_h.is_finished() {
info!("recheck: sock->stdout thread done");
nfinished_threads += 1;
// Fast-path: when server->client already ended (detach/disconnect),

Let's keep the thread names consistent and call this sock->stdout.

let mut stdin_done = stdin_to_sock_h.is_finished();
let mut stdout_done = sock_to_stdout_h.is_finished();

let stdin_is_tty = isatty(io::stdin()).unwrap_or(false);

I generally like to avoid isatty when possible since it makes it harder to predict how a tool will behave when run from a script. Sometimes it is worthwhile, but in this case I don't think it is worth having divergent behavior. I don't see a reason we would need to wait around longer in a script context, so let's just always use the short timeout if the daemon hangs up on us. We should rename the constant to avoid mentioning TTY when we do this.

thread::sleep(consts::HEARTBEAT_DURATION);
let mut slept = time::Duration::ZERO;
let sleep_step = consts::JOIN_POLL_DURATION;
while slept < consts::HEARTBEAT_DURATION {

Rather than incrementing a counter of total duration slept, these sorts of loops should instead set a specific deadline in the future and always compare the current time against that. This avoids clock drift, because thread::sleep(d) does not always take exactly d, and these little mismatches can add up over time. Not really a huge deal, but it doesn't hurt to be as precise as possible.
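
To make the deadline idea concrete, here is a small sketch of the pattern in isolation. The function name `sleep_with_deadline`, its parameters, and the `Relaxed` ordering are assumptions for the example; the real loop uses the constants from consts.rs.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical helper: waits for `total`, waking every `step` to check `stop`.
/// Returns true if the stop flag was seen before the deadline.
fn sleep_with_deadline(total: Duration, step: Duration, stop: &AtomicBool) -> bool {
    // Fix the wake-up time once, so oversleeping in one step cannot accumulate.
    let deadline = Instant::now() + total;
    loop {
        // The memory ordering here is an assumption for the sketch.
        if stop.load(Ordering::Relaxed) {
            return true;
        }
        let now = Instant::now();
        if now >= deadline {
            return false;
        }
        // Clamp the last step so we do not sleep past the deadline.
        thread::sleep(step.min(deadline - now));
    }
}
```

Because every iteration re-reads the clock, a nap that runs long simply shortens the remaining ones instead of pushing the end of the wait further out.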

}

let remaining = consts::HEARTBEAT_DURATION - slept;
let step = if remaining < sleep_step { remaining } else { sleep_step };

Let's use min the way you do below. Also, we probably don't need an explicit named variable for remaining; we can just inline it into the min params for brevity.

MAX_DETACH_WAIT_DUR
};

loop {

This loop can switch to using the helper I suggested above, with its stop predicate computing nfinished_threads and checking if the count is >= 2.

tobwen commented Mar 3, 2026

Phew, that was way more work than I expected. Hopefully I haven't overlooked anything or made it more complicated than planned. And - of course - I missed one 😅

@ethanpailes ethanpailes left a comment

Nearly there, just one last nit about the new sleep API

/// Poll at a fixed interval.
Uniform { interval: time::Duration },
/// Poll with exponential backoff up to a maximum interval.
Backoff { initial_interval: time::Duration, factor: u32, max_interval: time::Duration },

Let's use a float for the factor. The most common backoff factors are 2 and 1.5, and if we use an int we can't handle 1.5.
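
For illustration, a float factor composes with `Duration` through the standard `Duration::mul_f64` method. The helper name `next_interval` below is hypothetical and only sketches the idea.

```rust
use std::time::Duration;

/// Hypothetical helper: scales `current` by a fractional backoff factor
/// and caps the result at `max_interval`.
fn next_interval(current: Duration, factor: f64, max_interval: Duration) -> Duration {
    // mul_f64 panics if the result is negative, not finite, or overflows,
    // so `factor` is assumed to be a sane positive value such as 1.5 or 2.0.
    current.mul_f64(factor).min(max_interval)
}
```

With a `u32` factor only whole multiples are expressible, whereas here a factor of 1.5 turns a 10ms interval into roughly 15ms (subject to floating-point rounding at the nanosecond level).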


@ethanpailes ethanpailes left a comment

I switched factor to a float.

@ethanpailes ethanpailes merged commit e9f8a57 into shell-pool:master Mar 10, 2026
7 checks passed
@release-plz-for-shpool release-plz-for-shpool bot mentioned this pull request Mar 3, 2026
tobwen commented Mar 10, 2026

Sorry, I didn’t get around to it and didn’t want to submit something half-baked. It’s great that you made the changes! Thank you very much.

Development

Successfully merging this pull request may close these issues.

[HAS FIX] Improve detach latency and responsivenes
