
feat(s2n-quic-dc): only poll accepted streams that are ready #2409

Merged
merged 4 commits into from
Dec 11, 2024

Conversation

camshaft
Contributor

@camshaft camshaft commented Dec 7, 2024

Description of changes:

The TCP acceptor currently polls all pending streams on every task wakeup:

```rust
self.working.retain(|&idx| {
    let worker = &mut self.workers[idx];
    let Poll::Ready(res) = worker.poll(cx, worker_cx, now, publisher) else {
```

This isn't very efficient, since only a handful of streams may actually be ready to be polled again.

This change instead adds a `waker::Set`, which hands out a distinct waker to each stream and tracks which ones actually woke. On wakeup, only those tasks are polled and the still-pending ones are skipped.
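The idea behind the `waker::Set` can be sketched in miniature as follows. This is not the s2n-quic-dc implementation (which uses an atomic bitset rather than a locked `Vec`); the `ReadySet` and `IdWaker` names are illustrative:

```rust
use std::sync::{Arc, Mutex};
use std::task::{Wake, Waker};

/// Shared record of which stream IDs woke since the last poll pass.
#[derive(Default)]
struct ReadySet {
    // the real implementation uses an atomic bitset instead of a lock
    woken: Mutex<Vec<usize>>,
}

/// A waker tagged with a stream ID; waking records the ID in the set.
struct IdWaker {
    id: usize,
    set: Arc<ReadySet>,
}

impl Wake for IdWaker {
    fn wake(self: Arc<Self>) {
        self.set.woken.lock().unwrap().push(self.id);
    }
}

impl ReadySet {
    /// Hands out a waker bound to `id`
    fn waker(self: &Arc<Self>, id: usize) -> Waker {
        Waker::from(Arc::new(IdWaker { id, set: self.clone() }))
    }

    /// Drains the IDs that woke since the last call; the poll loop
    /// then polls only these streams, skipping still-pending ones
    fn drain(&self) -> Vec<usize> {
        std::mem::take(&mut *self.woken.lock().unwrap())
    }
}
```

For example, if only stream 3 wakes, `drain()` returns `[3]` and the acceptor polls a single worker instead of scanning the whole working set.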

Call-outs:

I refactored the TCP acceptor a bit to make it easier to test, and in particular to write fuzz tests for, since the management logic is getting quite complicated.

Testing:

I added quite a few tests. The most interesting one is probably the fuzz test for the accept manager, which ensures all struct invariants hold no matter in which order events occur:

```rust
fn invariants_test() {
    check!().with_type::<Vec<Op>>().for_each(|ops| {
        let mut harness = Harness::default();
        for op in ops {
            match op {
                Op::Insert => {
                    harness.insert();
                }
                Op::Wake { idx } => {
                    harness.wake(*idx);
                }
                Op::Ready { idx, error } => {
                    if *error {
                        harness.error(*idx, io::ErrorKind::ConnectionReset);
                    } else {
                        harness.ready(*idx);
                    }
                }
                Op::Advance { millis } => {
                    harness.advance(Duration::from_millis(*millis as u64));
                    harness.poll();
                }
            }
        }
    });
}
```
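The operation alphabet the harness draws from can be read off the match arms above. A sketch of what the `Op` type might look like (the variant names come from the test; the derives and doc comments are assumptions, and in the real test it would also derive the fuzzer's value generator, e.g. bolero's `TypeGenerator`):

```rust
/// One event the fuzzer can feed the accept-manager harness.
#[derive(Debug, Clone)]
enum Op {
    /// accept a new stream into the manager
    Insert,
    /// the waker for stream `idx` fired
    Wake { idx: usize },
    /// stream `idx` finished, either cleanly or with an error
    Ready { idx: usize, error: bool },
    /// advance the clock and run a poll pass
    Advance { millis: u16 },
}
```

Driving the harness with arbitrary `Vec<Op>` sequences lets the fuzzer interleave insertions, wakeups, completions, and time advances in every order, which is exactly where invariant bugs in the management logic tend to hide.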

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@camshaft camshaft force-pushed the camshaft/dc-waker-set branch 4 times, most recently from 61e1b28 to e43dce3 Compare December 10, 2024 00:58
@camshaft camshaft marked this pull request as ready for review December 10, 2024 16:53
```rust
    is_active
});
// we did a full scan, so reset the value
self.gc_count = 0;
```
Collaborator

This `retain` feels potentially expensive: it's going to be near random in the common case, I think, so lots of shifting/copying of the indices, plus it's still O(n) in the worker count.

If we don't do this at all, I think the downside is that when evicting due to sojourn time we would need to search for the right entry? I think just popping isn't enough, but maybe a heap which we use to incrementally sort ends up cheaper?

Contributor Author

Yeah it's tough to say. I think it's definitely better than what we're doing today, but let me try refactoring to use a heap instead.

Contributor Author

Fixed in e237253. I ended up going with a "linked list" structure (reusing the existing workers allocation) that made push/pop/remove all O(1).
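The linked-list trick can be sketched as follows: the `prev`/`next` links live inline in a slab indexed by worker ID, so pushing, popping, and removing an arbitrary entry are all O(1) with no extra allocation. This is a simplified illustration, not the actual s2n-quic-dc code; `NONE`, `Link`, and `List` are illustrative names:

```rust
/// Sentinel meaning "no link" (the real list requires idx < usize::MAX).
const NONE: usize = usize::MAX;

/// Links stored inline in the entries slab, one per worker slot.
#[derive(Clone, Copy)]
struct Link {
    prev: usize,
    next: usize,
}

struct List {
    head: usize,
    tail: usize,
}

impl List {
    fn new() -> Self {
        Self { head: NONE, tail: NONE }
    }

    /// O(1): append `idx` to the back
    fn push(&mut self, links: &mut [Link], idx: usize) {
        links[idx] = Link { prev: self.tail, next: NONE };
        if self.tail == NONE {
            self.head = idx;
        } else {
            links[self.tail].next = idx;
        }
        self.tail = idx;
    }

    /// O(1): unlink an arbitrary entry, e.g. for sojourn-time eviction
    fn remove(&mut self, links: &mut [Link], idx: usize) {
        let Link { prev, next } = links[idx];
        if prev == NONE {
            self.head = next;
        } else {
            links[prev].next = next;
        }
        if next == NONE {
            self.tail = prev;
        } else {
            links[next].prev = prev;
        }
    }

    /// O(1): pop the oldest entry from the front
    fn pop(&mut self, links: &mut [Link]) -> Option<usize> {
        if self.head == NONE {
            return None;
        }
        let idx = self.head;
        self.remove(links, idx);
        Some(idx)
    }
}
```

Because the list keeps workers in insertion order, the oldest entry is always at the head, so sojourn-based eviction can pop from the front without the search or incremental sorting a heap would need.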

```rust
/// Registers a waker with the given ID
pub fn waker(&mut self, id: usize) -> Waker {
    // reserve space in the locally ready set
    self.ready.resize_for_id(id);
```
Collaborator

Isn't our size fixed? We manage to fix the worker set so it feels surprising we can't do that here. I think that would eliminate the locking entirely...

Contributor Author

It is fixed after we create all of the Wakers. This just made it so you don't have to specify the capacity up front. In the critical path, this isn't used at all, since those wakers are cached in the slot.

```rust
let shift = self.shift;
let id = self.index * SLOT_SIZE + shift;
let mask = 1 << shift;
self.shift += 1;
```
Collaborator

This looks linear to me; I would use `trailing_zeros` to skip to the next 1.
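The suggestion can be sketched like this: instead of advancing `shift` by one per call and testing every bit position, `trailing_zeros` jumps straight to the next set bit, and clearing the lowest set bit (`slot &= slot - 1`) makes the loop run once per ready ID rather than once per bit. A minimal illustration, with an assumed `u64` slot and illustrative names:

```rust
/// Collects the IDs of all set bits in one slot of the ready bitset,
/// visiting only the set bits rather than scanning linearly.
fn ready_ids(slot_index: usize, mut slot: u64) -> Vec<usize> {
    const SLOT_SIZE: usize = u64::BITS as usize;
    let mut ids = Vec::new();
    while slot != 0 {
        // jump straight to the next set bit
        let shift = slot.trailing_zeros() as usize;
        ids.push(slot_index * SLOT_SIZE + shift);
        // clear the bit we just handled
        slot &= slot - 1;
    }
    ids
}
```

For a slot value of `0b1010` in slot 0 this visits only bits 1 and 3, doing two iterations instead of scanning all 64 positions.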

Contributor Author

Yeah I'll fix it

Contributor Author

Fixed in 6f9795c

```rust
/// * `entries` is only managed by [`List`]
/// * `idx` is less than `usize::MAX`
#[inline]
pub unsafe fn pop<L>(&mut self, entries: &mut [L]) -> Option<usize>
```
Collaborator

I feel like the bounds-check avoidance is very unlikely to contribute meaningful performance, and the loss of safety doesn't feel worth it. Can we start with making this safe (panicking if we hit issues)?

Contributor Author

yep i'll fix it

@camshaft
Contributor Author

Unrelated test failures are fixed in #2412

@camshaft camshaft merged commit fc39fc2 into main Dec 11, 2024
108 of 127 checks passed
@camshaft camshaft deleted the camshaft/dc-waker-set branch December 11, 2024 18:08