
feat(s2n-quic-dc): only poll accepted streams that are ready #2409

Merged
merged 4 commits into from
Dec 11, 2024

Conversation

camshaft
Contributor

@camshaft camshaft commented Dec 7, 2024

Description of changes:

The TCP acceptor currently polls all pending streams on every task wakeup:

```rust
self.working.retain(|&idx| {
    let worker = &mut self.workers[idx];
    let Poll::Ready(res) = worker.poll(cx, worker_cx, now, publisher) else {
```

This isn't very efficient, since only a handful of streams may actually be ready to be polled again.

This change instead adds a `waker::Set`, which hands out a distinct waker to each stream and tracks which ones actually woke. On wakeup, only those tasks are polled and the still-pending ones are skipped.
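The idea behind the `waker::Set` can be sketched in miniature as follows. This is not the s2n-quic-dc implementation (which uses an atomic bitset rather than a locked `Vec`); the `ReadySet` and `IdWaker` names are illustrative:

```rust
use std::sync::{Arc, Mutex};
use std::task::{Wake, Waker};

/// Shared record of which stream IDs woke since the last poll pass.
#[derive(Default)]
struct ReadySet {
    // the real implementation uses an atomic bitset instead of a lock
    woken: Mutex<Vec<usize>>,
}

/// A waker tagged with a stream ID; waking records the ID in the set.
struct IdWaker {
    id: usize,
    set: Arc<ReadySet>,
}

impl Wake for IdWaker {
    fn wake(self: Arc<Self>) {
        self.set.woken.lock().unwrap().push(self.id);
    }
}

impl ReadySet {
    /// Hands out a waker bound to `id`
    fn waker(self: &Arc<Self>, id: usize) -> Waker {
        Waker::from(Arc::new(IdWaker { id, set: self.clone() }))
    }

    /// Drains the IDs that woke since the last call; the poll loop
    /// then polls only these streams, skipping still-pending ones
    fn drain(&self) -> Vec<usize> {
        std::mem::take(&mut *self.woken.lock().unwrap())
    }
}
```

For example, if only stream 3 wakes, `drain()` returns `[3]` and the acceptor polls a single worker instead of scanning the whole working set.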

Call-outs:

I refactored the TCP acceptor a bit to make it easier to test, and in particular to write fuzz tests for, since the management logic is getting quite complicated.

Testing:

I added quite a few tests. The most interesting one is probably the fuzz test for the accept manager, which ensures all struct invariants hold no matter in which order events occur:

```rust
fn invariants_test() {
    check!().with_type::<Vec<Op>>().for_each(|ops| {
        let mut harness = Harness::default();
        for op in ops {
            match op {
                Op::Insert => {
                    harness.insert();
                }
                Op::Wake { idx } => {
                    harness.wake(*idx);
                }
                Op::Ready { idx, error } => {
                    if *error {
                        harness.error(*idx, io::ErrorKind::ConnectionReset);
                    } else {
                        harness.ready(*idx);
                    }
                }
                Op::Advance { millis } => {
                    harness.advance(Duration::from_millis(*millis as u64));
                    harness.poll();
                }
            }
        }
    });
}
```
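The operation alphabet the harness draws from can be read off the match arms above. A sketch of what the `Op` type might look like (the variant names come from the test; the derives and doc comments are assumptions, and in the real test it would also derive the fuzzer's value generator, e.g. bolero's `TypeGenerator`):

```rust
/// One event the fuzzer can feed the accept-manager harness.
#[derive(Debug, Clone)]
enum Op {
    /// accept a new stream into the manager
    Insert,
    /// the waker for stream `idx` fired
    Wake { idx: usize },
    /// stream `idx` finished, either cleanly or with an error
    Ready { idx: usize, error: bool },
    /// advance the clock and run a poll pass
    Advance { millis: u16 },
}
```

Driving the harness with arbitrary `Vec<Op>` sequences lets the fuzzer interleave insertions, wakeups, completions, and time advances in every order, which is exactly where invariant bugs in the management logic tend to hide.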

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@camshaft camshaft force-pushed the camshaft/dc-waker-set branch 4 times, most recently from 61e1b28 to e43dce3 Compare December 10, 2024 00:58
@camshaft camshaft marked this pull request as ready for review December 10, 2024 16:53
```rust
    is_active
});
// we did a full scan, so reset the value
self.gc_count = 0;
```
Collaborator

This `retain` feels potentially expensive: it's going to be near random in the common case, I think, so lots of shifting/copying of the indices, plus it's still O(n) in the worker count.

If we don't do this at all, I think the downside is that when evicting due to sojourn time we would need to search for the right entry? I think just popping isn't enough, but maybe a heap which we use to incrementally sort ends up cheaper?

Contributor Author

Yeah it's tough to say. I think it's definitely better than what we're doing today, but let me try refactoring to use a heap instead.

Contributor Author

Fixed in e237253. I ended up going with a "linked list" structure (reusing the existing workers allocation) that made push/pop/remove all O(1).
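The linked-list trick can be sketched as follows: the `prev`/`next` links live inline in a slab indexed by worker ID, so pushing, popping, and removing an arbitrary entry are all O(1) with no extra allocation. This is a simplified illustration, not the actual s2n-quic-dc code; `NONE`, `Link`, and `List` are illustrative names:

```rust
/// Sentinel meaning "no link" (the real list requires idx < usize::MAX).
const NONE: usize = usize::MAX;

/// Links stored inline in the entries slab, one per worker slot.
#[derive(Clone, Copy)]
struct Link {
    prev: usize,
    next: usize,
}

struct List {
    head: usize,
    tail: usize,
}

impl List {
    fn new() -> Self {
        Self { head: NONE, tail: NONE }
    }

    /// O(1): append `idx` to the back
    fn push(&mut self, links: &mut [Link], idx: usize) {
        links[idx] = Link { prev: self.tail, next: NONE };
        if self.tail == NONE {
            self.head = idx;
        } else {
            links[self.tail].next = idx;
        }
        self.tail = idx;
    }

    /// O(1): unlink an arbitrary entry, e.g. for sojourn-time eviction
    fn remove(&mut self, links: &mut [Link], idx: usize) {
        let Link { prev, next } = links[idx];
        if prev == NONE {
            self.head = next;
        } else {
            links[prev].next = next;
        }
        if next == NONE {
            self.tail = prev;
        } else {
            links[next].prev = prev;
        }
    }

    /// O(1): pop the oldest entry from the front
    fn pop(&mut self, links: &mut [Link]) -> Option<usize> {
        if self.head == NONE {
            return None;
        }
        let idx = self.head;
        self.remove(links, idx);
        Some(idx)
    }
}
```

Because the list keeps workers in insertion order, the oldest entry is always at the head, so sojourn-based eviction can pop from the front without the search or incremental sorting a heap would need.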

```rust
/// Registers a waker with the given ID
pub fn waker(&mut self, id: usize) -> Waker {
    // reserve space in the locally ready set
    self.ready.resize_for_id(id);
```
Collaborator

Isn't our size fixed? We manage to fix the worker set so it feels surprising we can't do that here. I think that would eliminate the locking entirely...

Contributor Author

It is fixed after we create all of the Wakers. This just made it so you don't have to specify the capacity up front. In the critical path, this isn't used at all, since those wakers are cached in the slot.

```rust
let shift = self.shift;
let id = self.index * SLOT_SIZE + shift;
let mask = 1 << shift;
self.shift += 1;
```
Collaborator

This looks linear to me; I would use `trailing_zeros` to skip to the next 1.
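The suggestion can be sketched like this: instead of advancing `shift` by one per call and testing every bit position, `trailing_zeros` jumps straight to the next set bit, and clearing the lowest set bit (`slot &= slot - 1`) makes the loop run once per ready ID rather than once per bit. A minimal illustration, with an assumed `u64` slot and illustrative names:

```rust
/// Collects the IDs of all set bits in one slot of the ready bitset,
/// visiting only the set bits rather than scanning linearly.
fn ready_ids(slot_index: usize, mut slot: u64) -> Vec<usize> {
    const SLOT_SIZE: usize = u64::BITS as usize;
    let mut ids = Vec::new();
    while slot != 0 {
        // jump straight to the next set bit
        let shift = slot.trailing_zeros() as usize;
        ids.push(slot_index * SLOT_SIZE + shift);
        // clear the bit we just handled
        slot &= slot - 1;
    }
    ids
}
```

For a slot value of `0b1010` in slot 0 this visits only bits 1 and 3, doing two iterations instead of scanning all 64 positions.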

Contributor Author

Yeah I'll fix it

Contributor Author

Fixed in 6f9795c

```rust
/// * `entries` is only managed by [`List`]
/// * `idx` is less than `usize::MAX`
#[inline]
pub unsafe fn pop<L>(&mut self, entries: &mut [L]) -> Option<usize>
```
Collaborator

I feel like the bounds-check avoidance is very unlikely to contribute meaningful performance, and the loss of safety doesn't feel worth it. Can we start with making this safe (panicking if we hit issues)?

Contributor Author

yep i'll fix it

@camshaft
Contributor Author

Unrelated test failures are fixed in #2412

@camshaft camshaft merged commit fc39fc2 into main Dec 11, 2024
108 of 127 checks passed
@camshaft camshaft deleted the camshaft/dc-waker-set branch December 11, 2024 18:08