
fix: Avoid spurious PMTUD resets #2293

Open · larseggert wants to merge 3 commits into main
Conversation

larseggert
Collaborator

Previously, after PMTUD had completed, we could end up restarting PMTUD when the packet loss counters for packets larger than the current PMTU exceeded the limit. This change makes sure we no longer do that.

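A minimal sketch of the idea, assuming a `Pmtud` struct with an `mtu` field and a per-size loss-count table; the names and the exact condition are illustrative, not the actual patch:

```rust
/// Illustrative sketch only, not the actual neqo change: after PMTUD has
/// completed, a reset should only be triggered by excessive losses of packets
/// we can actually send, i.e. sizes <= the current MTU. Losses of larger
/// packets are expected (probes at those sizes already failed) and are ignored.
const MAX_PROBES: usize = 3; // assumed loss limit

struct Pmtud {
    mtu: usize,
    /// (packet size, loss count) entries of the search table.
    loss_counts: Vec<(usize, usize)>,
}

impl Pmtud {
    fn should_reset(&self) -> bool {
        self.loss_counts
            .iter()
            .any(|&(size, losses)| size <= self.mtu && losses >= MAX_PROBES)
    }
}

fn main() {
    let pmtud = Pmtud {
        mtu: 1500,
        loss_counts: vec![(1280, 0), (1500, 0), (4000, 3), (8191, 3)],
    };
    // Losses only at 4000 and 8191, both above the 1500-byte MTU: no reset.
    assert!(!pmtud.should_reset());
}
```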

codecov bot commented Dec 19, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.34%. Comparing base (bb45c74) to head (376f804).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2293   +/-   ##
=======================================
  Coverage   93.33%   93.34%           
=======================================
  Files         114      114           
  Lines       36896    36943   +47     
=======================================
+ Hits        34438    34484   +46     
- Misses       1675     1676    +1     
  Partials      783      783           

☔ View full report in Codecov by Sentry.


github-actions bot commented Dec 19, 2024

Failed Interop Tests: QUIC Interop Runner, client vs. server, differences relative to bb45c74 (neqo-latest as client and as server).

Succeeded Interop Tests: QUIC Interop Runner, client vs. server (neqo-latest as client and as server).

Unsupported Interop Tests: QUIC Interop Runner, client vs. server (neqo-latest as client and as server).

@larseggert larseggert changed the title fix: Avoid suprious PMTUD resets fix: Avoid spurious PMTUD resets Dec 19, 2024
Comment on lines +542 to +543
"Outbound interface {name} for destination {} has MTU {mtu}",
remote.ip()
Member

Suggested change
"Outbound interface {name} for destination {} has MTU {mtu}",
remote.ip()
"Outbound interface {name} for destination {ip} has MTU {mtu}",
ip = remote.ip()
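For context, a standalone snippet (my own example, not part of the PR) showing both forms side by side; with a named argument the placeholder reads like the inline-captured `name` and `mtu` instead of a bare positional `{}`:

```rust
fn main() {
    let name = "en0";
    let mtu = 1500u16;
    let remote: std::net::SocketAddr = "192.0.2.1:443".parse().unwrap();
    // Positional form, as in the original code:
    println!("Outbound interface {name} for destination {} has MTU {mtu}", remote.ip());
    // Named form, as suggested above:
    println!("Outbound interface {name} for destination {ip} has MTU {mtu}", ip = remote.ip());
}
```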

Probe::Needed | Probe::Sent => {
    // We saw multiple losses of packets > the current MTU during PMTU discovery, so
    // we're done.
    if largest_ok_mtu > self.mtu {
Member

You mentioned that the reason for this change was that we were resetting PMTUD too often, but this will stop a probe when it looks like we should be setting it higher.

I'm not following this. The logic here is that we've detected losses, but they are for packets larger than those we are currently sending. That is, we are probing and seem to have three packet sizes, decreasing in size:

  1. A large MTU, where we have too many losses.
  2. One step lower in size, where we have fewer than MAX_PROBES of losses.
  3. The current MTU.

However, the current MTU is going to be the same as this second value. That is, when we are probing, we'll have self.mtu == largest_ok_mtu. The probed size is only ever one step larger than the current MTU. That means that this test will always fail.

Is this extra condition even needed? If we ever get MAX_PROBES losses, I can think of the following cases:

  1. Those losses correspond to probes (at one step up from largest_ok_mtu) in which case you want to fail the probe and settle at the current MTU.
  2. Those losses are just losses. In which case you might want to drop the MTU down, but maybe not.
  3. You probed at a larger MTU in the past and these packets are just now being marked as lost.

I can see a case for a >= here, perhaps. I don't understand the guard otherwise. As it is, it looks like we'll never stop probing due to losses.
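A toy restatement of that argument, with made-up sizes (none of these are real search-table values), just to make the claim concrete:

```rust
fn main() {
    // Hypothetical numbers for the three sizes described above.
    let current_mtu = 1500; // 3. the current MTU (self.mtu)
    let probe_size = 2047;  //    the probed size, one step above the MTU
    let lossy_size = 4000;  // 1. a larger size with >= MAX_PROBES losses

    // 2. The largest size that still has fewer than MAX_PROBES losses.
    //    While probing, this is the current MTU itself, per the comment above.
    let largest_ok_mtu = current_mtu;

    // So the guard `largest_ok_mtu > self.mtu` compares equal values and the
    // branch is never taken in the Probe::Needed | Probe::Sent arm.
    assert!(largest_ok_mtu <= current_mtu);
    assert!(probe_size > current_mtu && lossy_size > probe_size);
}
```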

// Two packets of size 4000 were lost, which should increase loss counts >= 4000 by two.
let expected_lc = search_table_inc(&pmtud, &expected_lc, 4000);
let expected_lc = search_table_inc(&pmtud, &expected_lc, 4000);
// Two packets of size 4000 were lost, which should increase loss counts >= 4000 by one.
Member

Suggested change
// Two packets of size 4000 were lost, which should increase loss counts >= 4000 by one.
// Two packets of size 4000 were lost, which should increase loss counts >= 4000 by two.

// by one. There have now been MAX_PROBES losses of packets >= 8191, so the PMTUD process
// should have restarted.
// by one. There have now been MAX_PROBES losses of packets >= 8191, but that is larger than
// the current MTU, so nothing will happen.
pmtud.on_packets_lost(&[make_sentpacket(0, now, 9000)], &mut stats, now);
Member

Isn't this pure contrivance? We don't probe in this odd pattern.
