
fix: Avoid spurious PMTUD resets #2293

Open · larseggert wants to merge 3 commits into main
Conversation

larseggert
Collaborator

Previously, after PMTUD had completed, we could end up restarting PMTUD when the packet loss counters for packets larger than the current PMTU exceeded the limit. This change makes sure we no longer do that.

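A minimal sketch of the idea, assuming a `Pmtud` struct with an `mtu` field and a per-size loss-count table; the names and the exact condition are illustrative, not the actual patch:

```rust
/// Illustrative sketch only, not the actual neqo change: after PMTUD has
/// completed, a reset should only be triggered by excessive losses of packets
/// we can actually send, i.e. sizes <= the current MTU. Losses of larger
/// packets are expected (probes at those sizes already failed) and are ignored.
const MAX_PROBES: usize = 3; // assumed loss limit

struct Pmtud {
    mtu: usize,
    /// (packet size, loss count) entries of the search table.
    loss_counts: Vec<(usize, usize)>,
}

impl Pmtud {
    fn should_reset(&self) -> bool {
        self.loss_counts
            .iter()
            .any(|&(size, losses)| size <= self.mtu && losses >= MAX_PROBES)
    }
}

fn main() {
    let pmtud = Pmtud {
        mtu: 1500,
        loss_counts: vec![(1280, 0), (1500, 0), (4000, 3), (8191, 3)],
    };
    // Losses only at 4000 and 8191, both above the 1500-byte MTU: no reset.
    assert!(!pmtud.should_reset());
}
```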

codecov bot commented Dec 19, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.34%. Comparing base (bb45c74) to head (376f804).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2293   +/-   ##
=======================================
  Coverage   93.33%   93.34%           
=======================================
  Files         114      114           
  Lines       36896    36943   +47     
=======================================
+ Hits        34438    34484   +46     
- Misses       1675     1676    +1     
  Partials      783      783           

☔ View full report in Codecov by Sentry.


github-actions bot commented Dec 19, 2024

Failed Interop Tests: QUIC Interop Runner, client vs. server, differences relative to bb45c74 (neqo-latest as client and as server).

Succeeded Interop Tests: QUIC Interop Runner, client vs. server (neqo-latest as client and as server).

Unsupported Interop Tests: QUIC Interop Runner, client vs. server (neqo-latest as client and as server).

@larseggert larseggert changed the title fix: Avoid suprious PMTUD resets fix: Avoid spurious PMTUD resets Dec 19, 2024
Comment on lines +542 to +543
"Outbound interface {name} for destination {} has MTU {mtu}",
remote.ip()
Member

Suggested change
"Outbound interface {name} for destination {} has MTU {mtu}",
remote.ip()
"Outbound interface {name} for destination {ip} has MTU {mtu}",
ip = remote.ip()
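For context, a standalone snippet (my own example, not part of the PR) showing both forms side by side; with a named argument the placeholder reads like the inline-captured `name` and `mtu` instead of a bare positional `{}`:

```rust
fn main() {
    let name = "en0";
    let mtu = 1500u16;
    let remote: std::net::SocketAddr = "192.0.2.1:443".parse().unwrap();
    // Positional form, as in the original code:
    println!("Outbound interface {name} for destination {} has MTU {mtu}", remote.ip());
    // Named form, as suggested above:
    println!("Outbound interface {name} for destination {ip} has MTU {mtu}", ip = remote.ip());
}
```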

Probe::Needed | Probe::Sent => {
    // We saw multiple losses of packets > the current MTU during PMTU discovery, so
    // we're done.
    if largest_ok_mtu > self.mtu {
Member

You mentioned that the reason for this change was that we were resetting PMTUD too often, but this will stop a probe when it looks like we should be setting it higher.

I'm not following this. The logic here is that we've detected losses, but they are for packets larger than those we are currently sending. That is, we are probing and seem to have three packet sizes, decreasing in size:

  1. A large MTU, where we have too many losses.
  2. One step lower in size, where we have fewer than MAX_PROBES of losses.
  3. The current MTU.

However, the current MTU is going to be the same as this second value. That is, when we are probing, we'll have self.mtu == largest_ok_mtu. The probed size is only ever one step larger than the current MTU. That means that this test will always fail.

Is this extra condition even needed? If we ever get MAX_PROBES losses, I can think of the following cases:

  1. Those losses correspond to probes (at one step up from largest_ok_mtu) in which case you want to fail the probe and settle at the current MTU.
  2. Those losses are just losses. In which case you might want to drop the MTU down, but maybe not.
  3. You probed at a larger MTU in the past and these packets are just now being marked as lost.

I can see a case for a >= here, perhaps. I don't understand the guard otherwise. As it is, it looks like we'll never stop probing due to losses.
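A toy restatement of that argument, with made-up sizes (none of these are real search-table values), just to make the claim concrete:

```rust
fn main() {
    // Hypothetical numbers for the three sizes described above.
    let current_mtu = 1500; // 3. the current MTU (self.mtu)
    let probe_size = 2047;  //    the probed size, one step above the MTU
    let lossy_size = 4000;  // 1. a larger size with >= MAX_PROBES losses

    // 2. The largest size that still has fewer than MAX_PROBES losses.
    //    While probing, this is the current MTU itself, per the comment above.
    let largest_ok_mtu = current_mtu;

    // So the guard `largest_ok_mtu > self.mtu` compares equal values and the
    // branch is never taken in the Probe::Needed | Probe::Sent arm.
    assert!(largest_ok_mtu <= current_mtu);
    assert!(probe_size > current_mtu && lossy_size > probe_size);
}
```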

// Two packets of size 4000 were lost, which should increase loss counts >= 4000 by two.
let expected_lc = search_table_inc(&pmtud, &expected_lc, 4000);
let expected_lc = search_table_inc(&pmtud, &expected_lc, 4000);
// Two packets of size 4000 were lost, which should increase loss counts >= 4000 by one.
Member

Suggested change
// Two packets of size 4000 were lost, which should increase loss counts >= 4000 by one.
// Two packets of size 4000 were lost, which should increase loss counts >= 4000 by two.

// by one. There have now been MAX_PROBES losses of packets >= 8191, so the PMTUD process
// should have restarted.
// by one. There have now been MAX_PROBES losses of packets >= 8191, but that is larger than
// the current MTU, so nothing will happen.
pmtud.on_packets_lost(&[make_sentpacket(0, now, 9000)], &mut stats, now);
Member

Isn't this pure contrivance? We don't probe in this odd pattern.
