Conversation


@Arqu Arqu commented Dec 10, 2025

Description

Details below as PR comments.

Depends on n0-computer/quinn#239

The quinn PR is not in an ideal state and needs more input.

Breaking Changes

Notes & open questions

Change checklist

  • Self-review.
  • Documentation updates following the style guide, if relevant.
  • Tests if relevant.
  • All breaking changes documented.
    • List all breaking changes in the above "Breaking Changes" section.
    • Open an issue or PR on any number0 repos that are affected by this breaking change. Give guidance on how the updates should be handled or do the actual updates themselves. The major ones are:

@Arqu Arqu requested review from dignifiedquire and flub December 10, 2025 20:17
@Arqu Arqu self-assigned this Dec 10, 2025
@Arqu Arqu added this to iroh Dec 10, 2025
@github-project-automation github-project-automation bot moved this to 🏗 In progress in iroh Dec 10, 2025
@Arqu Arqu mentioned this pull request Dec 10, 2025

github-actions bot commented Dec 10, 2025

Documentation for this PR has been generated and is available at: https://n0-computer.github.io/iroh/pr/3762/docs/iroh/

Last updated: 2025-12-10T21:59:01Z


github-actions bot commented Dec 10, 2025

Netsim report & logs for this PR have been generated and are available at: LOGS
This report will remain available for 3 days.

Last updated for commit: 877eac3

// We sent the last message, so wait for the client to close the connection once
// it received this message.
-let res = tokio::time::timeout(Duration::from_secs(3), async move {
+let res = tokio::time::timeout(Duration::from_secs(4), async move {
Collaborator Author

These are needed because our default connection close time is a hair more than 3s (the default PTO is 333ms, the max works out to roughly 1.0s, and the close time is 3x that).
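For reference, the arithmetic from that comment can be sketched as follows (the constants mirror the comment, not quinn's actual internals):

```rust
use std::time::Duration;

fn main() {
    // Values per the comment: initial PTO is 333ms, growing to roughly
    // 1.0s by the time the connection closes; the close timer is 3x the
    // PTO. These constants are illustrative, not quinn's real ones.
    let initial_pto = Duration::from_millis(333);
    let max_pto = initial_pto * 3; // ~1.0s per the comment
    let close_time = max_pto * 3; // 3x PTO, just under 3s

    // A 3s test timeout races the close timer; 4s leaves headroom.
    assert!(close_time > Duration::from_secs(3) - Duration::from_millis(10));
    assert!(close_time < Duration::from_secs(4));
    println!("close time: {:?}", close_time);
}
```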

// Set closing flag BEFORE wait_idle() to prevent new net_report runs
// from creating QAD connections while we're draining existing ones.
self.msock.closing.store(true, Ordering::Relaxed);

Collaborator Author

We have in-flight conns if we close later, which prevents closing at all and makes wait_idle() choke.
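A minimal sketch of the flag-before-drain ordering described here, with hypothetical names standing in for iroh's actual types:

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Illustrative stand-in for the magic socket; field names are assumptions.
struct MagicSock {
    closing: AtomicBool,
}

impl MagicSock {
    // A background task (e.g. net_report) checks the flag before creating
    // new QAD connections, and bails out once shutdown has begun.
    fn may_start_net_report(&self) -> bool {
        !self.closing.load(Ordering::Relaxed)
    }

    fn close(&self) {
        // Set the flag FIRST, then drain: anything racing with shutdown
        // sees `closing == true` and skips opening new connections.
        self.closing.store(true, Ordering::Relaxed);
        // ... wait_idle() and draining of existing connections go here ...
    }
}

fn main() {
    let sock = MagicSock {
        closing: AtomicBool::new(false),
    };
    assert!(sock.may_start_net_report());
    sock.close();
    assert!(!sock.may_start_net_report());
    println!("ok");
}
```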

}
}
Poll::Pending
if found_transport {
Collaborator Author

Netsim continued to choke on this for the --relay-only tests because apparently when the other transport is disabled it gets stuck here pending forever.

@Arqu Arqu moved this from 🏗 In progress to 👀 In review in iroh Dec 10, 2025

Arqu commented Dec 10, 2025

Welp, less aggressive path filtering for PTO calcs seems to be not great.

@Arqu Arqu changed the title fix: fix shutdown block fix: shutdown block Dec 10, 2025
Comment on lines +542 to +544
Poll::Ready(Err(io::Error::other(format!(
"no transport available for {dst:?}"
))))
Contributor

Is this a permanent error? IIRC in the past we've always decided to return Poll::Ready(Ok(())) and let the QUIC stack use loss detection to figure out this is lost.

The quinn-udp philosophy is that only permanent errors should be errors. That is: this socket will never ever be usable again and won't ever be able to send a datagram and needs to be re-bound to get any progress.

And I think that this is not the case here: if you have one transport you can still send on that transport. So I think this should probably not be an error but rather a Poll::Ready(Ok(())).

It would still be helpful to log this though. But if we naively log this we'd end up with flooding the log. IIRC quinn-udp itself has a similar situation where they add a bit of state to the socket so that they only emit that log once every 60 seconds or something like that. Could we do the same? I'd vote debug-level logging for this.
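The once-per-interval logging idea could be sketched like this (the `LogLimiter` type and its fields are hypothetical, not quinn-udp's actual state):

```rust
use std::time::{Duration, Instant};

// Hypothetical rate limiter kept alongside the socket state, so the
// "no transport available" message is emitted at most once per interval.
struct LogLimiter {
    last: Option<Instant>,
    interval: Duration,
}

impl LogLimiter {
    fn new(interval: Duration) -> Self {
        Self { last: None, interval }
    }

    /// Returns true if the caller should emit the log line now.
    fn should_log(&mut self) -> bool {
        let now = Instant::now();
        match self.last {
            // Within the suppression window: stay quiet.
            Some(t) if now.duration_since(t) < self.interval => false,
            // First hit, or window elapsed: log and reset the clock.
            _ => {
                self.last = Some(now);
                true
            }
        }
    }
}

fn main() {
    let mut limiter = LogLimiter::new(Duration::from_secs(60));
    assert!(limiter.should_log()); // first occurrence is logged
    assert!(!limiter.should_log()); // immediate repeat is suppressed
    println!("ok");
}
```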

Contributor

Is this a permanent error?

Yes, transports are not configurable at runtime (for now), so if no working transport is available we can error out.

But I don't know why we would ever hit this in our current tests.

Contributor

I think I disagree. You could be sending a datagram for a disabled transport which would result in an error. But later you could send a datagram to an enabled transport, and that should still succeed. So it's a transient error I think.

Contributor

Transports can't be enabled or disabled currently; that could change in the future, but right now it is not possible.

src: Option<IpAddr>,
transmit: &Transmit<'_>,
) -> Poll<io::Result<()>> {
let mut found_transport = false;
Contributor

If I were to nitpick, I'd make this `let mut fallback_ret = Poll::Ready(Ok(()));` and let each transport set it to Pending when they hit a pending. But there are many ways to do this; it's subjective, and maybe an even nicer way exists.
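The suggested `fallback_ret` shape could look roughly like this (the `SendAttempt` enum is an illustrative stand-in for real transports, not iroh's types):

```rust
use std::task::Poll;

/// Hypothetical outcome of offering a transmit to one transport.
enum SendAttempt {
    NotApplicable, // this transport doesn't handle the destination
    Pending,       // transport matched but isn't ready yet
    Sent,          // datagram handed off
}

// Sketch of the suggested shape: start from Ready(Ok(())) and let any
// transport that hits Pending downgrade the fallback, instead of tracking
// a separate `found_transport` bool.
fn poll_send(attempts: &[SendAttempt]) -> Poll<Result<(), ()>> {
    let mut fallback_ret = Poll::Ready(Ok(()));
    for attempt in attempts {
        match attempt {
            SendAttempt::Sent => return Poll::Ready(Ok(())),
            SendAttempt::Pending => fallback_ret = Poll::Pending,
            SendAttempt::NotApplicable => {}
        }
    }
    // No transport sent: either Pending (one was busy) or Ok(()), letting
    // QUIC loss detection absorb the drop, per the discussion above.
    fallback_ret
}

fn main() {
    assert!(matches!(
        poll_send(&[SendAttempt::NotApplicable]),
        Poll::Ready(Ok(()))
    ));
    assert!(poll_send(&[SendAttempt::Pending]).is_pending());
    println!("ok");
}
```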
