Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stream_capacity can return InvalidStreamState for existing but incomplete streams closed by the peer #1874

Open
ruuda opened this issue Dec 1, 2024 · 1 comment

Comments

@ruuda
Copy link

ruuda commented Dec 1, 2024

Summary

I have a client application that creates many streams, and sends small amounts of data with an immediate fin on them. Sometimes it gets stuck, and stream_capacity returns InvalidStreamState for a stream that was previously written to. This issue looks similar to #1695, perhaps they have the same underlying cause.

Background

My application creates streams in a loop and sends a small amount of data on them (128 bytes), with an immediate fin. It always completes writing on one stream before starting a new stream.

The application is structured as a loop as follows:

  1. If is_established(), while peer_streams_left_uni() allows (and while there is data), call stream_send.
  2. Call conn.send in a loop until Error::Done, sends those packets.
  3. Waits for new received packets or a deadline.
  4. If packets were received, call recv() for them.

There are a few more things going on, but I believe these are the relevant parts. If in step 1 a write is partial (it wrote fewer bytes than provided, or returned Err::Done), then we stop stream_sending for that interation, and resume that write as the first thing in the next iteration.

Details

Almost always this works fine, but sometimes it would get stuck in a situation where stream_send returns Err::Done forever. This happens nondeterministically. Sometimes it happens in the first few milliseconds, sometimes my application runs fine for minutes until it hits this case, but eventually it gets stuck.

To debug this further, I added calls to stream_capacity before any stream_send that is resuming a previous partial write. Because we already wrote more than zero bytes to this stream, it exists. Because we never complete the full write, the stream is not completed. My server application does sometimes call stream_shutdown(Shutdown::Read), and every time my sender got stuck, the server did call stream_shutdown on the stream id that the sender got stuck on.

From the sender side, it makes the following calls:

  1. stream_send, which returns a positive number of bytes written, but less than the size requested.
  2. Sending packages, step 2–4 in the loop above. No new packets were received.
  3. At this point, stream_capacity returns Ok(0). We call stream_send, this time it returns Error::Done.
  4. Send/recv again, we received 46 bytes. (This number is not always the same.)
  5. stream_capacity still returns Ok(0), stream_send still returns Error::Done.
  6. Send/recv again, we received 59 bytes and then 42.
  7. Now stream_capacity returns InvalidStreamState. stream_send returns Error::Done. Here my application gets stuck forever trying to complete the same write.

The details vary from run to run, but there are always two packets received before stream_capacity returns InvalidStreamState, and they are always 50-something and 40-something bytes (but not always the exact same size). The stream id at which it happens also varies, but it doesn’t seem completely random, in particular it commonly happens at stream id 3682. (This might just be due to the way my application writes, that that is the point where it goes over some limit?)

If I understand the documentation correctly, if I ever wrote to the stream and got an Ok(n>0), but didn’t finish, then stream_capacity might return StreamStopped, but it should not return InvalidStreamState, right?

@LPardue
Copy link
Contributor

LPardue commented Dec 2, 2024

Thanks for the report. Are you able to provide logs? It would help to see what frames the server is actually sending. You could provide quiche trace logging (emabled in the example apps via RUST_LOG=trace) or qlogs (enabled in the example apps via the QLOGDIR environment variable).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants