Collaborator

@clabby clabby commented Dec 15, 2025

Overview

Adjusts the interface of runtime::Sink to accept a Buf, allowing implementations to use vectored write primitives (e.g. tokio's write_vectored, or iouring's writev op).

closes #784
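
For readers unfamiliar with vectored writes, here is a minimal std-only sketch of an "all bytes or an error" vectored send. It uses `std::io::Write::write_vectored` as a stand-in for tokio's async `write_vectored`; the names `send_all_vectored` and `MAX_BUFS` are hypothetical and are not taken from this PR.

```rust
use std::io::{self, IoSlice, Write};

/// Hypothetical cap mirroring the "at most 16 buffers" doc comment.
const MAX_BUFS: usize = 16;

/// Writes every byte of `bufs`, in order, or returns an error.
/// Handles partial writes by resuming from the first unwritten byte.
fn send_all_vectored<W: Write>(w: &mut W, bufs: &[&[u8]]) -> io::Result<()> {
    if bufs.len() > MAX_BUFS {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "too many buffers"));
    }
    let mut idx = 0; // first buffer that is not fully written
    let mut off = 0; // bytes already written from bufs[idx]
    while idx < bufs.len() {
        // Skip buffers that are empty or already fully written.
        if off == bufs[idx].len() {
            idx += 1;
            off = 0;
            continue;
        }
        // Rebuild the slice list starting at the current cursor.
        let mut slices = Vec::with_capacity(bufs.len() - idx);
        slices.push(IoSlice::new(&bufs[idx][off..]));
        slices.extend(bufs[idx + 1..].iter().map(|b| IoSlice::new(b)));

        let mut n = w.write_vectored(&slices)?;
        if n == 0 {
            return Err(io::ErrorKind::WriteZero.into());
        }
        // Advance the cursor past the `n` bytes the writer accepted.
        while n > 0 && idx < bufs.len() {
            let remaining = bufs[idx].len() - off;
            if n >= remaining {
                n -= remaining;
                idx += 1;
                off = 0;
            } else {
                off += n;
                n = 0;
            }
        }
    }
    Ok(())
}
```

Calling `send_all_vectored(&mut out, &[header, payload])` with `out: Vec<u8>` appends both slices without first concatenating them into one buffer.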

@clabby clabby self-assigned this Dec 15, 2025
@clabby clabby added the breaking-api This PR modifies the public interface of a function. label Dec 15, 2025
///
/// Implementations restrict the maximum number of buffers that can be
/// written at once to `16`.
fn send(&mut self, bufs: &[&[u8]]) -> impl Future<Output = Result<(), Error>> + Send;
Contributor

We need to support StableBuf for iouring safety

Contributor

(unless we were doing something very dumb?)

Collaborator Author

@clabby clabby Dec 15, 2025

We do need to keep the buffer alive until the op finishes; still drafting the initial changes, PR is up to run tests on Linux.

Collaborator Author

@clabby clabby Dec 15, 2025

For now (to retain cancel safety), I've just gone ahead and allocated a stable buffer underneath the Sink::send impl for io_uring.

I'm not 100% sure if I love this, but the alternative would be to have Sink::send's signature be:

/// Interface that any runtime must implement to send
/// messages over a network connection.
pub trait Sink: Sync + Send + 'static {
    /// Send messages to the sink using vectored I/O.
    ///
    /// All buffers are written in order as if they were concatenated
    /// into a single contiguous message. The implementation guarantees
    /// that either all bytes are written or an error is returned.
    ///
    /// Implementations restrict the maximum number of buffers that can be
    /// written at once to `16`.
    fn send(&mut self, bufs: Vec<StableBuf>) -> impl Future<Output = Result<(), Error>> + Send;
}

which forces taking ownership of the buffers, even though that's only required for io_uring (given that tokio's write_vectored function operates using the synchronous writev/pwritev syscalls).

I'll think about this a bit. The current changes yield a big benefit for the base tokio network, but we still force a copy of the whole sent buffer(s) w/ io_uring.
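
To make the trade-off concrete, here is a small std-only sketch contrasting the two interface shapes. `MockUring`, `send_borrowed`, and `send_owned` are hypothetical names used only for illustration; the mock just records buffers as "in flight" and counts how many bytes had to be copied.

```rust
/// Hypothetical stand-in for an io_uring-backed sink. Buffers handed to the
/// kernel must stay alive until the op completes, so the sink retains them.
#[derive(Default)]
struct MockUring {
    in_flight: Vec<Vec<u8>>,
    bytes_copied: usize,
}

impl MockUring {
    /// Borrowed interface: the caller may drop or reuse `bufs` as soon as
    /// this returns, so the sink has to copy them into memory it owns.
    fn send_borrowed(&mut self, bufs: &[&[u8]]) {
        let owned: Vec<u8> = bufs.concat();
        self.bytes_copied += owned.len();
        self.in_flight.push(owned);
    }

    /// Owned interface: the buffers are moved into the sink, so no copy
    /// is needed to keep them alive.
    fn send_owned(&mut self, bufs: Vec<Vec<u8>>) {
        self.in_flight.extend(bufs);
    }
}
```

In this sketch the borrowed path copies every byte once, while the owned path only moves `Vec` headers; that is the copy the comment above says is still forced with io_uring.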

We could potentially go all the way up the stack and require Sender::send + send_frame to take ownership over the data:

monorepo/stream/src/lib.rs

Lines 288 to 307 in 2be92d6

/// Sends encrypted messages to a peer.
pub struct Sender<O> {
    cipher: SendCipher,
    sink: O,
    max_message_size: usize,
}

impl<O: Sink> Sender<O> {
    /// Encrypts and sends a message to the peer.
    pub async fn send(&mut self, msg: &[u8]) -> Result<(), Error> {
        let c = self.cipher.send(msg)?;
        send_frame(
            &mut self.sink,
            &c,
            self.max_message_size + CIPHERTEXT_OVERHEAD,
        )
        .await?;
        Ok(())
    }
}

Contributor

As discussed, I think taking StableBuf all the way at the top works (if it means we avoid all these copies)!

cloudflare-workers-and-pages bot commented Dec 15, 2025

Deploying monorepo with Cloudflare Pages

Latest commit: 871532f
Status: ✅  Deploy successful!
Preview URL: https://a21a4425.monorepo-eu0.pages.dev
Branch Preview URL: https://cl-vectorized-sink.monorepo-eu0.pages.dev


@clabby clabby force-pushed the cl/vectorized-sink branch 2 times, most recently from 0f04a34 to ec4d212 Compare December 15, 2025 19:03
@clabby clabby mentioned this pull request Dec 15, 2025
@clabby clabby force-pushed the cl/vectorized-sink branch 3 times, most recently from bbee56a to f168651 Compare December 16, 2025 01:37
@clabby clabby force-pushed the cl/vectorized-sink branch from f168651 to f47a043 Compare December 16, 2025 01:42
@clabby clabby force-pushed the cl/vectorized-sink branch from d5830bf to 871532f Compare December 16, 2025 02:00
Comment on lines 215 to +228
/// The buffer used for the operation, if any.
/// E.g. For read, this is the buffer being read into.
/// If None, the operation doesn't use a buffer (e.g. a sync operation).
/// We hold the buffer here so it's guaranteed to live until the operation
/// completes, preventing write-after-free issues.
pub buffer: Option<StableBuf>,
/// A boxed `Buf` to keep alive for the duration of the operation and return
/// to the caller. This is useful for vectored I/O where the original buffer
/// must remain valid and the caller needs it back to call `advance()`.
pub buf: Option<Box<dyn Buf + Send>>,
/// Additional data to keep alive for the duration of the operation but not
/// returned to the caller. Useful for things like iovec arrays that the kernel
/// references during the operation.
pub keepalive: Option<Box<dyn Send>>,
Collaborator Author

Kept the original StableBuf in here given that storage still uses it. If we proceed, we'd likely want to do a full swap of StableBuf -> Bytes/Buf throughout the storage and network primitives.
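
As a rough illustration of the keep-alive idea in the fields above, the sketch below shows an in-flight op that owns both the data and the iovec-like array pointing into it. Everything here (`RawIoVec`, `PendingWritev`, `gather`) is hypothetical and std-only; it is not the PR's implementation, and `gather` merely plays the role of the kernel reading through raw pointers.

```rust
/// Hypothetical stand-in for a C `iovec`: a raw pointer/length pair.
#[derive(Clone, Copy)]
struct RawIoVec {
    base: *const u8,
    len: usize,
}

/// Hypothetical in-flight vectored write. It owns the buffers and the
/// iovec array, so both stay valid until the op is completed.
struct PendingWritev {
    bufs: Vec<Box<[u8]>>,
    iovecs: Box<[RawIoVec]>,
}

impl PendingWritev {
    fn new(bufs: Vec<Box<[u8]>>) -> Self {
        // Heap addresses of boxed slices do not change when the op moves.
        let iovecs = bufs
            .iter()
            .map(|b| RawIoVec { base: b.as_ptr(), len: b.len() })
            .collect();
        Self { bufs, iovecs }
    }

    /// Pointer and count that would be handed to the kernel.
    fn as_raw(&self) -> (*const RawIoVec, usize) {
        (self.iovecs.as_ptr(), self.iovecs.len())
    }

    /// Called once the op has completed: hands the buffers back.
    fn complete(self) -> Vec<Box<[u8]>> {
        self.bufs
    }
}

/// Plays the kernel's role: gathers bytes through the raw pointers.
fn gather(ptr: *const RawIoVec, count: usize) -> Vec<u8> {
    let mut out = Vec::new();
    // SAFETY: the caller must keep the owning `PendingWritev` alive and
    // unmodified for the duration of this call.
    unsafe {
        for iov in std::slice::from_raw_parts(ptr, count) {
            out.extend_from_slice(std::slice::from_raw_parts(iov.base, iov.len));
        }
    }
    out
}
```

Note that the raw pointers make this type `!Send` by default, so a real implementation would have to address that separately.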

- prefixed_buf.extend_from_slice(buf);
- sink.send(prefixed_buf).await.map_err(Error::SendFailed)
+ let len_bytes = len.to_be_bytes();
+ sink.send(Bytes::from_owner(len_bytes).chain(buf))
Contributor

Ah, this is the benefit of non-contiguous bytes here?

Collaborator Author

Yep. Buf operates over chunks, so you can have a reader that spans non-contiguous memory. This is really useful in cases like this one, where we don't want to re-allocate a large slab. There are a few other areas of the codebase where this pattern would be nice to use as well.

In general, anywhere you would take Vec<u8> / Bytes as an input argument, make it impl Buf, and prefer returning Bytes over Vec<u8>.
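
A minimal sketch of the chunked-reader idea, using only std. `MiniBuf` is a simplified, hypothetical trait modeled loosely on the `remaining` / `chunk` / `advance` methods of `bytes::Buf`, and `Chain` and `drain` are likewise illustrative; none of this is the `bytes` crate's actual code.

```rust
/// Simplified, hypothetical stand-in for `bytes::Buf`.
trait MiniBuf {
    /// Total bytes left to read.
    fn remaining(&self) -> usize;
    /// The next contiguous chunk (may be shorter than `remaining`).
    fn chunk(&self) -> &[u8];
    /// Marks `cnt` bytes as consumed.
    fn advance(&mut self, cnt: usize);
}

impl MiniBuf for &[u8] {
    fn remaining(&self) -> usize {
        self.len()
    }
    fn chunk(&self) -> &[u8] {
        self
    }
    fn advance(&mut self, cnt: usize) {
        *self = &self[cnt..];
    }
}

/// Two buffers read back to back without being copied together.
struct Chain<A, B> {
    a: A,
    b: B,
}

impl<A: MiniBuf, B: MiniBuf> MiniBuf for Chain<A, B> {
    fn remaining(&self) -> usize {
        self.a.remaining() + self.b.remaining()
    }
    fn chunk(&self) -> &[u8] {
        if self.a.remaining() > 0 {
            self.a.chunk()
        } else {
            self.b.chunk()
        }
    }
    fn advance(&mut self, cnt: usize) {
        let from_a = cnt.min(self.a.remaining());
        self.a.advance(from_a);
        self.b.advance(cnt - from_a);
    }
}

/// Reads a buffer chunk by chunk; returns the bytes and the chunk count.
fn drain<B: MiniBuf>(mut buf: B) -> (Vec<u8>, usize) {
    let mut out = Vec::new();
    let mut chunks = 0;
    while buf.remaining() > 0 {
        let n = {
            let c = buf.chunk();
            out.extend_from_slice(c);
            c.len()
        };
        buf.advance(n);
        chunks += 1;
    }
    (out, chunks)
}
```

Here a 4-byte length prefix and a payload can be framed as `Chain { a: &len_bytes[..], b: &payload[..] }`, which a consumer sees as two chunks rather than one freshly allocated, prefixed buffer.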

Collaborator Author

clabby commented Dec 18, 2025

#2558

@clabby clabby closed this Dec 18, 2025
codecov bot commented Dec 18, 2025

Codecov Report

❌ Patch coverage is 98.52941% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 92.49%. Comparing base (b111a6a) to head (871532f).
⚠️ Report is 19 commits behind head on main.

Files with missing lines | Patch % | Lines
p2p/src/simulated/network.rs | 66.66% | 1 Missing ⚠️
@@            Coverage Diff             @@
##             main    #2518      +/-   ##
==========================================
- Coverage   92.50%   92.49%   -0.01%     
==========================================
  Files         340      345       +5     
  Lines       97412    98319     +907     
==========================================
+ Hits        90110    90944     +834     
- Misses       7302     7375      +73     
Files with missing lines | Coverage Δ
runtime/src/lib.rs | 97.23% <100.00%> (ø)
runtime/src/mocks.rs | 99.38% <100.00%> (+<0.01%) ⬆️
runtime/src/network/audited.rs | 96.68% <100.00%> (ø)
runtime/src/network/deterministic.rs | 98.52% <100.00%> (ø)
runtime/src/network/metered.rs | 99.23% <100.00%> (+0.01%) ⬆️
runtime/src/network/mod.rs | 100.00% <100.00%> (ø)
runtime/src/network/tokio.rs | 85.15% <100.00%> (ø)
stream/src/lib.rs | 95.83% <100.00%> (ø)
stream/src/utils/codec.rs | 100.00% <100.00%> (ø)
p2p/src/simulated/network.rs | 95.48% <66.66%> (-0.10%) ⬇️

... and 85 files with indirect coverage changes


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Development

Successfully merging this pull request may close these issues.

[runtime/stream] Support the use of OwnedWriteHalf::write_vectored

3 participants