-
Notifications
You must be signed in to change notification settings - Fork 651
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add functions for reading and writing length-prefixed data with customizable encodings for the length #2867
Add functions for reading and writing length-prefixed data with customizable encodings for the length #2867
Conversation
…misable encodings for the length
b181d12
to
74ce004
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is neat! I've not reviewed it all but left some initial feedback and questions to make sure I understood how it all fits together.
/// If you use write more or less than this many bytes, we wll need to move bytes around to make that possible | ||
/// Therefore, you may decide to encode differently to use more bytes than you actually would need, if it is possible for you to do so | ||
/// That way, you waste the reserved bytes, but improve writing performance | ||
func writeIntegerWithReservedSpace( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This API 'stutters' a little bit: could we use reservedCapacity
to overload writeInteger
? i.e.
func writeInteger(_ value: Int, to buffer: inout ByteBuffer, reservedCapacity: Int) -> Int
to buffer: inout ByteBuffer | ||
) -> Int | ||
|
||
/// When writing an integer using this strategy, how many bytes we should reserve |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found this and writeIntegerWithReservedSpace
quite unintuitive without looking at how they were used. It became clearer when I started reading "Integer" as "Length". Is that the right way to think about it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah i agree this is confusing
The idea here is that reservedSpaceForInteger
is the amount of space that someone should reserve if theyre intending to write an integer using your strategy
And the writeIntegerWithReservedSpace
is for when someone has already reserved some space, and now wants you to write an integer into that space. You can then make the decision on whether you want to try and use exactly that much space (this is what the quic impl does) even if you could've used less. Ie you choose whether you optimize for the smallest possible encoding of the int, or for trying to avoid moving bytes around
This is in practice currently only used by the writePrefixed* functions so yes, there the integer is a length. But i don't think it has to be a length
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can be used to write an integer, but in the context of this protocol should it be used for anything other than the length?
IMO we should make the intent of the requirement clear, if you wanted to write an integer then you should just call the writeInteger(_:to:)
method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few more comments. The bulk of this looks great but still unsure around some of the naming.
/// If you use write more or less than this many bytes, we wll need to move bytes around to make that possible. | ||
/// Therefore, you may decide to encode differently to use more bytes than you actually would need, if it is possible for you to do so. | ||
/// That way, you waste the reserved bytes, but improve writing performance. | ||
func writeIntegerWithReservedCapacity( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: repeating "reserved" in the name and in the parameter name. Is that necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
George's note here is good, we can probably remove the note in the name.
#elseif canImport(Bionic) | ||
import Bionic | ||
#else | ||
#error("The Byte Buffer module was unable to identify your C library.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MaxDesiatov don't we need WASIlibc here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, thanks for catching this!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we need this, then it's missing across all of NIO
//===----------------------------------------------------------------------===// | ||
|
||
#if os(Windows) | ||
import ucrt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hamzahrmalik why do we need the libcs here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for the memmove function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, in this case we should remove the imports here.
- we shouldn't use
*Unsafe*
here, the most minimal subset ofByteBuffer-core.swift
needs to provide enough memmove
isn't necessary to be called directly.UnsafeRawBufferPointer
can move memory with a higher-level API (I think it'sbufferPointer[0..<100] = bufferPointer[2..<102]
or something like that; but again, no Unsafe in here please)
#elseif canImport(Musl) | ||
import Musl | ||
#elseif canImport(Bionic) | ||
import Bionic |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
import Bionic | |
import Bionic | |
#elseif canImport(WASILibc) | |
import WASILibc |
// Add the required number of bytes to the end first | ||
self.writeRepeatingByte(0, count: requiredSpace) | ||
// Move the data forward by that many bytes, to make space at the front | ||
self.withVeryUnsafeMutableBytes { pointer in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we avoid *Unsafe*
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really: this is the safe wrapper around the access.
Ultimately this is implementing memmove
on ByteBuffer
. Implementing it safely is somewhere between very hard and impossible using the APIs we currently have. We can run through the options:
- Bytewise copy using
writeInteger
. This is going to be very costly for large moves, which is the concern we have in this method. - Manually-autovectorised copy using
writeMultipleInteger
and wider integers. This can work, but it's still going to be very slow compared tomemmove
and it will be much more complex code as we have to handle a wide range of edge-cases. - Move using
writeImmutableBuffer
. This will, unfortunately, trigger a CoW. - Move using
ByteBufferView
. This will also trigger a CoW.
I fundamentally can't see an implementation here that performs well enough to be used on larger buffers, which is also the circumstance in which it is most likely that we'll need a move. But I'm open to other options.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, if this is necessary functionality, can we add it to ByteBuffer-core.swift
and use in other places too? If my memory serves me right, then @glbrntt added some memory-moving API already a few years back, might be wrong about that though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Close enough, I added copyBytes(at:to:length:)
5 years ago as part of making ByteBufferView
mutable. Good memory!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hamzahrmalik isn't that exactly what we need? memmove
== copying bytes too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this looks like we could use it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, this works perfectly
/// If you use write more or less than this many bytes, we wll need to move bytes around to make that possible. | ||
/// Therefore, you may decide to encode differently to use more bytes than you actually would need, if it is possible for you to do so. | ||
/// That way, you waste the reserved bytes, but improve writing performance. | ||
func writeIntegerWithReservedCapacity( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
George's note here is good, we can probably remove the note in the name.
to buffer: inout ByteBuffer | ||
) -> Int | ||
|
||
/// An estimate of the bytes required to write integers using ths strategy. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// An estimate of the bytes required to write integers using ths strategy. | |
/// An estimate of the bytes required to write integers using this strategy. |
var requiredBytesHint: Int { get } | ||
|
||
/// Write an integer to a buffer. Move the writer index to after the written integer. | ||
/// Call this function if you have already reserved some capacity for an integer to be written. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tone of this sentence seems wrong: it's an instruction to the caller, but the caller is us.
|
||
/// Write an integer to a buffer. Move the writer index to after the written integer. | ||
/// Call this function if you have already reserved some capacity for an integer to be written. | ||
/// Implementors should consider using a less efficient encoding, if possible,to fit exactly within the reserved capacity. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Implementors should consider using a less efficient encoding, if possible,to fit exactly within the reserved capacity. | |
/// Implementers should consider using a less efficient encoding, if possible,to fit exactly within the reserved capacity. |
/// Call this function if you have already reserved some capacity for an integer to be written. | ||
/// Implementors should consider using a less efficient encoding, if possible,to fit exactly within the reserved capacity. | ||
/// Otherwise, callers may need to shift bytes to reconcile the difference. | ||
/// It is up to the implementor to find the balance between performance and size. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// It is up to the implementor to find the balance between performance and size. | |
/// It is up to the implementer to find the balance between performance and size. |
/// Read a binary encoded integer, moving the `readerIndex` appropriately. | ||
/// If there are not enough bytes, nil is returned. | ||
@inlinable | ||
public mutating func readEncodedInteger<Strategy: NIOBinaryIntegerEncodingStrategy>(_ strategy: Strategy) -> Int? { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These should probably be generic over FixedWidthInteger
, the way the existing readInteger
and writeInteger
functions are.
//===----------------------------------------------------------------------===// | ||
|
||
extension ByteBuffer { | ||
public struct QUICBinaryEncodingStrategy: NIOBinaryIntegerEncodingStrategy { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be nice to have a doc comment here.
|
||
/// - Parameter requiredBytesHint: An estimate of the bytes required to write integers using this strategy | ||
@inlinable | ||
public init(requiredBytesHint: Int) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we think it's useful to make this configurable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In some cases, a user might have a rough idea of how many bytes they're expecting to write. In such cases, allowing them to use a smaller number would result in saved bytes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An alternative would be, make this not configurable, but then add a parameter to ByteBuffer/writeLengthPrefixed(strategy:writeData:)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
leaning towards the alternative
Having the requiredBytesHint parameter here is quite confusing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This mostly looks really good. Most of my comments are doc nits but a couple are minor API tweaks that I think we should fix.
/// Describes a way to encode and decode an integer as bytes | ||
/// For more information, see <doc:ByteBuffer-lengthPrefix> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this once sentence or two? We need some punctuation between the two otherwise it'll be rendered as "... bytes For more information, ..."
|
||
/// Write an integer to a buffer. Move the writer index to after the written integer. | ||
/// - Parameters: | ||
/// - integer: The type of the integer to write. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// - integer: The type of the integer to write. | |
/// - integer: The integer to write. |
|
||
/// Write an integer to a buffer. Move the writer index to after the written integer. | ||
/// This function will be called when an integer needs to be written, and some capacity has already been reserved for it. | ||
/// Implementers should consider using a less efficient encoding, if possible,to fit exactly within the reserved capacity. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Implementers should consider using a less efficient encoding, if possible,to fit exactly within the reserved capacity. | |
/// Implementers should consider using a less efficient encoding, if possible, to fit exactly within the reserved capacity. |
integer encodings, in which smaller numbers can be encoded in fewer bytes. The first 2 bits of the first byte indicate | ||
how many further bytes should be read. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the QUIC example is great but don't think we need the detail about how the QUIC strategy works.
integer encodings, in which smaller numbers can be encoded in fewer bytes. The first 2 bits of the first byte indicate | |
how many further bytes should be read. | |
integer encodings, in which smaller numbers can be encoded in fewer bytes. |
|
||
- Encode the length of the integer into the integer itself when writing, so it knows how many bytes to read when | ||
reading. This is what QUIC does. | ||
- Always use the same length, e.g. a simple strategy which always writes the integer as a UInt64. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Always use the same length, e.g. a simple strategy which always writes the integer as a UInt64. | |
- Always use the same length, e.g. a simple strategy which always writes the integer as a `UInt64`. |
- If the length ends up needing more bytes than we had reserved, shuffle the data forward to make space | ||
|
||
This code will be most performant when the `requiredBytesHint` is exactly correct, because it will avoid needing to | ||
shuffle any bytes. With that in mind, we can actually make one more optimisation: when we call the writeInteger function |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shuffle any bytes. With that in mind, we can actually make one more optimisation: when we call the writeInteger function | |
shuffle any bytes. With that in mind, we can actually make one more optimisation: when we call the `writeInteger` function |
) -> Int | ||
``` | ||
|
||
Many strategies will not be able to do anything useful with the additional `reservedCapacity` parameter. For e.g., in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: "e.g." is the abbreviation of "exempli gratia" which means "for the sake of example", so no need to write "For e.g.", Just use "e.g." or "for example"
Many strategies will not be able to do anything useful with the additional `reservedCapacity` parameter. For e.g., in | |
Many strategies will not be able to do anything useful with the additional `reservedCapacity` parameter. For example, in |
public mutating func readEncodedInteger<Strategy: NIOBinaryIntegerEncodingStrategy, Integer: FixedWidthInteger>( | ||
_ strategy: Strategy, | ||
as: Integer.Type = Integer.self | ||
) -> Integer? { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
readEncodedInteger(.quic)
is a bit odd when using type inference (and is different to writeEncodedInteger(42, strategy: .quic)
).
I think this should be spelled as readEncodedInteger(as:strategy:)
which works well with or without type inference and mirrors the write API nicely.
} | ||
|
||
/// Prefixes bytes written by `writeData` with the number of bytes written. | ||
/// The number of bytes written is encoded using `strategy` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// The number of bytes written is encoded using `strategy` | |
/// The number of bytes written is encoded using `strategy`. |
/// If there are not enough bytes to read the full slice, the readerIndex will stay unchanged. | ||
@inlinable | ||
public mutating func readLengthPrefixedSlice<Strategy: NIOBinaryIntegerEncodingStrategy>( | ||
_ strategy: Strategy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure about dropping the label on the strategy
here, I think we should keep the strategy label.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good modulo one doc nit!
@@ -129,10 +131,14 @@ extension ByteBuffer { | |||
|
|||
extension NIOBinaryIntegerEncodingStrategy where Self == ByteBuffer.QUICBinaryEncodingStrategy { | |||
@inlinable | |||
/// Encodes bytes as defined in RFC 9000 § 16 | |||
/// - Parameter requiredBytesHint: An estimate of the bytes required to write integers using this strategy | |||
/// - Returns: <#description#> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one needs filling out, <#description#>
} | ||
|
||
@usableFromInline | ||
func bytesNeededForInteger<IntegerType: FixedWidthInteger>(_ integer: IntegerType) -> Int { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think exposing this logic publicly would be useful. Consider a case where I want to write multiple variable length integers to a buffer - how can I appropriately size the buffer's capacity ahead of time? I can assume they'll all need 8 bytes, but then I'm going to over allocate.
var buffer = allocator.buffer(capacity: enoughRoomForTwoVarints)
buffer.writeInteger(myInt, strategy: .quic)
buffer.writeInteger(myOtherInt, strategy: .quic)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice one, let's get this landed.
var buffer = ByteBuffer() | ||
let strategy = ByteBuffer.QUICBinaryEncodingStrategy.quic | ||
let bytesWritten = strategy.writeInteger( | ||
0b00000010_00011001_01111100_01011110_11111111_00010100_11101000_10001100, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hamzahrmalik, this line and a couple more writes below broke compilation of these tests for 32-bit platforms like Android armv7 on my Android CI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@finagolfin @Lukasa @FranzBusch can we integrate the Android & 32 bit CIs into NIO's own github actions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Franz and I discussed getting Android tested on this CI, I'm aiming to have something for that in the coming months. However, that just builds the source for armv7 but wouldn't run the tests.
If you have a 32-bit CI like watchOS or embedded that you could add quickly, that may be a better first step to get 32-bit platforms tested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've raised a PR to fix this: #2904
… systems (#2904) This is a followup to #2867 Some of the literals used in the test cases were too big to fit in an Int32 This is fine on 64-bit systems, because the literals are considered as `Int`, which is Int64 on those systems However, on 32-bit systems, those literals are considered as Int64 Change: Add `as Int64` where needed, to tell the compiler we want these literals to be treated as Int64, which should allow these tests to run on 32 bit systems too
Add functions for reading and writing length-prefixed data with customizable encodings for the length, particularly for quic variable-length integers (RFC 9000)
Motivation:
Many protocols require us to write data and then prefix that data with its length. But each protocol has a different way of encoding the length. This PR introduces general purpose functions which can be extended for different encoding strategies
Modifications:
Create a new protocol which defines how to encode an integer
Implement this protocol for QUIC
Provide functions on bytebuffer for writing length-prefixed buffers, strings or bytes