-
Notifications
You must be signed in to change notification settings - Fork 11
Add parser for LEB128 integers. #31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
if shift >= Self.bitWidth { | ||
// Additional bytes must be zero (or sign extension for signed) | ||
if Self.isSigned { | ||
let expectedByte: UInt8 = (result < 0) ? 0xFF : 0x00 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This path will also work for unsigned types, so you don't to branch on isSigned
. I would also set expectedByte
to 0x7f : 0x00
and avoid needing to mask in the next line (there's no performance difference, but I find it clearer).
Plus, you already masked bits
, so use that? guard bits == expected else { ... }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes much clearer. Fixed.
let extraBits = bits & ~allowedMask | ||
if extraBits != 0 { | ||
let isValidSignExtension = | ||
Self.isSigned && extraBits == (~allowedMask & 0x7F) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line should be indented
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
// Check for overflow before shifting | ||
if shift >= Self.bitWidth { | ||
// Additional bytes must be zero (or sign extension for signed) | ||
let expectedByte: UInt8 = (result < 0) ? 0xFF : 0x00 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think expectedByte
has to be 0x7f : 0x00
here, because we compare it to bits
which is already masked, right? Let's make sure that we have a test case that covers this path too (an overwide sign-extended negative value)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes you're right. There is another problem here too - I realise this will allow arbitrarily long padding sequences.
Just for context, this part is just to allow padding after the main value. I don't know why an encoder would do this, but the webassembly spec [1] says that [0xFE, 0xFF, 0x7F]
is a valid encoding of -2 whereas a usual encoding would be [0xFE]
So that will make a good test case.
When looking for over-wide enodings I also found an issue in the overflow checks. Int64(Int32.min) - 1
should fail to decode into an Int32 but it incorrectly decodes to a positive number.
Maybe this should be checked for all bit-widths:
@Test(arguments: [Int64(Int32.min) - 1, Int64(Int32.max) + 1])
func overflow(_ i: Int64) throws {
let lebEncoded: [UInt8] = .init(encodingLEB128: i)
#expect(throws: ParsingError.self) {
try lebEncoded.withParserSpan { try Int32(parsingLEB128: &$0) }
}
}
[1] https://webassembly.github.io/spec/core/binary/values.html#integers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added two tests that cover the code path discussed: validPaddingLEB128
and tooManyPaddingBytesLEB128
.
The second test also checks the parser constrains byte consumption to ceil(bitWidth / 7)
as per spec.
A check has been added to catch signed integers that have overflowed by comparing the final byte sign to the constructed integer sign.
This is tested in overflowLEB128
/// Some LEB128 encoders output padding bytes which are considered | ||
/// valid if the number of bytes does not exceed `ceil(bitWidth / 7)` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can just be a regular comment:
/// Some LEB128 encoders output padding bytes which are considered | |
/// valid if the number of bytes does not exceed `ceil(bitWidth / 7)` | |
// Some LEB128 encoders output padding bytes which are considered | |
// valid if the number of bytes does not exceed `ceil(bitWidth / 7)` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
Thanks so much for this, @willtemperley! 👏🏻👏🏻👏🏻 |
Yay! |
Description
As discussed in #26 this pull request adds support for parsing base128 encoded little-endian integers.
Detailed Design
An initializer has been added to
FixedWidthInteger
which is able to parse signed and unsigned LEB128 encoded integers.Documentation Plan
No documentation added yet. Options for an example parser for e.g. WASM or DWARF were considered but this would widen the scope of this PR significantly.
Test Plan
Parsing is tested in
fuzzMultiByteInteger
,fuzzSingleByteInteger
andfuzzPlatformWidthInteger
tests. An LEB128 encoder has been added toTestSupport
to generate encoded integers in these tests.Source Impact
No existing APIs are affected by this.
Checklist