Skip to content

Conversation

willtemperley
Copy link
Contributor

Description

As discussed in #26 this pull request adds support for parsing base128 encoded little-endian integers.

Detailed Design

An initializer has been added to FixedWidthInteger which is able to parse signed and unsigned LEB128 encoded integers.

public init(parsingLEB128 input: inout ParserSpan) throws(ParsingError) {
  ....
}

Documentation Plan

No documentation added yet. Options for an example parser for e.g. WASM or DWARF were considered but this would widen the scope of this PR significantly.

Test Plan

Parsing is tested in fuzzMultiByteInteger, fuzzSingleByteInteger and fuzzPlatformWidthInteger tests. An LEB128 encoder has been added to TestSupport to generate encoded integers in these tests.

Source Impact

No existing APIs are affected by this.

Checklist

  • [x ] I've added at least one test that validates that my change is working, if appropriate
  • [x ] I've followed the code style of the rest of the project
  • [x ] I've read the Contribution Guidelines
  • [x ] I've updated the documentation if necessary

if shift >= Self.bitWidth {
// Additional bytes must be zero (or sign extension for signed)
if Self.isSigned {
let expectedByte: UInt8 = (result < 0) ? 0xFF : 0x00
Copy link
Member

@stephentyrone stephentyrone Aug 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This path will also work for unsigned types, so you don't to branch on isSigned. I would also set expectedByte to 0x7f : 0x00 and avoid needing to mask in the next line (there's no performance difference, but I find it clearer).

Plus, you already masked bits, so use that? guard bits == expected else { ... }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes much clearer. Fixed.

let extraBits = bits & ~allowedMask
if extraBits != 0 {
let isValidSignExtension =
Self.isSigned && extraBits == (~allowedMask & 0x7F)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line should be indented

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

// Check for overflow before shifting
if shift >= Self.bitWidth {
// Additional bytes must be zero (or sign extension for signed)
let expectedByte: UInt8 = (result < 0) ? 0xFF : 0x00
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think expectedByte has to be 0x7f : 0x00 here, because we compare it to bits which is already masked, right? Let's make sure that we have a test case that covers this path too (an overwide sign-extended negative value)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes you're right. There is another problem here too - I realise this will allow arbitrarily long padding sequences.

Just for context, this part is just to allow padding after the main value. I don't know why an encoder would do this, but the webassembly spec [1] says that [0xFE, 0xFF, 0x7F] is a valid encoding of -2 whereas a usual encoding would be [0xFE] So that will make a good test case.

When looking for over-wide enodings I also found an issue in the overflow checks. Int64(Int32.min) - 1 should fail to decode into an Int32 but it incorrectly decodes to a positive number.

Maybe this should be checked for all bit-widths:

  @Test(arguments: [Int64(Int32.min) - 1, Int64(Int32.max) + 1])
  func overflow(_ i: Int64) throws {
    let lebEncoded: [UInt8] = .init(encodingLEB128: i)
    #expect(throws: ParsingError.self) {
      try lebEncoded.withParserSpan { try Int32(parsingLEB128: &$0) }
    }
  }

[1] https://webassembly.github.io/spec/core/binary/values.html#integers

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added two tests that cover the code path discussed: validPaddingLEB128 and tooManyPaddingBytesLEB128.
The second test also checks the parser constrains byte consumption to ceil(bitWidth / 7) as per spec.

A check has been added to catch signed integers that have overflowed by comparing the final byte sign to the constructed integer sign.
This is tested in overflowLEB128

Comment on lines 643 to 644
/// Some LEB128 encoders output padding bytes which are considered
/// valid if the number of bytes does not exceed `ceil(bitWidth / 7)`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can just be a regular comment:

Suggested change
/// Some LEB128 encoders output padding bytes which are considered
/// valid if the number of bytes does not exceed `ceil(bitWidth / 7)`
// Some LEB128 encoders output padding bytes which are considered
// valid if the number of bytes does not exceed `ceil(bitWidth / 7)`

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@natecook1000
Copy link
Member

Thanks so much for this, @willtemperley! 👏🏻👏🏻👏🏻

@natecook1000 natecook1000 merged commit e9915c7 into apple:main Sep 4, 2025
16 checks passed
@stephentyrone
Copy link
Member

Yay!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants