Add parser for LEB128 integers. #31

willtemperley · 2025-08-17T09:33:27Z

Description

As discussed in #26, this pull request adds support for parsing LEB128 (Little Endian Base 128) encoded integers.

Detailed Design

An initializer has been added to FixedWidthInteger that parses both signed and unsigned LEB128-encoded integers.

public init(parsingLEB128 input: inout ParserSpan) throws(ParsingError) {
  ....
}
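For readers unfamiliar with the format, here is a minimal standalone sketch of unsigned LEB128 decoding. It is not the PR's implementation: it reads from a plain [UInt8] rather than a ParserSpan, and the names decodeUnsignedLEB128 and LEB128SketchError are hypothetical, introduced only for illustration.

```swift
// Hypothetical error type for this sketch; the PR itself throws ParsingError.
enum LEB128SketchError: Error {
  case truncated   // input ended while the continuation bit was still set
  case overflow    // payload bits do not fit in the target type
  case tooLong     // more than ceil(bitWidth / 7) bytes were needed
}

// Decodes one unsigned LEB128 value from the start of `bytes`.
// Each byte carries 7 payload bits, least-significant group first;
// the high bit (0x80) signals that another byte follows.
func decodeUnsignedLEB128<T: FixedWidthInteger & UnsignedInteger>(
  _ bytes: [UInt8], as type: T.Type = T.self
) throws -> T {
  // Never consume more than ceil(bitWidth / 7) bytes.
  let maxBytes = (T.bitWidth + 6) / 7
  var result: T = 0
  for (i, byte) in bytes.prefix(maxBytes).enumerated() {
    let bits = T(byte & 0x7F)
    let shift = i * 7
    let shifted = bits << shift
    // If shifting back does not recover the payload, high bits were lost.
    guard shifted >> shift == bits else { throw LEB128SketchError.overflow }
    result |= shifted
    if byte & 0x80 == 0 { return result }
  }
  throw bytes.count < maxBytes
    ? LEB128SketchError.truncated
    : LEB128SketchError.tooLong
}
```

For example, the bytes [0xE5, 0x8E, 0x26] decode to 624485 as a UInt32, while [0x80, 0x02] (the value 256) is rejected when decoded as a UInt8.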

Documentation Plan

No documentation added yet. Options for an example parser for e.g. WASM or DWARF were considered but this would widen the scope of this PR significantly.

Test Plan

Parsing is tested in fuzzMultiByteInteger, fuzzSingleByteInteger and fuzzPlatformWidthInteger tests. An LEB128 encoder has been added to TestSupport to generate encoded integers in these tests.
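A test-side encoder of this kind could look roughly like the sketch below. This is an assumption about its general shape, not the code added to TestSupport; the free function encodeLEB128 is a hypothetical name used only here.

```swift
// Hypothetical standalone encoder; a sketch, not the TestSupport code.
// Emits the shortest LEB128 encoding of `value`, using the signed
// variant for signed types and the unsigned variant otherwise.
func encodeLEB128<T: FixedWidthInteger>(_ value: T) -> [UInt8] {
  var value = value
  var output: [UInt8] = []
  while true {
    // Take the low 7 bits as the next payload group.
    let byte = UInt8(truncatingIfNeeded: value) & 0x7F
    // Arithmetic shift for signed types, logical shift for unsigned.
    value = value >> 7
    let done: Bool
    if T.isSigned {
      // Stop once the rest is pure sign extension and the group's
      // top payload bit (0x40) already carries the correct sign.
      let signBitSet = byte & 0x40 != 0
      done = (value == T.zero && !signBitSet)
        || (value == ~T.zero && signBitSet)
    } else {
      done = value == T.zero
    }
    if done {
      output.append(byte)
      return output
    }
    // More groups follow: set the continuation bit.
    output.append(byte | 0x80)
  }
}
```

With this sketch, UInt32(624485) encodes to [0xE5, 0x8E, 0x26], Int32(-2) to [0x7E], and Int32(64) to [0xC0, 0x00], because the extra byte is needed to keep the sign bit clear.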

Source Impact

No existing APIs are affected by this.

Checklist

[x] I've added at least one test that validates that my change is working, if appropriate
[x] I've followed the code style of the rest of the project
[x] I've read the Contribution Guidelines
[x] I've updated the documentation if necessary

stephentyrone · 2025-08-18T16:12:00Z

Sources/BinaryParsing/Parsers/Integer.swift

+      if shift >= Self.bitWidth {
+        // Additional bytes must be zero (or sign extension for signed)
+        if Self.isSigned {
+          let expectedByte: UInt8 = (result < 0) ? 0xFF : 0x00


This path will also work for unsigned types, so you don't need to branch on isSigned. I would also set expectedByte to 0x7f : 0x00 and avoid needing to mask in the next line (there's no performance difference, but I find it clearer).

Plus, you already masked bits, so use that? guard bits == expected else { ... }

Yes much clearer. Fixed.
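To illustrate the point about comparing against the already-masked bits, here is a small sketch. The helper isValidPadding and its parameter names are hypothetical and do not appear in the PR; the sketch assumes the continuation bit is stripped with a 0x7F mask before the comparison.

```swift
// Hypothetical helper, for illustration only.
// `byte` is a raw LEB128 byte read after the result is already full.
func isValidPadding(_ byte: UInt8, resultIsNegative: Bool) -> Bool {
  // Strip the continuation bit; only the 7 payload bits remain.
  let bits = byte & 0x7F
  // The payload can never equal 0xFF after masking, so the expected
  // sign-extension pattern for a negative result is 0x7F.
  let expected: UInt8 = resultIsNegative ? 0x7F : 0x00
  return bits == expected
}
```

Because a non-negative result expects 0x00, the same comparison covers unsigned types without a separate branch.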

stephentyrone · 2025-08-18T16:25:17Z

Sources/BinaryParsing/Parsers/Integer.swift

+          let extraBits = bits & ~allowedMask
+          if extraBits != 0 {
+            let isValidSignExtension =
+            Self.isSigned && extraBits == (~allowedMask & 0x7F)


This line should be indented

stephentyrone · 2025-08-19T13:52:46Z

Sources/BinaryParsing/Parsers/Integer.swift

+      // Check for overflow before shifting
+      if shift >= Self.bitWidth {
+        // Additional bytes must be zero (or sign extension for signed)
+        let expectedByte: UInt8 = (result < 0) ? 0xFF : 0x00


I think expectedByte has to be 0x7f : 0x00 here, because we compare it to bits which is already masked, right? Let's make sure that we have a test case that covers this path too (an overwide sign-extended negative value)

Yes you're right. There is another problem here too - I realise this will allow arbitrarily long padding sequences.

Just for context, this part is only there to allow padding after the main value. I don't know why an encoder would do this, but the WebAssembly spec [1] says that [0xFE, 0xFF, 0x7F] is a valid encoding of -2, whereas the usual encoding would be [0x7E]. So that will make a good test case.

When looking for over-wide encodings I also found an issue in the overflow checks. Int64(Int32.min) - 1 should fail to decode into an Int32, but it incorrectly decodes to a positive number.

Maybe this should be checked for all bit-widths:

@Test(arguments: [Int64(Int32.min) - 1, Int64(Int32.max) + 1])
func overflow(_ i: Int64) throws {
  let lebEncoded: [UInt8] = .init(encodingLEB128: i)
  #expect(throws: ParsingError.self) {
    try lebEncoded.withParserSpan { try Int32(parsingLEB128: &$0) }
  }
}

[1] https://webassembly.github.io/spec/core/binary/values.html#integers

I've added two tests that cover the code path discussed: validPaddingLEB128 and tooManyPaddingBytesLEB128.
The second test also checks the parser constrains byte consumption to ceil(bitWidth / 7) as per spec.

A check has been added to catch signed integers that have overflowed by comparing the final byte sign to the constructed integer sign.
This is tested in overflowLEB128
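The checks described above can be sketched in a standalone form as follows. This is not the code merged in the PR: it works on a plain [UInt8], the names decodeSignedLEB128 and SignedLEB128SketchError are hypothetical, and it detects overflow by inspecting the bits of the last permitted byte rather than by comparing signs after the fact.

```swift
// Hypothetical error type for this sketch; the PR throws ParsingError.
enum SignedLEB128SketchError: Error {
  case truncated, overflow, tooManyBytes
}

// Decodes one signed LEB128 value, consuming at most
// ceil(bitWidth / 7) bytes, so padded encodings are accepted only
// while they stay within that limit.
func decodeSignedLEB128<T: FixedWidthInteger & SignedInteger>(
  _ bytes: [UInt8], as type: T.Type = T.self
) throws -> T {
  let maxBytes = (T.bitWidth + 6) / 7
  var result: T = 0
  var shift = 0
  for byte in bytes.prefix(maxBytes) {
    let bits = byte & 0x7F
    let remaining = T.bitWidth - shift  // result bits not yet filled
    if remaining < 7 {
      // This group straddles the top of the integer. Every payload bit
      // from the result's sign bit upward must agree, otherwise the
      // encoded value does not fit in T.
      let lowMask = (UInt8(1) << (remaining - 1)) - 1
      let highMask = 0x7F & ~lowMask
      let high = bits & highMask
      guard high == 0 || high == highMask else {
        throw SignedLEB128SketchError.overflow
      }
    }
    result |= T(truncatingIfNeeded: bits) << shift
    shift += 7
    if byte & 0x80 == 0 {
      // Sign-extend when the last group's top payload bit is set.
      if shift < T.bitWidth && bits & 0x40 != 0 {
        result |= ~T.zero << shift
      }
      return result
    }
  }
  throw bytes.count < maxBytes
    ? SignedLEB128SketchError.truncated
    : SignedLEB128SketchError.tooManyBytes
}
```

Under these assumptions, both [0x7E] and the padded [0xFE, 0xFF, 0x7F] decode to -2 as an Int16, while [0xFF, 0xFF, 0xFF, 0xFF, 0x77] (the encoding of Int64(Int32.min) - 1) is rejected when decoded as an Int32.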

natecook1000 · 2025-08-21T16:26:02Z

Tests/BinaryParsingTests/IntegerParsingTests.swift

+  /// Some LEB128 encoders output padding bytes which are considered
+  /// valid if the number of bytes does not exceed `ceil(bitWidth / 7)`


This can just be a regular comment:

Suggested change

-  /// Some LEB128 encoders output padding bytes which are considered
-  /// valid if the number of bytes does not exceed `ceil(bitWidth / 7)`
+  // Some LEB128 encoders output padding bytes which are considered
+  // valid if the number of bytes does not exceed `ceil(bitWidth / 7)`

natecook1000 · 2025-09-04T17:41:40Z

Thanks so much for this, @willtemperley! 👏🏻👏🏻👏🏻

stephentyrone · 2025-09-04T18:46:08Z

Yay!

Add parser for LEB128 integers.

20a91f7

stephentyrone reviewed Aug 18, 2025

Improve LEB128 input validation. Fix indentation.

7b38b85

stephentyrone requested changes Aug 19, 2025

Add tests for LEB128 overflow. Fixed overflow check and added a max byte count check.

5a8ec06

willtemperley mentioned this pull request Aug 21, 2025

Feature Request: LEB128 integers #26

Open

natecook1000 reviewed Aug 21, 2025

Fix formatting.

0e78c61

natecook1000 reviewed Sep 3, 2025

Fix whitespace issues.

1b034f0

natecook1000 merged commit e9915c7 into apple:main Sep 4, 2025
16 checks passed
