Conversation

NicoElbers (Contributor) commented on Nov 3, 2025

[Image: performance_comparison — Old (Base impl) vs this (New impl)]
Benchmarking methodology

Performance is based on a micro-benchmark compiled with zig nightly build 0.16.0-dev.1234+74900e938 in ReleaseFast mode, specifically `zig build-exe bench.zig --name name -O ReleaseFast --zig-lib-dir lib`.

The new impl was compiled against the standard library at da48ade, while the old impl was compiled against the standard library at 416bf1d.

The 'small' category is values of at most 7 bits, the 'medium' category is values of at most 12 bits and the 'full' category is values of @bitSizeOf(T) bits.

For both datasets the benchmark was run after a system reboot and 2 full warmup runs.
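
For concreteness, here is a purely illustrative snippet (not the actual bench.zig, which is only available via the microbenchmark links further down) of what the three value categories mean for a u64 input:

```zig
const std = @import("std");

// Purely illustrative: shows what "at most N bits" means for the three
// input categories when T is u64. This is not the actual bench.zig.
test "benchmark value categories for u64" {
    var prng = std.Random.DefaultPrng.init(0);
    const random = prng.random();

    const small: u64 = random.int(u7); // 'small': at most 7 bits, always one LEB128 byte
    const medium: u64 = random.int(u12); // 'medium': at most 12 bits, at most two bytes
    const full: u64 = random.int(u64); // 'full': the whole @bitSizeOf(u64) range
    _ = full;

    try std.testing.expect(small < (1 << 7));
    try std.testing.expect(medium < (1 << 12));
}
```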

Rewrite writeLeb128 to no longer use writeMultipleOf7Leb128 and instead (see the sketch after this list):

  • Make use of byte aligned ints
  • Special case small numbers (fitting inside 7 bits)

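A minimal standalone sketch of this write strategy, assuming an unsigned runtime integer and a plain byte buffer rather than the std.Io.Writer that the real writeLeb128 targets; writeUleb128 and its test are hypothetical illustrations, not the PR's code:

```zig
const std = @import("std");

// Sketch only: unsigned LEB128 into a caller-provided buffer, returning
// the number of bytes written. Assumes `value` is an unsigned runtime int.
fn writeUleb128(buf: []u8, value: anytype) usize {
    const T = @TypeOf(value);
    // Work on a byte-aligned copy of the value so shifts stay on
    // register-friendly widths.
    const U = std.meta.Int(.unsigned, (@bitSizeOf(T) + 7) / 8 * 8);
    var x: U = value;

    // Fast path: a value of at most 7 bits encodes as exactly one byte.
    if (x < 0x80) {
        buf[0] = @intCast(x);
        return 1;
    }

    var i: usize = 0;
    while (x >= 0x80) : (i += 1) {
        // Emit the low 7 bits with the continuation bit set.
        buf[i] = @as(u8, @intCast(x & 0x7f)) | 0x80;
        x >>= 7;
    }
    buf[i] = @intCast(x);
    return i + 1;
}

test "writeUleb128 produces known encodings" {
    var buf: [10]u8 = undefined;
    try std.testing.expectEqual(@as(usize, 1), writeUleb128(&buf, @as(u32, 5)));
    try std.testing.expectEqualSlices(u8, &.{0x05}, buf[0..1]);
    const n = writeUleb128(&buf, @as(u32, 624485));
    try std.testing.expectEqualSlices(u8, &.{ 0xe5, 0x8e, 0x26 }, buf[0..n]);
}
```

The single-byte fast path is what makes the 'small' category cheap; everything else goes through an ordinary shift-and-mask loop over a byte-aligned integer.
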
Rewrite Reader.takeLeb128 to not use takeMultipleOf7Leb128 and instead (see the sketch after this list):

  • Use byte aligned integers
  • Turn the main reading loop into an inlined loop of static length
  • Special case small integers (<= 7 bits)

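A comparable standalone sketch of the read side, again for unsigned result types only, reading from a slice rather than the std.Io.Reader that the real takeLeb128 belongs to, and with simplified overflow handling; takeUleb128 is a hypothetical name, not the PR's code:

```zig
const std = @import("std");

// Sketch only: unsigned LEB128 from a byte slice, advancing `pos`.
fn takeUleb128(comptime T: type, bytes: []const u8, pos: *usize) error{ EndOfStream, Overflow }!T {
    // Accumulate into a byte-aligned integer at least as wide as T.
    const U = std.meta.Int(.unsigned, (@bitSizeOf(T) + 7) / 8 * 8);
    // Maximum number of LEB128 bytes the accumulator can span; comptime-known.
    const max_bytes = (@bitSizeOf(U) + 6) / 7;

    if (pos.* >= bytes.len) return error.EndOfStream;
    const first = bytes[pos.*];
    pos.* += 1;

    // Fast path: a value of at most 7 bits is a single byte with the
    // continuation bit clear.
    if (first < 0x80) return std.math.cast(T, first) orelse return error.Overflow;

    var result: U = first & 0x7f;

    // max_bytes is known at compile time, so this loop is fully unrolled.
    inline for (1..max_bytes) |i| {
        if (pos.* >= bytes.len) return error.EndOfStream;
        const byte = bytes[pos.*];
        pos.* += 1;
        // Unlike the real implementation, bits shifted past the accumulator
        // width are silently discarded here.
        result |= @as(U, byte & 0x7f) << (7 * i);
        if (byte < 0x80) return std.math.cast(T, result) orelse return error.Overflow;
    }
    return error.Overflow;
}

test "takeUleb128 decodes known encodings" {
    const bytes = [_]u8{ 0x05, 0xe5, 0x8e, 0x26 };
    var pos: usize = 0;
    try std.testing.expectEqual(@as(u32, 5), try takeUleb128(u32, &bytes, &pos));
    try std.testing.expectEqual(@as(u32, 624485), try takeUleb128(u32, &bytes, &pos));
    try std.testing.expectEqual(@as(usize, 4), pos);
}
```

Because max_bytes depends only on the result type, the inline for is unrolled at compile time, which is the 'inlined loop of static length' described above.
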
Writing performance:

  • Amongst u8, u16, u32 and u64, performance gains are between ~1.5x and ~2x
  • Amongst i8, i16, i32 and i64, performance gains are between ~2x and >4x
  • For integers whose bit width is a multiple of 7, performance is roughly equal
    within the margin of error, or slightly faster.

Reading performance:

  • Signed and unsigned 32-bit integers see a 5x to 12x(!) performance improvement.
  • For u8, u16 and u64, performance increases by ~1.5x to ~6x
  • For i8, i16 and i64, performance increases by ~1.5x to ~3.5x
  • For integers whose bit width is a multiple of 7, performance is roughly equal
    within the margin of error, or slightly faster.

NicoElbers force-pushed the leb-perf branch 2 times, most recently from 8234af8 to 309fd8d on November 5, 2025 at 22:16. The two commit messages:
Rewrite `writeLeb128` to no longer use `writeMultipleOf7Leb128` and instead:
 * Make use of byte aligned ints
 * Special case small numbers (fitting inside 7 bits)

Amongst u8, u16, u32 and u64 performance gains are between ~1.5x and ~2x
Amongst i8, i16, i32 and i64 performance gains are between ~2x and >4x

Additionally, add test coverage for written encodings

Microbenchmark: https://zigbin.io/7ed5fe
Rewrite `Reader.takeLeb128` to not use `takeMultipleOf7Leb128` and
instead:
 * Use byte aligned integers
 * Turn the main reading loop into an inlined loop of static length
 * Special case small integers (<= 7 bits)

Notably, signed and unsigned 32-bit integers see a 5x to 12x(!)
performance improvement.

Outside of that:
For u8, u16 and u64 performance increases ~1.5x to ~6x
For i8, i16 and i64 performance increases ~1.5x to ~3.5x

For integers whose bit width is a multiple of 7, performance is roughly equal
within the margin of error.

Also expand on test coverage

Microbenchmark: https://zigbin.io/7ed5fe