Conversation

NicoElbers (Contributor) commented on Nov 3, 2025

[Image: performance_comparison — Old (Base impl) vs this (New impl)]
Benchmarking methodology

Performance is based on a micro-benchmark compiled with zig nightly build 0.16.0-dev.1234+74900e938 in ReleaseFast mode, specifically `zig build-exe bench.zig --name name -O ReleaseFast --zig-lib-dir lib`.

The new impl was compiled against the standard library at da48ade, while the old impl was compiled against the standard library at 416bf1d.

The 'small' category is values of at most 7 bits, the 'medium' category is values of at most 12 bits and the 'full' category is values of @bitSizeOf(T) bits.

For both datasets the benchmark was run after a system reboot and 2 full warmup runs.
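
For concreteness, here is a purely illustrative snippet (not the actual bench.zig, which is only available via the microbenchmark links further down) of what the three value categories mean for a u64 input:

```zig
const std = @import("std");

// Purely illustrative: shows what "at most N bits" means for the three
// input categories when T is u64. This is not the actual bench.zig.
test "benchmark value categories for u64" {
    var prng = std.Random.DefaultPrng.init(0);
    const random = prng.random();

    const small: u64 = random.int(u7); // 'small': at most 7 bits, always one LEB128 byte
    const medium: u64 = random.int(u12); // 'medium': at most 12 bits, at most two bytes
    const full: u64 = random.int(u64); // 'full': the whole @bitSizeOf(u64) range
    _ = full;

    try std.testing.expect(small < (1 << 7));
    try std.testing.expect(medium < (1 << 12));
}
```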

Rewrite writeLeb128 to no longer use writeMultipleOf7Leb128 and instead (see the sketch after this list):

  • Make use of byte aligned ints
  • Special case small numbers (fitting inside 7 bits)

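A minimal standalone sketch of this write strategy, assuming an unsigned runtime integer and a plain byte buffer rather than the std.Io.Writer that the real writeLeb128 targets; writeUleb128 and its test are hypothetical illustrations, not the PR's code:

```zig
const std = @import("std");

// Sketch only: unsigned LEB128 into a caller-provided buffer, returning
// the number of bytes written. Assumes `value` is an unsigned runtime int.
fn writeUleb128(buf: []u8, value: anytype) usize {
    const T = @TypeOf(value);
    // Work on a byte-aligned copy of the value so shifts stay on
    // register-friendly widths.
    const U = std.meta.Int(.unsigned, (@bitSizeOf(T) + 7) / 8 * 8);
    var x: U = value;

    // Fast path: a value of at most 7 bits encodes as exactly one byte.
    if (x < 0x80) {
        buf[0] = @intCast(x);
        return 1;
    }

    var i: usize = 0;
    while (x >= 0x80) : (i += 1) {
        // Emit the low 7 bits with the continuation bit set.
        buf[i] = @as(u8, @intCast(x & 0x7f)) | 0x80;
        x >>= 7;
    }
    buf[i] = @intCast(x);
    return i + 1;
}

test "writeUleb128 produces known encodings" {
    var buf: [10]u8 = undefined;
    try std.testing.expectEqual(@as(usize, 1), writeUleb128(&buf, @as(u32, 5)));
    try std.testing.expectEqualSlices(u8, &.{0x05}, buf[0..1]);
    const n = writeUleb128(&buf, @as(u32, 624485));
    try std.testing.expectEqualSlices(u8, &.{ 0xe5, 0x8e, 0x26 }, buf[0..n]);
}
```

The single-byte fast path is what makes the 'small' category cheap; everything else goes through an ordinary shift-and-mask loop over a byte-aligned integer.
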
Rewrite Reader.takeLeb128 to not use takeMultipleOf7Leb128 and instead (see the sketch after this list):

  • Use byte aligned integers
  • Turn the main reading loop into an inlined loop of static length
  • Special case small integers (<= 7 bits)

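A comparable standalone sketch of the read side, again for unsigned result types only, reading from a slice rather than the std.Io.Reader that the real takeLeb128 belongs to, and with simplified overflow handling; takeUleb128 is a hypothetical name, not the PR's code:

```zig
const std = @import("std");

// Sketch only: unsigned LEB128 from a byte slice, advancing `pos`.
fn takeUleb128(comptime T: type, bytes: []const u8, pos: *usize) error{ EndOfStream, Overflow }!T {
    // Accumulate into a byte-aligned integer at least as wide as T.
    const U = std.meta.Int(.unsigned, (@bitSizeOf(T) + 7) / 8 * 8);
    // Maximum number of LEB128 bytes the accumulator can span; comptime-known.
    const max_bytes = (@bitSizeOf(U) + 6) / 7;

    if (pos.* >= bytes.len) return error.EndOfStream;
    const first = bytes[pos.*];
    pos.* += 1;

    // Fast path: a value of at most 7 bits is a single byte with the
    // continuation bit clear.
    if (first < 0x80) return std.math.cast(T, first) orelse return error.Overflow;

    var result: U = first & 0x7f;

    // max_bytes is known at compile time, so this loop is fully unrolled.
    inline for (1..max_bytes) |i| {
        if (pos.* >= bytes.len) return error.EndOfStream;
        const byte = bytes[pos.*];
        pos.* += 1;
        // Unlike the real implementation, bits shifted past the accumulator
        // width are silently discarded here.
        result |= @as(U, byte & 0x7f) << (7 * i);
        if (byte < 0x80) return std.math.cast(T, result) orelse return error.Overflow;
    }
    return error.Overflow;
}

test "takeUleb128 decodes known encodings" {
    const bytes = [_]u8{ 0x05, 0xe5, 0x8e, 0x26 };
    var pos: usize = 0;
    try std.testing.expectEqual(@as(u32, 5), try takeUleb128(u32, &bytes, &pos));
    try std.testing.expectEqual(@as(u32, 624485), try takeUleb128(u32, &bytes, &pos));
    try std.testing.expectEqual(@as(usize, 4), pos);
}
```

Because max_bytes depends only on the result type, the inline for is unrolled at compile time, which is the 'inlined loop of static length' described above.
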
Writing performance:

  • Amongst u8, u16, u32 and u64, performance gains are between ~1.5x and ~2x
  • Amongst i8, i16, i32 and i64, performance gains are between ~2x and >4x
  • For integers whose bit width is a multiple of 7, performance is roughly equal
    within the margin of error, or slightly faster.

Reading performance:

  • Signed and unsigned 32-bit integers see a 5x to 12x(!) performance improvement.
  • For u8, u16 and u64, performance increases by ~1.5x to ~6x
  • For i8, i16 and i64, performance increases by ~1.5x to ~3.5x
  • For integers whose bit width is a multiple of 7, performance is roughly equal
    within the margin of error, or slightly faster.

NicoElbers force-pushed the leb-perf branch 2 times, most recently from 8234af8 to 309fd8d on November 5, 2025 at 22:16. The two commit messages:
Rewrite `writeLeb128` to no longer use `writeMultipleOf7Leb128` and instead:
 * Make use of byte aligned ints
 * Special case small numbers (fitting inside 7 bits)

Amongst u8, u16, u32 and u64 performance gains are between ~1.5x and ~2x
Amongst i8, i16, i32 and i64 performance gains are between ~2x and >4x

Additionally, add test coverage for written encodings

Microbenchmark: https://zigbin.io/7ed5fe
Rewrite `Reader.takeLeb128` to not use `takeMultipleOf7Leb128` and
instead:
 * Use byte aligned integers
 * Turn the main reading loop into an inlined loop of static length
 * Special case small integers (<= 7 bits)

Notably, signed and unsigned 32-bit integers see a 5x to 12x(!)
performance improvement.

Outside of that:
For u8, u16 and u64 performance increases ~1.5x to ~6x
For i8, i16 and i64 performance increases ~1.5x to ~3.5x

For integers whose bit width is a multiple of 7, performance is roughly equal
within the margin of error.

Also expand on test coverage

Microbenchmark: https://zigbin.io/7ed5fe