
Optimize char deserialization with manual UTF-8 decoder #33

Closed
tanmay4l wants to merge 3 commits into anza-xyz:master from tanmay4l:optimize-char-decode

Conversation

@tanmay4l
Contributor

Addresses the TODO comment at lines 247-250, which noted: "Could implement a manual decoder that avoids UTF-8
validate + chars() and instead performs the UTF-8 validity checks and produces a char directly. Some quick
micro-benchmarking revealed a roughly 2x speedup is possible."

Changes

Before:

let str = core::str::from_utf8(buf).map_err(invalid_utf8_encoding)?;
let c = str.chars().next().unwrap();

After:
- Manual UTF-8 decoding for 2-4 byte characters using bit masks
- Inline validation of continuation bytes (must be 10xxxxxx)
- Overlong-encoding validation (3-byte: >= U+0800; 4-byte: >= U+10000)
- Surrogate validation (rejects U+D800..U+DFFF)
- Out-of-range validation (rejects code points > U+10FFFF)
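The validation rules above can be sketched as a standalone decoder. This is an illustrative reconstruction of the approach, not the PR's actual code; the function name `decode_char` and the `Option` return type are assumptions for the sketch (the real implementation would map failures to the crate's error type):

```rust
/// Decode a single char from `buf`, assuming `buf.len()` is the sequence
/// length already derived from the leading byte (1-4). Returns None on any
/// invalid sequence. Hypothetical sketch, not the PR's implementation.
fn decode_char(buf: &[u8]) -> Option<char> {
    // Helper: a continuation byte must match 10xxxxxx.
    let cont = |b: u8| b & 0xC0 == 0x80;
    let b0 = buf[0] as u32;
    let cp = match buf.len() {
        1 if b0 < 0x80 => b0, // ASCII fast path
        2 => {
            if !cont(buf[1]) { return None; }
            let cp = ((b0 & 0x1F) << 6) | (buf[1] as u32 & 0x3F);
            if cp < 0x80 { return None; } // overlong: must be >= U+0080
            cp
        }
        3 => {
            if !cont(buf[1]) || !cont(buf[2]) { return None; }
            let cp = ((b0 & 0x0F) << 12)
                | ((buf[1] as u32 & 0x3F) << 6)
                | (buf[2] as u32 & 0x3F);
            // Overlong (< U+0800) and surrogate (U+D800..U+DFFF) checks.
            if cp < 0x800 || (0xD800..=0xDFFF).contains(&cp) { return None; }
            cp
        }
        4 => {
            if !cont(buf[1]) || !cont(buf[2]) || !cont(buf[3]) { return None; }
            let cp = ((b0 & 0x07) << 18)
                | ((buf[1] as u32 & 0x3F) << 12)
                | ((buf[2] as u32 & 0x3F) << 6)
                | (buf[3] as u32 & 0x3F);
            // Overlong (< U+10000) and out-of-range (> U+10FFFF) checks.
            if !(0x10000..=0x10FFFF).contains(&cp) { return None; }
            cp
        }
        _ => return None,
    };
    // All invalid ranges were rejected above, so this cannot fail here.
    char::from_u32(cp)
}
```

This avoids the two-pass cost of `from_utf8` (validate the whole buffer) followed by `chars().next()` (decode again), which is where the reported ~2x speedup would come from.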

@tanmay4l tanmay4l closed this Jan 22, 2026
@tanmay4l tanmay4l deleted the optimize-char-decode branch January 22, 2026 18:58
@tanmay4l tanmay4l restored the optimize-char-decode branch January 22, 2026 19:25
@tanmay4l tanmay4l reopened this Jan 22, 2026
@kskalski
Contributor

You mention using a microbenchmark. Does it make sense to include it in the PR (e.g. add it to wincode/benches) and put the comparison numbers in the PR description?

@kskalski
Contributor

Thanks, I used the code and made a few other changes on top of it, including the benchmark, in #187.

@kskalski kskalski closed this Feb 19, 2026
@tanmay4l tanmay4l deleted the optimize-char-decode branch February 22, 2026 11:41