@ashvardanian
Owner

No description provided.

Benchmarks suggest, for xlsum.csv:
sz_find_newline_utf8_serial: 7.49 GB/s
sz_find_whitespace_utf8_serial: 5.53 GB/s
sz_find_newline_utf8_ice: 17.54 GB/s - 2.3x
sz_find_whitespace_utf8_ice: 6.10 GB/s - 1.1x
sz_find_newline_utf8_serial: 0.9 GB/s
sz_find_whitespace_utf8_serial: 0.7 GB/s
sz_find_newline_utf8_ice: 14.5 GB/s - 16x
sz_find_whitespace_utf8_ice: 1.3 GB/s - 1.85x
Benchmarking `sz_find_newline_utf8_serial`:
> Throughput: 880.96 MiB/s @ 72.65 ms/call

Benchmarking `sz_find_whitespace_utf8_serial`:
> Throughput: 594.65 MiB/s @ 107.63 ms/call

Benchmarking `sz_find_newline_utf8_ice`:
> Throughput: 19.20 GiB/s @ 3.25 ms/call
> + 22.3 x against `sz_find_newline_utf8_serial`

Benchmarking `sz_find_whitespace_utf8_ice`:
> Throughput: 1.32 GiB/s @ 47.44 ms/call
> + 2.3 x against `sz_find_whitespace_utf8_serial`
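
The speedup figures in these logs can be reproduced from the reported throughputs once both values are in the same unit. A minimal sketch, assuming the benchmark output uses binary units (1 GiB = 1024 MiB); the helper names `to_mib_per_s` and `speedup` are hypothetical and not part of the benchmark harness:

```python
# Conversion factors into MiB/s for the units seen in the logs above.
UNITS_IN_MIB = {"MiB/s": 1.0, "GiB/s": 1024.0}


def to_mib_per_s(text: str) -> float:
    """Parse a figure such as '19.20 GiB/s' into MiB/s."""
    value, unit = text.split()
    return float(value) * UNITS_IN_MIB[unit]


def speedup(accelerated: str, baseline: str) -> float:
    """Ratio of accelerated throughput to baseline throughput."""
    return to_mib_per_s(accelerated) / to_mib_per_s(baseline)


newline = speedup("19.20 GiB/s", "880.96 MiB/s")
whitespace = speedup("1.32 GiB/s", "594.65 MiB/s")
print(f"newline:    {newline:.1f} x")     # 22.3 x
print(f"whitespace: {whitespace:.1f} x")  # 2.3 x
```
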
Benchmarking `sz_utf8_count_serial`:
> Throughput: 2.71 GiB/s @ 23.02 ms/call

Benchmarking `sz_utf8_count_haswell`:
> Throughput: 45.78 GiB/s @ 1.37 ms/call
> + 16.9 x against `sz_utf8_count_serial`

Benchmarking `sz_utf8_count_ice`:
> Throughput: 41.79 GiB/s @ 1.50 ms/call
> + 15.4 x against `sz_utf8_count_serial`
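
For context on what a UTF-8 counting kernel computes: in valid UTF-8, every code point has exactly one byte that is not a continuation byte (`0b10xxxxxx`), so counting those bytes counts code points. The sketch below shows that general technique in plain Python; it is an illustration, not StringZilla's implementation, and `count_utf8_code_points` is a hypothetical name:

```python
def count_utf8_code_points(data: bytes) -> int:
    """Count code points in valid UTF-8 by skipping continuation bytes."""
    # Continuation bytes match 0b10xxxxxx; every other byte starts a code point.
    return sum((byte & 0xC0) != 0x80 for byte in data)


text = "naïve café 🚀"
sample = text.encode("utf-8")
# The byte-level count agrees with Python's own code point count.
assert count_utf8_code_points(sample) == len(text)
```

The per-byte test involves no branching on the data, so it lends itself to wide SIMD registers.
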
Benchmarking `sz_utf8_find_whitespace_serial`:
> Throughput: 264.52 MiB/s @ 15.48 s/call

Benchmarking `sz_utf8_find_whitespace_neon`:
> Throughput: 4.98 GiB/s @ 803.59 ms/call
> + 19.3 x against `sz_utf8_find_whitespace_serial`

Benchmarking `sz_utf8_find_newline_serial`:
> Throughput: 6.36 GiB/s @ 628.70 ms/call

Benchmarking `sz_utf8_find_newline_neon`:
> Throughput: 10.15 GiB/s @ 394.23 ms/call
> + 59.5 % against `sz_utf8_find_newline_serial`
We now have a separate pack/unaligned state
and a `sz_hash_state_internal_t_` for inner
logic.
PyTest now downloads the Unicode specification
and compares all valid UTF-8 encoded characters
via StringZilla's new `utf8_case_fold` API.
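
A test of this kind can be sketched as follows, assuming the downloaded file follows the `CaseFolding.txt` layout (`<code>; <status>; <mapping>; # <name>`) and that full folding uses the `C` and `F` entries. The inline `SAMPLE` stands in for the downloaded file, and Python's built-in `str.casefold` stands in for the function under test; the actual signature of `utf8_case_fold` is not shown in this PR, so it is not called here:

```python
# A few lines in the CaseFolding.txt layout, standing in for the real file.
SAMPLE = """\
# <code>; <status>; <mapping>; # <name>
0041; C; 0061; # LATIN CAPITAL LETTER A
00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S
1E9E; F; 0073 0073; # LATIN CAPITAL LETTER SHARP S
1E9E; S; 00DF; # LATIN CAPITAL LETTER SHARP S
"""


def parse_full_folding(text: str) -> dict:
    """Build a {character: folded string} table from C (common) and F (full) rows."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        code, status, mapping = [field.strip() for field in line.split(";")[:3]]
        if status in ("C", "F"):
            folded = "".join(chr(int(cp, 16)) for cp in mapping.split())
            table[chr(int(code, 16))] = folded
    return table


fold_under_test = str.casefold  # stand-in; swap in the binding being tested

expected = parse_full_folding(SAMPLE)
mismatches = {c: f for c, f in expected.items() if fold_under_test(c) != f}
assert not mismatches, mismatches
```
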
Different bindings used different designs for skipping
or including empty segments or trailing newlines in
input strings. The new design is consistent across
C++, Python, and Rust.
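
To make the design question concrete, the sketch below uses Python's built-in `str` methods (not StringZilla's API) to show three plausible treatments of empty segments and a trailing newline. Which of these the unified design follows is not stated here:

```python
text = "a\n\nb\n"

# Keep every segment, including the empty one after the trailing newline.
keep_all = text.split("\n")

# Keep interior empty lines, but do not emit a segment after the trailing newline.
no_trailing = text.splitlines()

# Drop all empty segments.
skip_empty = [segment for segment in text.split("\n") if segment]

print(keep_all)     # ['a', '', 'b', '']
print(no_trailing)  # ['a', '', 'b']
print(skip_empty)   # ['a', 'b']
```
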
Performance in SVE2 still looks pretty bad: newline search
trails NEON, and whitespace search is slower than serial.

Benchmarking `sz_utf8_find_newline_serial`:
> Throughput: 579.69 MiB/s @ 7.07 s/call

Benchmarking `sz_utf8_find_whitespace_serial`:
> Throughput: 495.51 MiB/s @ 8.27 s/call

Benchmarking `sz_utf8_find_newline_neon`:
> Throughput: 37.80 GiB/s @ 105.83 ms/call
> + 66.8 x against `sz_utf8_find_newline_serial`

Benchmarking `sz_utf8_find_whitespace_neon`:
> Throughput: 400.16 MiB/s @ 10.24 s/call
> - 19.2 % against `sz_utf8_find_whitespace_serial`

Benchmarking `sz_utf8_find_newline_sve2`:
> Throughput: 28.04 GiB/s @ 142.67 ms/call
> + 49.5 x against `sz_utf8_find_newline_serial`

Benchmarking `sz_utf8_find_whitespace_sve2`:
> Throughput: 389.96 MiB/s @ 10.50 s/call
> - 21.3 % against `sz_utf8_find_whitespace_serial`
@ashvardanian ashvardanian merged commit 7b3e20d into main Nov 26, 2025
32 checks passed