Support for custom character length alphabets #59

anatoly-kussul · 2024-11-30T12:49:04Z

Resolves #39, "Support for custom character length alphabets"
(compatible with Python shortuuid's custom-length alphabets).
Resolves #60, "panic on more than 1 byte symbols".
Improved performance:

goos: linux
goarch: amd64
cpu: 12th Gen Intel(R) Core(TM) i5-12400F
                         │    v4.1.0    │                new                  │
                         │    sec/op    │   sec/op     vs base                │
UUID-12                     412.4n ± 0%   326.5n ± 0%  -20.84% (p=0.000 n=10)
Encoding-12                136.00n ± 1%   43.73n ± 0%  -67.85% (p=0.000 n=10)
Decoding-12                 270.3n ± 0%   180.0n ± 1%  -33.42% (p=0.000 n=10)
NewWithAlphabet-12         8064.0n ± 0%   623.8n ± 0%  -92.27% (p=0.000 n=10)
EncodingB57_MB-12                         122.2n ± 5%
EncodingB16-12                            132.5n ± 1%
EncodingB16_MB-12                         358.6n ± 0%
DecodingB16-12                            220.4n ± 1%
DecodingB16_MB-12                         266.8n ± 2%
NewWithAlphabetB16-12                     486.5n ± 0%
NewWithAlphabetB16_MB-12                  837.2n ± 0%

                         │     B/op      │    B/op     vs base                  │
UUID-12                     40.00 ± 0%     40.00 ± 0%        ~ (p=1.000 n=10) ¹
Encoding-12                 24.00 ± 0%     24.00 ± 0%        ~ (p=1.000 n=10) ¹
Decoding-12                 0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
NewWithAlphabet-12         8545.5 ± 0%     280.0 ± 0%  -96.72% (p=0.000 n=10)
EncodingB57_MB-12                          72.00 ± 0%
EncodingB16-12                             64.00 ± 0%
EncodingB16_MB-12                          512.0 ± 0%
DecodingB16-12                             0.000 ± 0%
DecodingB16_MB-12                          0.000 ± 0%
NewWithAlphabetB16-12                      144.0 ± 0%
NewWithAlphabetB16_MB-12                   592.0 ± 0%

                         │   allocs/op   │ allocs/op   vs base                  │
UUID-12                     2.000 ± 0%     2.000 ± 0%        ~ (p=1.000 n=10) ¹
Encoding-12                 1.000 ± 0%     1.000 ± 0%        ~ (p=1.000 n=10) ¹
Decoding-12                 0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
NewWithAlphabet-12         19.000 ± 0%     3.000 ± 0%  -84.21% (p=0.000 n=10)
EncodingB57_MB-12                          2.000 ± 0%
EncodingB16-12                             2.000 ± 0%
EncodingB16_MB-12                          4.000 ± 0%
DecodingB16-12                             0.000 ± 0%
DecodingB16_MB-12                          0.000 ± 0%
NewWithAlphabetB16-12                      4.000 ± 0%
NewWithAlphabetB16_MB-12                   6.000 ± 0%

anatoly-kussul · 2024-11-30T19:57:51Z

Changed alphabet.Index() to use binary search, which gains ~10% performance on Decode:

                   │     for     │                binary               │
                   │   sec/op    │   sec/op     vs base                │
Decoding-12          203.4n ± 1%   180.2n ± 1%  -11.45% (p=0.000 n=10)

The previous attempt in #51 showed no improvement because it used sort.Search, which adds the overhead of a closure call on every comparison.


                   │     for     │             sort.Search             │
                   │   sec/op    │   sec/op     vs base                │
Decoding-12          203.4n ± 1%   313.1n ± 1%  +53.92% (p=0.000 n=10)
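
To illustrate the difference measured above, here is a minimal sketch of the two lookup styles. The function names indexLoop and indexSortSearch are hypothetical and are not taken from the PR; the alphabet is just an example of a sorted symbol set. The hand-written loop compares elements inline, while sort.Search calls the supplied closure on every probe.

```go
package main

import (
	"fmt"
	"sort"
)

// indexLoop is a hand-written binary search over a sorted rune slice.
// It returns the position of r, or -1 if r is not present.
// (Hypothetical helper, not the library's actual code.)
func indexLoop(chars []rune, r rune) int {
	lo, hi := 0, len(chars)
	for lo < hi {
		mid := int(uint(lo+hi) >> 1) // midpoint without signed overflow
		if chars[mid] < r {
			lo = mid + 1
		} else {
			hi = mid
		}
	}
	if lo < len(chars) && chars[lo] == r {
		return lo
	}
	return -1
}

// indexSortSearch performs the same lookup through sort.Search, which
// invokes the closure once per probe.
func indexSortSearch(chars []rune, r rune) int {
	i := sort.Search(len(chars), func(i int) bool { return chars[i] >= r })
	if i < len(chars) && chars[i] == r {
		return i
	}
	return -1
}

func main() {
	// Example alphabet, already sorted by code point.
	chars := []rune("23456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz")
	fmt.Println(indexLoop(chars, 'a'), indexSortSearch(chars, 'a'))
	fmt.Println(indexLoop(chars, '0'), indexSortSearch(chars, '0'))
}
```

Both functions return the same result; only the per-comparison call overhead differs.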

anatoly-kussul · 2024-12-01T09:38:36Z

Added some more benchmarks.
Added an optimization for alphabets containing only single-byte characters.
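
A rough sketch of what a single-byte fast path can look like, assuming the check is done once when the alphabet is created. The helpers isSingleByte and symbolAt are hypothetical names used only for illustration; how the library actually stores and indexes its alphabet is not shown in this conversation.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// isSingleByte reports whether every symbol in the alphabet is encoded as a
// single byte in UTF-8 (that is, every byte is below utf8.RuneSelf).
func isSingleByte(alphabet string) bool {
	for i := 0; i < len(alphabet); i++ {
		if alphabet[i] >= utf8.RuneSelf {
			return false
		}
	}
	return true
}

// symbolAt returns the i-th symbol of the alphabet. For single-byte
// alphabets the byte offset equals the symbol index, so no decoding is
// needed. The slow path decodes the string; a real implementation would
// cache the decoded runes instead of converting on every call.
func symbolAt(alphabet string, singleByte bool, i int) rune {
	if singleByte {
		return rune(alphabet[i])
	}
	return []rune(alphabet)[i]
}

func main() {
	for _, a := range []string{"abc", "añ😀"} {
		sb := isSingleByte(a)
		fmt.Println(a, sb, len(a), utf8.RuneCountInString(a), string(symbolAt(a, sb, 1)))
	}
}
```

The second alphabet shows why byte indexing alone is unsafe for multi-byte symbols: its byte length and its symbol count differ.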

lithammer · 2024-12-01T10:47:05Z

Had to copy some of the internal sort code into the project, because before Go 1.18 sort.Slice uses reflect, which increases allocations and decreases performance.

I think we can just raise the minimum version in go.mod to 1.18 instead. 1.13 has been EOL for over 4 years. And 1.18 for almost 2 years.

shortuuid/go.mod

Line 5 in d3d4af7

go 1.13

anatoly-kussul · 2024-12-01T11:24:11Z

Seems like I was a bit mistaken: the slices package was still an external experimental package in Go 1.18, so I would have to introduce it as an external dependency.
Or, if it's OK, we can raise the minimum version to 1.21, where slices is already in the standard library. (1.21 EoL was 3 months ago.)

And I have no clue why the CI lint is failing; I'll have to investigate.

anatoly-kussul · 2024-12-01T14:29:07Z

Introduced more encode optimizations, using a technique similar to the one in big.Int.Text().
Updated benchmark in PR description.
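
For readers unfamiliar with that technique: the general idea is to divide the large number by the biggest power of the base that fits in a machine word, then split each remainder into digits with cheap 64-bit arithmetic. The sketch below applies this to a 128-bit value held as two uint64 halves. All names (divmod128, digitsFor128, encodeNaive, encodeChunked) are hypothetical, and the digit order and padding are assumptions of this sketch rather than a description of the PR's actual encoder.

```go
package main

import (
	"fmt"
	"math"
	"math/bits"
)

// divmod128 divides the 128-bit value (hi, lo) by d and returns the 128-bit
// quotient and the 64-bit remainder. d must be non-zero.
func divmod128(hi, lo, d uint64) (qhi, qlo, rem uint64) {
	qhi, r := bits.Div64(0, hi, d)
	qlo, rem = bits.Div64(r, lo, d)
	return qhi, qlo, rem
}

// digitsFor128 returns how many base-b digits are needed for any 128-bit value.
func digitsFor128(b uint64) int {
	return int(math.Ceil(128 / math.Log2(float64(b))))
}

// encodeNaive performs one 128-bit division per output digit.
func encodeNaive(hi, lo uint64, alphabet []rune) string {
	base := uint64(len(alphabet))
	out := make([]rune, digitsFor128(base))
	for i := len(out) - 1; i >= 0; i-- {
		var rem uint64
		hi, lo, rem = divmod128(hi, lo, base)
		out[i] = alphabet[rem]
	}
	return string(out)
}

// encodeChunked performs one 128-bit division per chunk of digits, then
// splits each remainder with plain uint64 arithmetic.
func encodeChunked(hi, lo uint64, alphabet []rune) string {
	base := uint64(len(alphabet))
	// Largest power of base that still fits in a uint64.
	chunkLen, chunkBase := 1, base
	for chunkBase <= math.MaxUint64/base {
		chunkBase *= base
		chunkLen++
	}
	out := make([]rune, digitsFor128(base))
	i := len(out)
	for i > 0 {
		var rem uint64
		hi, lo, rem = divmod128(hi, lo, chunkBase)
		for j := 0; j < chunkLen && i > 0; j++ {
			i--
			out[i] = alphabet[rem%base]
			rem /= base
		}
	}
	return string(out)
}

func main() {
	alphabet := []rune("23456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz")
	hi, lo := uint64(0x0123456789abcdef), uint64(0xfedcba9876543210)
	fmt.Println(encodeNaive(hi, lo, alphabet))
	fmt.Println(encodeChunked(hi, lo, alphabet))
}
```

With a 57-symbol alphabet, each wide division yields 10 digits, so a 22-digit result needs 3 wide divisions instead of 22.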

lithammer · 2024-12-01T16:44:45Z

Or, if it's OK, we can raise the minimum version to 1.21, where slices is already in the standard library. (1.21 EoL was 3 months ago.)

Increasing to 1.21 is fine 🙂
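
For context, a short sketch of what the standard-library slices package (available from Go 1.21) offers here: generic sorting, de-duplication, and binary search without reflection. The normalize helper is a hypothetical name, and whether the library prepares its alphabet exactly this way is an assumption.

```go
package main

import (
	"fmt"
	"slices"
)

// normalize sorts the alphabet's symbols and drops duplicates.
// (Hypothetical helper for illustration.)
func normalize(alphabet string) []rune {
	chars := []rune(alphabet)
	slices.Sort(chars)           // generic sort, no reflection
	return slices.Compact(chars) // removes adjacent duplicates
}

func main() {
	chars := normalize("cbaabc")
	idx, found := slices.BinarySearch(chars, 'b')
	fmt.Println(string(chars), idx, found)
}
```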

lithammer · 2024-12-01T21:52:40Z

encoder.go

+		if err != nil {
+			return
+		}
+		if e.alphabet.len == defaultBase { // compiler optimization using constant for default base


Same here, is it worth it?

On Decode it's around an 11% performance difference.

I just ran the Decode benchmarks again, and it seems there was a fluke in the first run.
Removing this if actually improves performance by ~2%.
Removed it.

anatoly-kussul · 2024-12-02T05:03:51Z

Added benchmarks for NewWithNamespace.
Improved its performance by removing the need to lowercase the full string before comparison.

goos: linux
goarch: amd64
cpu: 12th Gen Intel(R) Core(TM) i5-12400F
                         │     old      │                 new                 │
                         │    sec/op    │   sec/op     vs base                │
NewWithNamespace-12        285.4n ±  1%   248.7n ± 2%  -12.89% (p=0.000 n=10)
NewWithNamespaceHttp-12    281.4n ±  1%   253.6n ± 1%   -9.88% (p=0.000 n=10)
NewWithNamespaceHttps-12   322.8n ± 20%   253.6n ± 2%  -21.41% (p=0.000 n=10)
geomean                    296.0n         252.0n       -14.87%

                         │    B/op    │    B/op     vs base                 │
NewWithNamespace-12        200.0 ± 0%   200.0 ± 0%       ~ (p=1.000 n=10)
NewWithNamespaceHttp-12    208.0 ± 0%   208.0 ± 0%       ~ (p=1.000 n=10)
NewWithNamespaceHttps-12   224.0 ± 0%   224.0 ± 0%       ~ (p=1.000 n=10)
geomean                    210.4        210.4       +0.00%

                         │ allocs/op  │ allocs/op   vs base                 │
NewWithNamespace-12        5.000 ± 0%   5.000 ± 0%       ~ (p=1.000 n=10)
NewWithNamespaceHttp-12    5.000 ± 0%   5.000 ± 0%       ~ (p=1.000 n=10)
NewWithNamespaceHttps-12   5.000 ± 0%   5.000 ± 0%       ~ (p=1.000 n=10)
geomean                    5.000        5.000       +0.00%
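
A minimal sketch of the idea, assuming an ASCII prefix such as "http://". The helper names hasPrefixLower and hasPrefixFold are hypothetical; the commit title only indicates that a strings.HasPrefix(strings.ToLower(name) call was replaced. The first version lowercases (and may allocate a copy of) the whole string; the second compares only the leading bytes, case-insensitively.

```go
package main

import (
	"fmt"
	"strings"
)

// hasPrefixLower lowercases the entire input before checking the prefix.
// prefix is expected to be lowercase already.
func hasPrefixLower(s, prefix string) bool {
	return strings.HasPrefix(strings.ToLower(s), prefix)
}

// hasPrefixFold compares only the first len(prefix) bytes of s with prefix,
// ignoring case. Slicing by byte length is safe under the assumption that
// prefix is plain ASCII.
func hasPrefixFold(s, prefix string) bool {
	return len(s) >= len(prefix) && strings.EqualFold(s[:len(prefix)], prefix)
}

func main() {
	for _, name := range []string{"HTTP://example.com", "https://example.com", "example.com"} {
		fmt.Println(name, hasPrefixLower(name, "http://"), hasPrefixFold(name, "http://"))
	}
}
```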

lithammer · 2024-12-02T07:44:04Z

Thanks! Very impressive work again! 👏

lithammer · 2024-12-02T07:45:59Z

Are you planning on doing more of these or...?

anatoly-kussul · 2024-12-02T07:51:08Z

Thanks! Very impressive work again! 👏

Thanks again for very quick review and approval!

Are you planning on doing more of these or...?

I think I did all I could already. =)

I noticed that this library (we used v3) was a major part of our CPU usage in production,
so I tried to improve its performance a bit.
We only use New and NewWithNamespace, so that was my main focus in the first PR.
But then I noticed that I could also improve NewWithAlphabet by a lot, and decided to give it a try.

lithammer · 2024-12-02T08:04:28Z

Well I'm glad you took your time. The improvements are quite significant! Always nice to hear from people using it in production 🙂

New release: https://github.com/lithammer/shortuuid/releases/tag/v4.2.0

anatoly-kussul added 10 commits November 30, 2024 12:48

custom alphabet len support

72ebcd4

performance oprimizations

4f9b38a

decode compiler optimization

b249897

update readme

1a820df

update readme

4cf2023

remove unused functions

c8b7419

fix lint

48d7cb7

fix go.sum

b9fe07f

fix some copypaste mistakes

d1d4e2c

index binary search

edb99f4

anatoly-kussul added 6 commits December 1, 2024 08:10

micro optimization

627eb42

add more benchmarks

97ad5e7

add more benchmarks

cf3a6c4

shorten benchmark names

62d53d4

optimization for only single-byte runes alphabets

50fabdc

add test for multiple byte runes in b57 alphabet

a56bd54

micro-optimization

bd1d871

anatoly-kussul added 2 commits December 1, 2024 12:15

increase required go ver to 1.18, use slices package

70d78bd

bump go version in CI

907ee39

anatoly-kussul added 2 commits December 1, 2024 12:32

bump golangci-lint version

4842ddb

encode optimizations

3ce921e

anatoly-kussul added 3 commits December 1, 2024 17:50

increase go version to 1.21 to use built-in slice package

31b85e1

bump golangci-lint:

cf697af

revert alphabet.Index to binary search instead of map

27a53ad

add comment to defaultDivisor

4ab987c

lithammer self-assigned this Dec 1, 2024

lithammer reviewed Dec 1, 2024

anatoly-kussul added 2 commits December 2, 2024 05:13

simlify Decode

fc49380

changed strings.HasPrefix(strings.ToLower(name) to use strings.EqualF…

545761e

…old instead

anatoly-kussul added 2 commits December 2, 2024 06:33

ci: add verbose

d9097d2

ci lint: increase timeout, add verbose

730191d

anatoly-kussul requested a review from lithammer December 2, 2024 07:11

lithammer approved these changes Dec 2, 2024

lithammer merged commit 9c03ed0 into lithammer:master Dec 2, 2024
3 checks passed
