Added new versions of choose and choose_stable #1268

Merged
merged 29 commits into from
Jan 5, 2023

@wainwrightmark wainwrightmark commented Nov 17, 2022

Related to #1266

Edit: I have made some performance improvements (now 2-3x as fast as the original) and updated the benchmarks.

DO NOT MERGE as is. The old functions need to be removed and the new ones renamed. I added a lot of benchmarks to test the new changes; they don't need to be permanent. I think Coin_Flipper could do with a better name too, and better tests.

This adds new, better-performing versions of choose() and choose_stable(), which run about 1.5x to 2x as fast.
This actually uses a different method from the one described in the issue: it now only needs to generate one u32 per 16 iterator items on average. This works much better than the other version on longer lists.

This is a value-breaking change.

Benchmark results below.
Summary: basically unchanged for size_hinted and window_hinted. About 1.5-2x improvement for unhinted and stable, though these are more modest when using faster RNGs and when there are fewer elements.

use rand_chacha::ChaCha20Rng as CryptoRng;
use rand_pcg::Pcg32 as SmallRng;

# choose unhinted - this is where I was expecting performance gains
test seq_iter_unhinted_choose_from_10000_cryptoRng_new_version        ... bench:      21,323 ns/iter (+/- 2,551)
test seq_iter_unhinted_choose_from_10000_cryptoRng_old_version        ... bench:      76,843 ns/iter (+/- 8,739)
test seq_iter_unhinted_choose_from_10000_smallRng_new_version         ... bench:      19,321 ns/iter (+/- 870)
test seq_iter_unhinted_choose_from_10000_smallRng_old_version         ... bench:      44,929 ns/iter (+/- 2,894)
test seq_iter_unhinted_choose_from_1000__cryptoRng_new_version         ... bench:       2,233 ns/iter (+/- 198)
test seq_iter_unhinted_choose_from_1000__cryptoRng_old_version         ... bench:       7,019 ns/iter (+/- 434)
test seq_iter_unhinted_choose_from_1000__smallRng_new_version          ... bench:       2,028 ns/iter (+/- 174)
test seq_iter_unhinted_choose_from_1000__smallRng_old_version          ... bench:       4,028 ns/iter (+/- 512)
test seq_iter_unhinted_choose_from_100___cryptoRng_new_version          ... bench:         281 ns/iter (+/- 18)
test seq_iter_unhinted_choose_from_100___cryptoRng_old_version          ... bench:         792 ns/iter (+/- 68)
test seq_iter_unhinted_choose_from_100___smallRng_new_version           ... bench:         269 ns/iter (+/- 16)
test seq_iter_unhinted_choose_from_100___smallRng_old_version           ... bench:         481 ns/iter (+/- 42)
test seq_iter_unhinted_choose_from_10____cryptoRng_new_version           ... bench:          45 ns/iter (+/- 7)
test seq_iter_unhinted_choose_from_10____cryptoRng_old_version           ... bench:         100 ns/iter (+/- 13)
test seq_iter_unhinted_choose_from_10____smallRng_new_version            ... bench:          45 ns/iter (+/- 6)
test seq_iter_unhinted_choose_from_10____smallRng_old_version            ... bench:          64 ns/iter (+/- 5)

# choose_stable() should be very close to unhinted
test seq_iter_stable_choose_from_10000_cryptoRng_new_version          ... bench:      20,740 ns/iter (+/- 1,149)
test seq_iter_stable_choose_from_10000_cryptoRng_old_version          ... bench:      70,251 ns/iter (+/- 7,609)
test seq_iter_stable_choose_from_10000_smallRng_new_version           ... bench:      19,237 ns/iter (+/- 1,996)
test seq_iter_stable_choose_from_10000_smallRng_old_version           ... bench:      45,256 ns/iter (+/- 5,011)
test seq_iter_stable_choose_from_1000__cryptoRng_new_version           ... bench:       2,175 ns/iter (+/- 132)
test seq_iter_stable_choose_from_1000__cryptoRng_old_version           ... bench:       6,455 ns/iter (+/- 1,272)
test seq_iter_stable_choose_from_1000__smallRng_new_version            ... bench:       2,019 ns/iter (+/- 195)
test seq_iter_stable_choose_from_1000__smallRng_old_version            ... bench:       4,132 ns/iter (+/- 391)
test seq_iter_stable_choose_from_100___cryptoRng_new_version            ... bench:         289 ns/iter (+/- 27)
test seq_iter_stable_choose_from_100___cryptoRng_old_version            ... bench:         735 ns/iter (+/- 54)
test seq_iter_stable_choose_from_100___smallRng_new_version             ... bench:         268 ns/iter (+/- 34)
test seq_iter_stable_choose_from_100___smallRng_old_version             ... bench:         478 ns/iter (+/- 41)
test seq_iter_stable_choose_from_10____cryptoRng_new_version             ... bench:          47 ns/iter (+/- 5)
test seq_iter_stable_choose_from_10____cryptoRng_old_version             ... bench:          92 ns/iter (+/- 6)
test seq_iter_stable_choose_from_10____smallRng_new_version              ... bench:          43 ns/iter (+/- 3)
test seq_iter_stable_choose_from_10____smallRng_old_version              ... bench:          64 ns/iter (+/- 4)

# Window hinted - maybe a slight improvement
test seq_iter_window_hinted_choose_from_10000_cryptoRng_new_version   ... bench:      16,382 ns/iter (+/- 1,672)
test seq_iter_window_hinted_choose_from_10000_cryptoRng_old_version   ... bench:      17,044 ns/iter (+/- 2,429)
test seq_iter_window_hinted_choose_from_10000_smallRng_new_version    ... bench:      10,606 ns/iter (+/- 347)
test seq_iter_window_hinted_choose_from_10000_smallRng_old_version    ... bench:      11,415 ns/iter (+/- 974)
test seq_iter_window_hinted_choose_from_1000__cryptoRng_new_version    ... bench:       1,558 ns/iter (+/- 163)
test seq_iter_window_hinted_choose_from_1000__cryptoRng_old_version    ... bench:       1,674 ns/iter (+/- 190)
test seq_iter_window_hinted_choose_from_1000__smallRng_new_version     ... bench:       1,037 ns/iter (+/- 108)
test seq_iter_window_hinted_choose_from_1000__smallRng_old_version     ... bench:       1,106 ns/iter (+/- 93)
test seq_iter_window_hinted_choose_from_100___cryptoRng_new_version     ... bench:         200 ns/iter (+/- 17)
test seq_iter_window_hinted_choose_from_100___cryptoRng_old_version     ... bench:         214 ns/iter (+/- 13)
test seq_iter_window_hinted_choose_from_100___smallRng_new_version      ... bench:         145 ns/iter (+/- 23)
test seq_iter_window_hinted_choose_from_100___smallRng_old_version      ... bench:         149 ns/iter (+/- 12)
test seq_iter_window_hinted_choose_from_10____cryptoRng_new_version      ... bench:          32 ns/iter (+/- 7)
test seq_iter_window_hinted_choose_from_10____cryptoRng_old_version      ... bench:          34 ns/iter (+/- 3)
test seq_iter_window_hinted_choose_from_10____smallRng_new_version       ... bench:          24 ns/iter (+/- 2)
test seq_iter_window_hinted_choose_from_10____smallRng_old_version       ... bench:          26 ns/iter (+/- 3)


# Size hinted - basically no noticeable change - note that these run 1000 times each 
test seq_iter_size_hinted_choose_from_10000_cryptoRng_new_version     ... bench:       8,389 ns/iter (+/- 422) = 953 MB/s
test seq_iter_size_hinted_choose_from_10000_cryptoRng_old_version     ... bench:       8,402 ns/iter (+/- 661) = 952 MB/s
test seq_iter_size_hinted_choose_from_10000_smallRng_new_version      ... bench:       5,325 ns/iter (+/- 521) = 1502 MB/s
test seq_iter_size_hinted_choose_from_10000_smallRng_old_version      ... bench:       5,401 ns/iter (+/- 211) = 1481 MB/s
test seq_iter_size_hinted_choose_from_1000__cryptoRng_new_version      ... bench:       2,654 ns/iter (+/- 219) = 3014 MB/s
test seq_iter_size_hinted_choose_from_1000__cryptoRng_old_version      ... bench:       2,714 ns/iter (+/- 354) = 2947 MB/s
test seq_iter_size_hinted_choose_from_1000__smallRng_new_version       ... bench:       1,211 ns/iter (+/- 45) = 6606 MB/s
test seq_iter_size_hinted_choose_from_1000__smallRng_old_version       ... bench:       1,188 ns/iter (+/- 43) = 6734 MB/s
test seq_iter_size_hinted_choose_from_100___cryptoRng_new_version       ... bench:       5,330 ns/iter (+/- 636) = 1500 MB/s
test seq_iter_size_hinted_choose_from_100___cryptoRng_old_version       ... bench:       5,268 ns/iter (+/- 561) = 1518 MB/s
test seq_iter_size_hinted_choose_from_100___smallRng_new_version        ... bench:       2,948 ns/iter (+/- 406) = 2713 MB/s
test seq_iter_size_hinted_choose_from_100___smallRng_old_version        ... bench:       2,996 ns/iter (+/- 344) = 2670 MB/s
test seq_iter_size_hinted_choose_from_10____cryptoRng_new_version        ... bench:       8,266 ns/iter (+/- 925) = 967 MB/s
test seq_iter_size_hinted_choose_from_10____cryptoRng_old_version        ... bench:       8,873 ns/iter (+/- 1,017) = 901 MB/s
test seq_iter_size_hinted_choose_from_10____smallRng_new_version         ... bench:       5,246 ns/iter (+/- 482) = 1524 MB/s
test seq_iter_size_hinted_choose_from_10____smallRng_old_version         ... bench:       5,036 ns/iter (+/- 385) = 1588 MB/s

@dhardy dhardy left a comment

Looks like a nice approach, but the implementation needs a bit of polish.

A shame you didn't use criterion for the new benchmarks, but no need to rewrite it now.

I'd also like to see benchmarks for very small lengths like 1/2/3 elements ideally. Some constructs like iter::once and Iterator::chain might produce these. Probably not worth keeping all variants in the benchmark so a quick hack will do.

Comment on lines 22 to 34
// Explanation:
// We are trying to return true with a probability of n / d
// If n >= d, we can just return true
// Otherwise there are two possibilities 2n < d and 2n >= d
// In either case we flip a coin.
// If 2n < d
// If it comes up tails, return false
// If it comes up heads, double n and start again
// This is fair because (0.5 * 0) + (0.5 * 2n / d) = n / d and 2n is less than d (if 2n was greater than d we would effectively round it down to 1 by returning true)
// If 2n >= d
// If it comes up tails, set n to 2n - d
// If it comes up heads, return true
// This is fair because (0.5 * 1) + (0.5 * (2n - d) / d) = n / d

Using expected value notation:

Let X(n,d) be an event that is true with probability n/d:
X(n,d) ≡ x < n/d for uniform x drawn from [0, 1)

Let X'(n,d) ≡ x' < n/d for x' drawn from [0, 1) independent of x.

E(X(n,d) | 2n < d)
    = E(x<n/d | x < 1/2) / 2 + E(x<n/d | x ≥ 1/2) / 2
    = E(x' < 2n/d) / 2 + E(false) / 2
    = E(X'(2n, d)) / 2 + E(false) / 2

E(X(n,d) | 2n ≥ d)
    = E(x<n/d | x < 1/2) / 2 + E(x<n/d | x ≥ 1/2) / 2
    = E(true) / 2 + E(x' < 2*(n/d - 1/2)) / 2
    = E(true) / 2 + E(x' < 2n/d - 1) / 2
    = E(true) / 2 + E(x' < (2n - d)/d) / 2
    = E(true) / 2 + E(X'(2n-d, d)) / 2
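
The procedure in the quoted comment can be sketched as follows. This is only an illustration under stated assumptions, not the code from the PR: `SplitMix64`, `CoinSource`, `flip` and `random_ratio` are hypothetical names, and the bit buffer simply hands out the low bit of a buffered u32. In this sketch each flip ends the loop with probability 1/2, so a decision takes two flips on average, which would be consistent with the figure of roughly one u32 per 16 items mentioned in the PR description.

```rust
/// Tiny deterministic generator (SplitMix64), used only so the sketch
/// needs no external crates. It is not the RNG used by the PR.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    fn next_u32(&mut self) -> u32 {
        (self.next_u64() >> 32) as u32
    }
}

/// Hypothetical coin source: buffers one u32 and hands out one bit at a time.
struct CoinSource {
    rng: SplitMix64,
    chunk: u32,
    bits_left: u32,
    flips: u64, // total number of coin flips, kept for illustration
}

impl CoinSource {
    fn new(seed: u64) -> Self {
        CoinSource { rng: SplitMix64(seed), chunk: 0, bits_left: 0, flips: 0 }
    }

    /// Returns true for heads, false for tails.
    fn flip(&mut self) -> bool {
        if self.bits_left == 0 {
            self.chunk = self.rng.next_u32();
            self.bits_left = 32;
        }
        let heads = self.chunk & 1 == 1;
        self.chunk >>= 1;
        self.bits_left -= 1;
        self.flips += 1;
        heads
    }

    /// Returns true with probability n / d (requires d > 0).
    fn random_ratio(&mut self, mut n: u64, d: u64) -> bool {
        assert!(d > 0);
        loop {
            if n >= d {
                return true;
            }
            // n < d here, so gap > 0; comparing n with gap avoids overflow in 2n.
            let gap = d - n;
            let heads = self.flip();
            if n < gap {
                // 2n < d: tails returns false, heads continues with 2n.
                if !heads {
                    return false;
                }
                n += n;
            } else {
                // 2n >= d: heads returns true, tails continues with 2n - d.
                if heads {
                    return true;
                }
                n -= gap;
            }
        }
    }
}
```

Calling `CoinSource::new(seed).random_ratio(1, k)` for the k-th item is how such a helper would slot into a sampling loop; the seed and generator here are placeholders.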

@wainwrightmark

Thanks for having a look at this. I'm afraid I've been a bit sick for the past few days but I will fix the problems and change the benchmarks soon.

@wainwrightmark

I've updated the code to resolve the correctness issue. Unfortunately this has led to a slight performance regression for the case when you are using a window-hinted iterator and a crypto RNG. It is now about 10% slower in those cases (this is not entirely unexpected: in this situation the new code is not really being used, but it is also not being optimized away as it is in the size-hinted version). Anyway, I understand if you want to abandon this as it is a noticeable performance regression in that case.

I also added benchmarks for 1, 2, and 3 elements. The new method leads to a big performance improvement in these cases but that is largely because the old method is generating a random number to test the first element whereas the new method is skipping that test.
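
The point about the first element can be illustrated with a single-item reservoir sampling loop. This is a sketch under assumptions rather than the PR's code: `choose_unhinted` is a hypothetical name, and the `ratio(n, d)` closure stands in for whatever source returns true with probability n / d.

```rust
/// Picks one item from an iterator of unknown length.
/// `ratio(n, d)` is assumed to return true with probability n / d.
fn choose_unhinted<I, F>(iter: I, mut ratio: F) -> Option<I::Item>
where
    I: Iterator,
    F: FnMut(u64, u64) -> bool,
{
    let mut iter = iter;
    // The first item is always the initial pick, so no random test is needed.
    let mut picked = iter.next()?;
    let mut seen: u64 = 1;
    for item in iter {
        seen += 1;
        // The k-th item replaces the current pick with probability 1 / k.
        if ratio(1, seen) {
            picked = item;
        }
    }
    Some(picked)
}
```

With a one-element iterator this loop never calls `ratio`, whereas a loop that also tests the first item would draw from the RNG once.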

I tried using criterion benchmarks. They give pretty similar results (both the default measurement and with cycles-per-byte) but I couldn't find a way to use criterion for that benchmark without adding it as a dev-dependency for the whole project, thereby making all of the CI testing take much longer.

Also I seem to have broken two of the test builds and I'm not entirely sure how to fix them.

These are the new benchmark results, hopefully in a more readable format:

Window Hinted

Regression for the CryptoRng, basically unchanged for SmallRng

| Number of Elements | Rng | Old ns/iter | New ns/iter | Ratio |
| --- | --- | --- | --- | --- |
| 1 | CryptoRng | 12 | 3 | 4 |
| 2 | CryptoRng | 16 | 8 | 2 |
| 3 | CryptoRng | 12 | 14 | 0.8571428571 |
| 10 | CryptoRng | 27 | 30 | 0.9 |
| 100 | CryptoRng | 169 | 187 | 0.9037433155 |
| 1,000 | CryptoRng | 1,296 | 1,463 | 0.8858509911 |
| 10,000 | CryptoRng | 13,275 | 15,170 | 0.8750823995 |
| 1 | SmallRng | 10 | 3 | 3.333333333 |
| 2 | SmallRng | 14 | 14 | 1 |
| 3 | SmallRng | 11 | 11 | 1 |
| 10 | SmallRng | 23 | 23 | 1 |
| 100 | SmallRng | 138 | 138 | 1 |
| 1,000 | SmallRng | 1,015 | 1,014 | 1.000986193 |
| 10,000 | SmallRng | 10,321 | 10,460 | 0.9867112811 |

Unhinted

About 1.2-3x performance improvement

| Number of Elements | Rng | Old ns/iter | New ns/iter | Ratio |
| --- | --- | --- | --- | --- |
| 1 | CryptoRng | 11 | 2 | 5.5 |
| 2 | CryptoRng | 23 | 10 | 2.3 |
| 3 | CryptoRng | 29 | 25 | 1.16 |
| 10 | CryptoRng | 94 | 64 | 1.46875 |
| 100 | CryptoRng | 813 | 345 | 2.356521739 |
| 1,000 | CryptoRng | 6,719 | 2,506 | 2.681165204 |
| 10,000 | CryptoRng | 73,624 | 23,360 | 3.151712329 |
| 1 | SmallRng | 8 | 1 | 8 |
| 2 | SmallRng | 16 | 7 | 2.285714286 |
| 3 | SmallRng | 19 | 16 | 1.1875 |
| 10 | SmallRng | 63 | 43 | 1.465116279 |
| 100 | SmallRng | 476 | 252 | 1.888888889 |
| 1,000 | SmallRng | 4,174 | 1,934 | 2.158221303 |
| 10,000 | SmallRng | 45,099 | 18,083 | 2.493999889 |

Stable

About 1.2-3x performance improvement

| Number of Elements | Rng | Old ns/iter | New ns/iter | Ratio |
| --- | --- | --- | --- | --- |
| 1 | CryptoRng | 11 | 2 | 5.5 |
| 2 | CryptoRng | 23 | 10 | 2.3 |
| 3 | CryptoRng | 29 | 23 | 1.260869565 |
| 10 | CryptoRng | 93 | 66 | 1.409090909 |
| 100 | CryptoRng | 746 | 363 | 2.055096419 |
| 1,000 | CryptoRng | 6,675 | 2,506 | 2.663607342 |
| 10,000 | CryptoRng | 70,872 | 23,553 | 3.009043434 |
| 1 | SmallRng | 8 | 1 | 8 |
| 2 | SmallRng | 16 | 7 | 2.285714286 |
| 3 | SmallRng | 20 | 16 | 1.25 |
| 10 | SmallRng | 65 | 43 | 1.511627907 |
| 100 | SmallRng | 471 | 257 | 1.832684825 |
| 1,000 | SmallRng | 4,020 | 1,850 | 2.172972973 |
| 10,000 | SmallRng | 44,547 | 18,276 | 2.437458963 |

Size Hinted

No change. Note that each run is 1000 iterations

| Number of Elements | Rng | Old ns/1000 iter | New ns/1000 iter | Ratio |
| --- | --- | --- | --- | --- |
| 1 | CryptoRng | 10,681 | 10,767 | 0.9920126312 |
| 2 | CryptoRng | 10,562 | 10,483 | 1.007536011 |
| 3 | CryptoRng | 5,689 | 5,597 | 1.016437377 |
| 10 | CryptoRng | 8,104 | 7,913 | 1.024137495 |
| 100 | CryptoRng | 5,240 | 5,354 | 0.9787075084 |
| 1,000 | CryptoRng | 2,642 | 2,563 | 1.030823254 |
| 10,000 | CryptoRng | 8,381 | 8,303 | 1.009394195 |
| 1 | SmallRng | 7,218 | 7,221 | 0.9995845451 |
| 2 | SmallRng | 7,251 | 7,310 | 0.9919288646 |
| 3 | SmallRng | 3,184 | 3,181 | 1.0009431 |
| 10 | SmallRng | 4,925 | 4,923 | 1.000406256 |
| 100 | SmallRng | 2,841 | 2,825 | 1.005663717 |
| 1,000 | SmallRng | 1,198 | 1,196 | 1.001672241 |
| 10,000 | SmallRng | 5,300 | 5,404 | 0.9807549963 |

dhardy commented Dec 6, 2022

Thanks. Re-reviewing is on my to-do list.

I tried using criterion benchmarks. They give pretty similar results (both the default measurement and with cycles-per-byte) but I couldn't find a way to use criterion for that benchmark without adding it as a dev-dependency for the whole project, thereby making all of the CI testing take much longer.

Eventually I think we'd like to migrate all benches to criterion anyway (e.g. see this and this), so no worries about adding it as a dev-dependency. If you already wrote Criterion versions then it'd be nice to have those here.

Don't worry about those CI tests; they should be resolved by #1269.

dhardy commented Dec 7, 2022

I ran your benchmarks on my system (5800X) to see if I get similar results... broadly speaking yes (this PR is still a big improvement) but the details are quite different. Most surprising is that quite a few of your results show no significant change, while none of mine do. Also, your highest speedup is 8 whereas mine is 4.5.

Your ratio is copied to the last column.

benchmark elements rng old_ns new_ns speedup WWM_ratio
size_hinted 1 cryptoRng 11865 9012 1.32 0.99
size_hinted 2 cryptoRng 11899 9012 1.32 1.01
size_hinted 3 cryptoRng 6958 4774 1.46 1.02
size_hinted 10 cryptoRng 9222 6804 1.36 1.02
size_hinted 100 cryptoRng 6629 4205 1.58 0.98
size_hinted 1000 cryptoRng 4705 2602 1.81 1.03
size_hinted 10000 cryptoRng 9618 6801 1.41 1.01
size_hinted 1 smallRng 8407 6200 1.36 1.00
size_hinted 2 smallRng 8569 6132 1.40 0.99
size_hinted 3 smallRng 4724 2864 1.65 1.00
size_hinted 10 smallRng 6308 4347 1.45 1.00
size_hinted 100 smallRng 4455 2585 1.72 1.01
size_hinted 1000 smallRng 3068 1407 2.18 1.00
size_hinted 10000 smallRng 6567 4562 1.44 0.98
unhinted 1 cryptoRng 12 3 4.00 5.50
unhinted 2 cryptoRng 24 12 2.00 2.30
unhinted 3 cryptoRng 31 25 1.24 1.16
unhinted 10 cryptoRng 104 71 1.46 1.47
unhinted 100 cryptoRng 867 488 1.78 2.36
unhinted 1000 cryptoRng 7629 4092 1.86 2.68
unhinted 10000 cryptoRng 81416 39453 2.06 3.15
unhinted 1 smallRng 9 2 4.50 8.00
unhinted 2 smallRng 17 8 2.13 2.29
unhinted 3 smallRng 22 17 1.29 1.19
unhinted 10 smallRng 71 50 1.42 1.47
unhinted 100 smallRng 573 324 1.77 1.89
unhinted 1000 smallRng 5176 2579 2.01 2.16
unhinted 10000 smallRng 55280 24456 2.26 2.49
unhinted_stable 1 cryptoRng 12 3 4.00 5.50
unhinted_stable 2 cryptoRng 24 12 2.00 2.30
unhinted_stable 3 cryptoRng 31 25 1.24 1.26
unhinted_stable 10 cryptoRng 102 73 1.40 1.41
unhinted_stable 100 cryptoRng 839 505 1.66 2.06
unhinted_stable 1000 cryptoRng 7766 4078 1.90 2.66
unhinted_stable 10000 cryptoRng 81398 41300 1.97 3.01
unhinted_stable 1 smallRng 9 2 4.50 8.00
unhinted_stable 2 smallRng 17 8 2.13 2.29
unhinted_stable 3 smallRng 22 18 1.22 1.25
unhinted_stable 10 smallRng 72 50 1.44 1.51
unhinted_stable 100 smallRng 568 325 1.75 1.83
unhinted_stable 1000 smallRng 5126 2559 2.00 2.17
unhinted_stable 10000 smallRng 55232 24462 2.26 2.44
window_hinted 1 cryptoRng 15 5 3.00 4.00
window_hinted 2 cryptoRng 19 18 1.06 2.00
window_hinted 3 cryptoRng 16 15 1.07 0.86
window_hinted 10 cryptoRng 31 29 1.07 0.90
window_hinted 100 cryptoRng 200 183 1.09 0.90
window_hinted 1000 cryptoRng 1632 1457 1.12 0.89
window_hinted 10000 cryptoRng 16749 14982 1.12 0.88
window_hinted 1 smallRng 11 5 2.20 3.33
window_hinted 2 smallRng 15 14 1.07 1.00
window_hinted 3 smallRng 13 12 1.08 1.00
window_hinted 10 smallRng 25 23 1.09 1.00
window_hinted 100 smallRng 158 132 1.20 1.00
window_hinted 1000 smallRng 1265 994 1.27 1.00
window_hinted 10000 smallRng 12968 10215 1.27 0.99

@wainwrightmark

Slightly confused by your benchmark results. My code changes shouldn't affect the size-hinted iterator at all, so it's surprising that you'd see a noticeable improvement.
The unhinted and stable results look very similar to mine (the biggest differences are basically caused by lack of precision: mine went from 8ns to 1ns, yours from 9ns to 2ns, so without more decimal places we can't really say that the underlying ratios are different).
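
The precision argument can be made concrete with a small calculation. This sketch assumes each reported figure is within half a nanosecond of the true timing, which is an assumption for illustration and not something the bench harness guarantees; `ratio_bounds` is a hypothetical helper.

```rust
/// Range of old/new ratios consistent with two timings that were each
/// reported as whole nanoseconds (assumed error: at most +/- 0.5 ns each).
/// Requires new_ns > 0.5 so the upper bound stays finite.
fn ratio_bounds(old_ns: f64, new_ns: f64) -> (f64, f64) {
    let lowest = (old_ns - 0.5) / (new_ns + 0.5);
    let highest = (old_ns + 0.5) / (new_ns - 0.5);
    (lowest, highest)
}
```

Under that assumption, 8 ns to 1 ns allows ratios from 5 to 17, and 9 ns to 2 ns allows ratios from 3.4 to about 6.33, so the two ranges overlap and the reported 8x and 4.5x are not necessarily in conflict.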

dhardy commented Dec 11, 2022

Updated benchmarks using your Criterion bench. Baseline is master branch.

Size hinted: 1-2% slower (excluding "from 1", which are too fast for useful timings)

choose_size-hinted_from_1_small
time: [311.88 ps 312.10 ps 312.46 ps]
change: [-96.285% -96.284% -96.282%] (p = 0.00 < 0.05)
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
3 (3.00%) high mild
2 (2.00%) high severe

choose_size-hinted-from_1_crypto
time: [312.02 ps 312.13 ps 312.24 ps]
change: [-97.482% -97.481% -97.480%] (p = 0.00 < 0.05)
Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe

choose_size-hinted_from_2_small
time: [8.5583 ns 8.5634 ns 8.5704 ns]
change: [+1.9437% +2.0223% +2.1424%] (p = 0.00 < 0.05)
Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
1 (1.00%) low severe
1 (1.00%) high mild
12 (12.00%) high severe

choose_size-hinted-from_2_crypto
time: [12.262 ns 12.268 ns 12.274 ns]
change: [-1.1314% -1.0855% -1.0368%] (p = 0.00 < 0.05)
Performance has improved.

choose_size-hinted_from_3_small
time: [4.7571 ns 4.7593 ns 4.7618 ns]
change: [+1.0772% +1.1195% +1.1591%] (p = 0.00 < 0.05)
Performance has regressed.

choose_size-hinted-from_3_crypto
time: [7.4407 ns 7.4475 ns 7.4568 ns]
change: [+1.7710% +1.8543% +1.9916%] (p = 0.00 < 0.05)
Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
4 (4.00%) high mild
3 (3.00%) high severe

choose_size-hinted_from_10_small
time: [6.4278 ns 6.4318 ns 6.4361 ns]
change: [+1.6673% +1.8028% +1.9221%] (p = 0.00 < 0.05)
Performance has regressed.
Found 15 outliers among 100 measurements (15.00%)
1 (1.00%) low severe
2 (2.00%) low mild
3 (3.00%) high mild
9 (9.00%) high severe

choose_size-hinted-from_10_crypto
time: [9.5898 ns 9.5947 ns 9.6017 ns]
change: [+0.4214% +0.4549% +0.4930%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
2 (2.00%) low mild
2 (2.00%) high mild
1 (1.00%) high severe

choose_size-hinted_from_100_small
time: [4.4448 ns 4.4473 ns 4.4504 ns]
change: [+0.8706% +0.9226% +0.9879%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 15 outliers among 100 measurements (15.00%)
9 (9.00%) high mild
6 (6.00%) high severe

choose_size-hinted-from_100_crypto
time: [7.0240 ns 7.0250 ns 7.0261 ns]
change: [+2.0148% +2.0451% +2.0757%] (p = 0.00 < 0.05)
Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
6 (6.00%) high mild
3 (3.00%) high severe

choose_size-hinted_from_1000_small
time: [3.0348 ns 3.0352 ns 3.0357 ns]
change: [+0.1328% +0.1519% +0.1706%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
1 (1.00%) low mild
6 (6.00%) high mild
1 (1.00%) high severe

choose_size-hinted-from_1000_crypto
time: [5.0719 ns 5.0740 ns 5.0766 ns]
change: [+3.8371% +3.8894% +3.9409%] (p = 0.00 < 0.05)
Performance has regressed.

Stable: good improvements (mostly 30-50% less time)

choose_stable_from_1_small
time: [7.3342 ns 7.4515 ns 7.5584 ns]
change: [-40.092% -39.260% -38.370%] (p = 0.00 < 0.05)
Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) low mild

choose_stable_from_1_crypto
time: [9.3997 ns 9.4773 ns 9.5477 ns]
change: [-41.222% -40.501% -39.803%] (p = 0.00 < 0.05)
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
3 (3.00%) low severe
5 (5.00%) low mild

choose_stable_from_2_small
time: [11.073 ns 11.074 ns 11.075 ns]
change: [-41.562% -41.519% -41.478%] (p = 0.00 < 0.05)
Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
2 (2.00%) high mild
2 (2.00%) high severe

choose_stable_from_2_crypto
time: [12.860 ns 12.862 ns 12.864 ns]
change: [-49.353% -49.334% -49.316%] (p = 0.00 < 0.05)
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
8 (8.00%) high mild

choose_stable_from_3_small
time: [21.732 ns 21.749 ns 21.766 ns]
change: [-6.5032% -6.4167% -6.3435%] (p = 0.00 < 0.05)
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
3 (3.00%) high mild
4 (4.00%) high severe

choose_stable_from_3_crypto
time: [23.599 ns 23.607 ns 23.615 ns]
change: [-25.792% -25.760% -25.720%] (p = 0.00 < 0.05)
Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) high mild
1 (1.00%) high severe

choose_stable_from_10_small
time: [57.568 ns 57.582 ns 57.595 ns]
change: [-15.733% -15.694% -15.651%] (p = 0.00 < 0.05)
Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe

choose_stable_from_10_crypto
time: [60.407 ns 60.418 ns 60.431 ns]
change: [-35.601% -35.586% -35.567%] (p = 0.00 < 0.05)
Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
8 (8.00%) high mild
4 (4.00%) high severe

choose_stable_from_100_small
time: [350.27 ns 350.29 ns 350.31 ns]
change: [-32.801% -32.776% -32.751%] (p = 0.00 < 0.05)
Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
1 (1.00%) low severe
1 (1.00%) low mild
4 (4.00%) high mild

choose_stable_from_100_crypto
time: [363.49 ns 363.62 ns 363.79 ns]
change: [-50.513% -50.413% -50.320%] (p = 0.00 < 0.05)
Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
3 (3.00%) high mild
8 (8.00%) high severe

choose_stable_from_1000_small
time: [2.7206 µs 2.7206 µs 2.7208 µs]
change: [-41.110% -41.095% -41.080%] (p = 0.00 < 0.05)
Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
4 (4.00%) low severe
1 (1.00%) low mild
4 (4.00%) high mild
1 (1.00%) high severe

choose_stable_from_1000_crypto
time: [2.8316 µs 2.8319 µs 2.8322 µs]
change: [-56.532% -56.523% -56.514%] (p = 0.00 < 0.05)
Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
4 (4.00%) high mild

Unhinted: similarly good improvements

choose_unhinted_from_1_small
time: [3.7256 ns 3.7276 ns 3.7302 ns]
change: [-58.075% -58.034% -57.984%] (p = 0.00 < 0.05)
Performance has improved.
Found 19 outliers among 100 measurements (19.00%)
4 (4.00%) high mild
15 (15.00%) high severe

choose_unhinted_from_1_crypto
time: [3.9776 ns 3.9877 ns 3.9986 ns]
change: [-69.601% -69.523% -69.453%] (p = 0.00 < 0.05)
Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
3 (3.00%) high mild
1 (1.00%) high severe

choose_unhinted_from_2_small
time: [9.4782 ns 9.4838 ns 9.4899 ns]
change: [-45.437% -45.389% -45.337%] (p = 0.00 < 0.05)
Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe

choose_unhinted_from_2_crypto
time: [11.096 ns 11.100 ns 11.108 ns]
change: [-56.732% -56.718% -56.701%] (p = 0.00 < 0.05)
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
1 (1.00%) low mild
3 (3.00%) high mild
3 (3.00%) high severe

choose_unhinted_from_3_small
time: [20.120 ns 20.129 ns 20.139 ns]
change: [-9.9446% -9.8488% -9.7798%] (p = 0.00 < 0.05)
Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
1 (1.00%) low mild
5 (5.00%) high mild
6 (6.00%) high severe

choose_unhinted_from_3_crypto
time: [21.779 ns 21.782 ns 21.785 ns]
change: [-34.238% -34.206% -34.178%] (p = 0.00 < 0.05)
Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) high mild
1 (1.00%) high severe

choose_unhinted_from_10_small
time: [55.653 ns 55.671 ns 55.691 ns]
change: [-23.366% -23.300% -23.237%] (p = 0.00 < 0.05)
Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) high mild
2 (2.00%) high severe

choose_unhinted_from_10_crypto
time: [57.475 ns 57.485 ns 57.496 ns]
change: [-45.773% -45.761% -45.749%] (p = 0.00 < 0.05)
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
4 (4.00%) high mild
3 (3.00%) high severe

choose_unhinted_from_100_small
time: [349.15 ns 349.24 ns 349.33 ns]
change: [-40.123% -39.957% -39.824%] (p = 0.00 < 0.05)
Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
7 (7.00%) high mild
8 (8.00%) high severe

choose_unhinted_from_100_crypto
time: [358.90 ns 358.96 ns 359.01 ns]
change: [-58.644% -58.637% -58.630%] (p = 0.00 < 0.05)
Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe

choose_unhinted_from_1000_small
time: [2.7503 µs 2.7513 µs 2.7524 µs]
change: [-47.801% -47.768% -47.740%] (p = 0.00 < 0.05)
Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
1 (1.00%) low mild
7 (7.00%) high mild
3 (3.00%) high severe

choose_unhinted_from_1000_crypto
time: [2.8229 µs 2.8237 µs 2.8246 µs]
change: [-64.304% -64.282% -64.255%] (p = 0.00 < 0.05)
Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
2 (2.00%) low mild
3 (3.00%) high mild
9 (9.00%) high severe

Window hinted: within -3% to +10% time, aside from 1-element sizes

choose_windowed_from_1_small
time: [5.8134 ns 5.8740 ns 5.9359 ns]
change: [-50.533% -49.996% -49.422%] (p = 0.00 < 0.05)
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
1 (1.00%) low severe
4 (4.00%) low mild
2 (2.00%) high mild

choose_windowed_from_1_crypto
time: [5.2936 ns 5.3261 ns 5.3611 ns]
change: [-65.318% -65.039% -64.802%] (p = 0.00 < 0.05)
Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
4 (4.00%) high mild

choose_windowed_from_2_small
time: [14.162 ns 14.169 ns 14.178 ns]
change: [+2.3599% +2.5538% +2.7263%] (p = 0.00 < 0.05)
Performance has regressed.

choose_windowed_from_2_crypto
time: [18.216 ns 18.219 ns 18.221 ns]
change: [+4.2184% +4.5029% +4.6979%] (p = 0.00 < 0.05)
Performance has regressed.

choose_windowed_from_3_small
time: [11.642 ns 11.645 ns 11.648 ns]
change: [+2.7918% +2.8577% +2.9191%] (p = 0.00 < 0.05)
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
1 (1.00%) low severe
2 (2.00%) low mild
7 (7.00%) high mild

choose_windowed_from_3_crypto
time: [14.947 ns 14.951 ns 14.954 ns]
change: [+6.3821% +6.4527% +6.5353%] (p = 0.00 < 0.05)
Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high severe

choose_windowed_from_10_small
time: [22.343 ns 22.355 ns 22.367 ns]
change: [-0.6392% -0.5756% -0.5069%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) low mild
1 (1.00%) high mild

choose_windowed_from_10_crypto
time: [29.501 ns 29.513 ns 29.526 ns]
change: [+4.5710% +4.6276% +4.6841%] (p = 0.00 < 0.05)
Performance has regressed.

choose_windowed_from_100_small
time: [133.85 ns 133.89 ns 133.94 ns]
change: [-2.5021% -2.4309% -2.3667%] (p = 0.00 < 0.05)
Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) high mild
1 (1.00%) high severe

choose_windowed_from_100_crypto
time: [187.24 ns 187.30 ns 187.37 ns]
change: [+5.7062% +5.8279% +5.9351%] (p = 0.00 < 0.05)
Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
3 (3.00%) low mild
2 (2.00%) high mild
2 (2.00%) high severe

choose_windowed_from_1000_small
time: [1.0200 µs 1.0202 µs 1.0204 µs]
change: [-0.6164% -0.5041% -0.4075%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) low severe
1 (1.00%) high mild

choose_windowed_from_1000_crypto
time: [1.5212 µs 1.5218 µs 1.5224 µs]
change: [+10.026% +10.072% +10.117%] (p = 0.00 < 0.05)
Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
1 (1.00%) low mild
3 (3.00%) high mild

Overall nice improvements. There are quite a few outliers (I didn't bother limiting CPU boosting) but I think the results are still useful, excluding 1-element sizes. There are some unusual results at the 3-10 element sizes.
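
To compare these Criterion figures with the old/new ratios in the earlier tables, a relative change in time can be converted to a speedup. A minimal sketch, assuming the reported change is (new - old) / old expressed in percent; `speedup_from_change` is a hypothetical helper name.

```rust
/// Converts a relative change in time (in percent, negative means faster)
/// into an old/new speedup ratio.
/// Assumes change = (new - old) / old * 100 and change > -100.
fn speedup_from_change(percent_change: f64) -> f64 {
    1.0 / (1.0 + percent_change / 100.0)
}
```

For example, the -56.523% reported for choose_stable_from_1000_crypto corresponds to a speedup of about 2.3, and the +10.072% for choose_windowed_from_1000_crypto to about 0.91.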

dhardy commented Dec 11, 2022

The test failures are annoying; at least the 1.56 (MSRV) one is due to adding Criterion. I'll try to fix this with a new PR before merging this.

@dhardy
Member

dhardy commented Dec 11, 2022

Done: #1275. Could you rebase or merge please?

Resolved review threads: src/seq/mod.rs (outdated), src/seq/coin_flipper.rs, benches/seq_choose.rs (outdated)
@dhardy
Member

dhardy commented Jan 4, 2023

Thanks @wainwrightmark. Comparing PCG64 results to the other RNGs, they mostly look the same as PCG32 results (aside from window-hinted, where they are close to ChaCha results). If the RNG were fully utilised we should see better results (excepting short iterators), but at least these results aren't too bad.

Member

@dhardy dhardy left a comment

LGTM. I think this is ready to be merged?

@dhardy
Member

dhardy commented Jan 4, 2023

I hacked this to generate results using a 64-bit chunk, but the results aren't what I was expecting. The vast majority of results are exactly the same or marginally worse. With ChaCha20 a few results are around 40% worse (not too surprising I guess; more random bits are generated and discarded; what is surprising is that 1000-long iterators are some of the most affected). Otherwise, Pcg64 is around 14% faster in the window-hinted benches; both Pcg RNGs are around 5% faster for unhinted benches and Pcg32 is 2-4% faster in stable benches (Pcg64 is marginally slower in these).

I suppose this goes to show that Pcg64 is fast enough that discarding half the output each time isn't very important, while ChaCha really is impacted by the extra discarded bytes (but only in some of the benches). Or it's just micro-benchmark weirdness. Either way, we can drop the idea of using 64-bit chunks here unless anyone wants to further investigate.
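
To make the chunk-size trade-off concrete, here is a minimal sketch of a bit buffer that hands out one random bit at a time and refills from a 32-bit or 64-bit chunk. It is not the `CoinFlipper` from this PR: `BitBuffer`, `usage` and the SplitMix64 stand-in generator are all hypothetical names chosen so the example has no dependencies. It only counts how many bits are generated and how many are left unused when the buffer is dropped.

```rust
/// Small deterministic generator (SplitMix64), standing in for a real RNG.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Hands out random bits one at a time from a buffered chunk.
struct BitBuffer {
    rng: SplitMix64,
    chunk: u64,
    bits_left: u32,
    chunk_bits: u32, // 32 or 64 in this sketch
    chunks_drawn: u32,
}

impl BitBuffer {
    fn new(seed: u64, chunk_bits: u32) -> Self {
        assert!(chunk_bits == 32 || chunk_bits == 64);
        BitBuffer { rng: SplitMix64(seed), chunk: 0, bits_left: 0, chunk_bits, chunks_drawn: 0 }
    }

    /// Return one random bit, refilling the chunk when it is exhausted.
    fn flip(&mut self) -> bool {
        if self.bits_left == 0 {
            let word = self.rng.next_u64();
            // With 32-bit chunks only the high half of the word is kept.
            self.chunk = if self.chunk_bits == 32 { word >> 32 } else { word };
            self.bits_left = self.chunk_bits;
            self.chunks_drawn += 1;
        }
        let bit = self.chunk & 1 == 1;
        self.chunk >>= 1;
        self.bits_left -= 1;
        bit
    }
}

/// Returns (bits generated, bits left unused) after `flips` coin flips.
fn usage(flips: u32, chunk_bits: u32) -> (u32, u32) {
    let mut buf = BitBuffer::new(1, chunk_bits);
    for _ in 0..flips {
        buf.flip();
    }
    (buf.chunks_drawn * chunk_bits, buf.bits_left)
}

fn main() {
    for &flips in &[10u32, 40, 100] {
        println!("{} flips: 32-bit {:?}, 64-bit {:?}", flips, usage(flips, 32), usage(flips, 64));
    }
}
```

In this model a short run of 10 flips leaves 22 bits unused with a 32-bit chunk but 54 with a 64-bit chunk, which is one plausible reason a generator with expensive output such as ChaCha20 could lose more from the wider chunk than a cheap one like Pcg64. It does not explain why the 1000-long iterators were among the most affected.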

@dhardy
Member

dhardy commented Jan 5, 2023

Benchmark results
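| Test: choose | from | RNG | ns (32-bit) | +/- | ns (64-bit) | +/- | ratio |
|---|---|---|---|---|---|---|---|
| size-hinted | 1 | ChaCha20 |  | 0 | 0 | 0 |  |
| size-hinted | 2 | ChaCha20 | 11 | 0 | 12 | 0 | 1.091 |
| size-hinted | 3 | ChaCha20 | 7 | 0 | 7 | 0 | 1.000 |
| size-hinted | 10 | ChaCha20 | 9 | 0 | 9 | 0 | 1.000 |
| size-hinted | 100 | ChaCha20 | 6 | 0 | 6 | 0 | 1.000 |
| size-hinted | 1000 | ChaCha20 | 4 | 0 | 4 | 0 | 1.000 |
| stable | 1 | ChaCha20 | 5 | 0 | 7 | 0 | 1.400 |
| stable | 2 | ChaCha20 | 12 | 0 | 17 | 0 | 1.417 |
| stable | 3 | ChaCha20 | 23 | 0 | 31 | 0 | 1.348 |
| stable | 10 | ChaCha20 | 59 | 1 | 80 | 0 | 1.356 |
| stable | 100 | ChaCha20 | 361 | 1 | 510 | 10 | 1.413 |
| stable | 1000 | ChaCha20 | 2839 | 33 | 4189 | 7 | 1.476 |
| unhinted | 1 | ChaCha20 | 4 | 0 | 5 | 0 | 1.250 |
| unhinted | 2 | ChaCha20 | 11 | 0 | 15 | 1 | 1.364 |
| unhinted | 3 | ChaCha20 | 21 | 0 | 29 | 0 | 1.381 |
| unhinted | 10 | ChaCha20 | 57 | 0 | 77 | 0 | 1.351 |
| unhinted | 100 | ChaCha20 | 360 | 1 | 487 | 0 | 1.353 |
| unhinted | 1000 | ChaCha20 | 2831 | 13 | 3968 | 9 | 1.402 |
| windowed | 1 | ChaCha20 | 5 | 0 | 6 | 0 | 1.200 |
| windowed | 2 | ChaCha20 | 17 | 0 | 17 | 0 | 1.000 |
| windowed | 3 | ChaCha20 | 14 | 0 | 14 | 0 | 1.000 |
| windowed | 10 | ChaCha20 | 28 | 0 | 28 | 0 | 1.000 |
| windowed | 100 | ChaCha20 | 178 | 0 | 182 | 0 | 1.022 |
| windowed | 1000 | ChaCha20 | 1418 | 1 | 1474 | 2 | 1.039 |
| size-hinted | 1 | Pcg32 | 0 | 0 | 0 | 0 |  |
| size-hinted | 2 | Pcg32 | 8 | 0 | 8 | 0 | 1.000 |
| size-hinted | 3 | Pcg32 | 4 | 0 | 4 | 0 | 1.000 |
| size-hinted | 10 | Pcg32 | 6 | 0 | 6 | 0 | 1.000 |
| size-hinted | 100 | Pcg32 | 4 | 0 | 4 | 0 | 1.000 |
| size-hinted | 1000 | Pcg32 | 3 | 0 | 3 | 0 | 1.000 |
| stable | 1 | Pcg32 | 7 | 0 | 7 | 0 | 1.000 |
| stable | 2 | Pcg32 | 11 | 0 | 12 | 0 | 1.091 |
| stable | 3 | Pcg32 | 21 | 0 | 23 | 0 | 1.095 |
| stable | 10 | Pcg32 | 57 | 0 | 59 | 0 | 1.035 |
| stable | 100 | Pcg32 | 348 | 1 | 341 | 0 | 0.980 |
| stable | 1000 | Pcg32 | 2725 | 12 | 2612 | 2 | 0.959 |
| unhinted | 1 | Pcg32 | 3 | 0 | 3 | 0 | 1.000 |
| unhinted | 2 | Pcg32 | 9 | 0 | 10 | 0 | 1.111 |
| unhinted | 3 | Pcg32 | 20 | 0 | 20 | 0 | 1.000 |
| unhinted | 10 | Pcg32 | 55 | 0 | 56 | 2 | 1.018 |
| unhinted | 100 | Pcg32 | 349 | 1 | 338 | 0 | 0.968 |
| unhinted | 1000 | Pcg32 | 2762 | 20 | 2599 | 1 | 0.941 |
| windowed | 1 | Pcg32 | 6 | 0 | 6 | 0 | 1.000 |
| windowed | 2 | Pcg32 | 14 | 0 | 14 | 0 | 1.000 |
| windowed | 3 | Pcg32 | 11 | 0 | 11 | 0 | 1.000 |
| windowed | 10 | Pcg32 | 22 | 0 | 22 | 0 | 1.000 |
| windowed | 100 | Pcg32 | 139 | 0 | 139 | 0 | 1.000 |
| windowed | 1000 | Pcg32 | 1070 | 1 | 1056 | 1 | 0.987 |
| size-hinted | 1 | Pcg64 | 0 | 0 | 0 | 0 |  |
| size-hinted | 2 | Pcg64 | 9 | 0 | 9 | 0 | 1.000 |
| size-hinted | 3 | Pcg64 | 5 | 0 | 5 | 0 | 1.000 |
| size-hinted | 10 | Pcg64 | 7 | 0 | 7 | 0 | 1.000 |
| size-hinted | 100 | Pcg64 | 5 | 0 | 5 | 0 | 1.000 |
| size-hinted | 1000 | Pcg64 | 3 | 0 | 3 | 0 | 1.000 |
| stable | 1 | Pcg64 | 5 | 0 | 6 | 0 | 1.200 |
| stable | 2 | Pcg64 | 12 | 0 | 13 | 0 | 1.083 |
| stable | 3 | Pcg64 | 23 | 0 | 23 | 0 | 1.000 |
| stable | 10 | Pcg64 | 59 | 0 | 60 | 0 | 1.017 |
| stable | 100 | Pcg64 | 354 | 0 | 357 | 0 | 1.008 |
| stable | 1000 | Pcg64 | 2748 | 3 | 2784 | 4 | 1.013 |
| unhinted | 1 | Pcg64 | 4 | 0 | 4 | 0 | 1.000 |
| unhinted | 2 | Pcg64 | 10 | 0 | 10 | 0 | 1.000 |
| unhinted | 3 | Pcg64 | 21 | 0 | 21 | 0 | 1.000 |
| unhinted | 10 | Pcg64 | 57 | 0 | 56 | 0 | 0.982 |
| unhinted | 100 | Pcg64 | 350 | 0 | 337 | 0 | 0.963 |
| unhinted | 1000 | Pcg64 | 2728 | 1 | 2601 | 3 | 0.953 |
| windowed | 1 | Pcg64 | 5 | 0 | 4 | 0 | 0.800 |
| windowed | 2 | Pcg64 | 17 | 0 | 16 | 0 | 0.941 |
| windowed | 3 | Pcg64 | 13 | 0 | 13 | 0 | 1.000 |
| windowed | 10 | Pcg64 | 28 | 0 | 25 | 0 | 0.893 |
| windowed | 100 | Pcg64 | 178 | 0 | 156 | 0 | 0.876 |
| windowed | 1000 | Pcg64 | 1401 | 2 | 1212 | 1 | 0.865 |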
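
The ratio column appears to be the 64-bit time divided by the 32-bit time, so values above 1 mean the 64-bit chunk was slower. A small sketch that recomputes it for three rows copied from the results above (the `ratio` helper is a hypothetical name used only here):

```rust
/// Ratio of the 64-bit-chunk time to the 32-bit-chunk time.
/// Values above 1.0 mean the 64-bit chunk was slower.
fn ratio(ns_32: f64, ns_64: f64) -> f64 {
    ns_64 / ns_32
}

fn main() {
    // (test, ns with 32-bit chunk, ns with 64-bit chunk), taken from the results above.
    let rows = [
        ("stable 1000 ChaCha20", 2839.0, 4189.0),
        ("unhinted 1000 Pcg32", 2762.0, 2599.0),
        ("windowed 1000 Pcg64", 1401.0, 1212.0),
    ];
    for (name, ns_32, ns_64) in rows.iter() {
        println!("{}: {:.3}", name, ratio(*ns_32, *ns_64));
    }
}
```

The three rows give 1.476, 0.941 and 0.865, matching the listed ratios and the "around 14% faster" figure quoted for Pcg64 in the window-hinted benches.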
