
@RAMitchell
Contributor

Fixes the issues described in #7062 (comment) and adds many more tests.

@copy-pr-bot
Contributor

copy-pr-bot bot commented Jan 8, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@cccl-authenticator-app cccl-authenticator-app bot moved this from Todo to In Progress in CCCL Jan 8, 2026
```diff
 {
-const uint64_t __total_bits = static_cast<uint64_t>(::cuda::std::max(4, ::cuda::std::bit_width(__num_elements)));
+const uint64_t __total_bits =
+(::cuda::std::max) (uint64_t{8}, static_cast<uint64_t>(::cuda::std::bit_width(__num_elements)));
```
Contributor

Suggested change
```diff
-(::cuda::std::max) (uint64_t{8}, static_cast<uint64_t>(::cuda::std::bit_width(__num_elements)));
+::cuda::std::max(uint64_t{8}, static_cast<uint64_t>(::cuda::std::bit_width(__num_elements)));
```

Should be fine now :) You can leave it as is.

@github-project-automation github-project-automation bot moved this from In Progress to In Review in CCCL Jan 8, 2026
Comment on lines 79 to 82
```cpp
// Mitchell, Rory, et al. "Bandwidth-optimal random shuffling for GPUs." ACM Transactions on Parallel Computing 9.1
// (2022): 1-20.
uint32_t __L = __val >> __R_bits_;
uint32_t __R = __val & __R_mask_;
```
Member

Q: Is this part the actual fix, i.e. that we swap the definitions of the original high and low halves? Otherwise the rest looks the same to me.

Contributor Author

Actually, there was a significant bug. The high and low values were being initialised the wrong way round. This meant that with an odd number of bits, one of the bits was dropped; as a consequence, the tests would run forever because the mapping was no longer a bijection (i.e. values collided).

I rewrote it from scratch to find the bug, as I couldn't find it at first! I also relabeled everything using the notation from the paper, which I find less confusing.
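To make the odd-bit-count case concrete, here is a minimal sketch of an unbalanced Feistel bijection on `total_bits`-bit values, using the same L/R naming as the snippet above (L = high bits, `L = val >> R_bits`; R = low bits). Assumptions: the extra bit of an odd width goes to L; `round_fn` is a hypothetical stand-in mixer (a splitmix64-style finalizer), not the round function used in CCCL; `feistel_sketch` and `is_bijection` are illustrative names only. Each round XORs F(R) into L and then swaps the halves, so every round is invertible for any F, as long as the two widths always add up to `total_bits`.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in round function (splitmix64-style finalizer of x ^ key).
// A Feistel network stays invertible for ANY round function, so F itself
// does not need to be a bijection.
inline std::uint64_t round_fn(std::uint64_t x, std::uint64_t key) {
  std::uint64_t z = x ^ key;
  z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
  z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
  return z ^ (z >> 31);
}

// Unbalanced Feistel bijection on [0, 2^total_bits).
// L = high L_bits bits, R = low R_bits bits; for odd total_bits the extra
// bit goes to L, so no bit is ever discarded.
struct feistel_sketch {
  unsigned L_bits = 0, R_bits = 0;
  std::uint64_t L_mask = 0, R_mask = 0;
  std::vector<std::uint64_t> keys;  // one key per round

  feistel_sketch(unsigned total_bits, std::vector<std::uint64_t> round_keys)
      : keys(std::move(round_keys)) {
    assert(total_bits >= 1 && total_bits <= 62);
    L_bits = total_bits - total_bits / 2;  // ceil(n / 2)
    R_bits = total_bits / 2;               // floor(n / 2)
    L_mask = (std::uint64_t{1} << L_bits) - 1;
    R_mask = (std::uint64_t{1} << R_bits) - 1;
  }

  // Expects val < 2^total_bits; each round keeps the value in that range.
  std::uint64_t operator()(std::uint64_t val) const {
    for (std::uint64_t k : keys) {
      const std::uint64_t L = val >> R_bits;  // high half, L_bits wide
      const std::uint64_t R = val & R_mask;   // low half, R_bits wide
      const std::uint64_t mixed = (L ^ round_fn(R, k)) & L_mask;
      // Swap: R moves to the high bits, the mixed L to the low bits.
      // The widths still sum to total_bits, so the next round re-splits cleanly.
      val = (R << L_bits) | mixed;
    }
    return val;
  }
};

// Exhaustively checks that f permutes [0, 2^total_bits): no output is out of
// range and no two inputs collide.
bool is_bijection(const feistel_sketch& f, unsigned total_bits) {
  const std::uint64_t n = std::uint64_t{1} << total_bits;
  std::vector<bool> seen(n, false);
  for (std::uint64_t x = 0; x < n; ++x) {
    const std::uint64_t y = f(x);
    if (y >= n || seen[y]) return false;
    seen[y] = true;
  }
  return true;
}
```

An exhaustive check of this kind is cheap for small widths and would flag a split that silently drops a bit, since two inputs differing only in that bit would then collide.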

@miscco
Contributor

miscco commented Jan 9, 2026

pre-commit.ci autofix

@RAMitchell
Contributor Author

One thing I would also like to try in a subsequent PR is using splitmix64 (or a similar PRNG sequence) inside the round function to generate the key sequence from a single 64-bit starting key, instead of carrying around 24 registers of PRNG keys. The Philox PRNG just uses a Weyl sequence for its round keys, but according to my tests that is not random enough.
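A rough sketch of that key-schedule idea, assuming the round keys are plain 64-bit values: splitmix64's internal state is itself a Weyl sequence (seed + i * gamma), so the key for round i can be computed directly from the seed, and only the seed has to stay in a register. The mixing constants are the widely used splitmix64 ones; `round_key` and the `splitmix64` struct are hypothetical names for illustration, not CCCL API.

```cpp
#include <cstdint>

constexpr std::uint64_t kGamma = 0x9E3779B97F4A7C15ULL;  // splitmix64 Weyl increment

// splitmix64 output function: scrambles one Weyl-sequence value.
// Every step (xor-shift, multiply by an odd constant) is invertible mod 2^64.
constexpr std::uint64_t splitmix64_mix(std::uint64_t z) {
  z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
  z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
  return z ^ (z >> 31);
}

// Stateful generator: the i-th call (0-based) returns mix(seed + (i + 1) * gamma).
struct splitmix64 {
  std::uint64_t state;
  std::uint64_t next() { return splitmix64_mix(state += kGamma); }
};

// Hypothetical helper: key for round `round` computed directly from one 64-bit
// seed, so no per-round keys need to be stored.
constexpr std::uint64_t round_key(std::uint64_t seed, std::uint32_t round) {
  return splitmix64_mix(seed + (std::uint64_t{round} + 1) * kGamma);
}
```

Both approaches start from a Weyl sequence; the difference is the scrambling step. Philox's key schedule uses the raw sequence, whereas splitmix64 passes each value through an invertible mixer. Whether that is random enough for the shuffle is what the proposed follow-up would need to test.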

@RAMitchell
Contributor Author

Further note: the number of rounds is set to 24. Anything fewer than 12 starts to fail these tests, so 24 is probably a good conservative value for now.

@miscco
Contributor

miscco commented Jan 9, 2026

/ok to test 013af58

@miscco miscco marked this pull request as ready for review January 9, 2026 11:05
@miscco miscco requested a review from a team as a code owner January 9, 2026 11:05
@miscco miscco requested a review from griwes January 9, 2026 11:05
@miscco miscco enabled auto-merge (squash) January 9, 2026 11:06
@github-actions
Contributor

github-actions bot commented Jan 9, 2026

🥳 CI Workflow Results

🟩 Finished in 59m 28s: Pass: 100%/84 | Total: 11h 33m | Max: 34m 31s | Hits: 99%/197991

See results here.

@miscco miscco merged commit 762e20e into NVIDIA:main Jan 9, 2026
97 checks passed
@github-project-automation github-project-automation bot moved this from In Review to Done in CCCL Jan 9, 2026
github-actions bot pushed a commit that referenced this pull request Jan 9, 2026
* Fix feistel bijection

* Add loads of tests

* Review comments

---------

Co-authored-by: Michael Schellenberger Costa <[email protected]>
(cherry picked from commit 762e20e)
@github-actions
Contributor

github-actions bot commented Jan 9, 2026

Successfully created backport PR for branch/3.2.x:

davebayer pushed a commit to davebayer/cccl that referenced this pull request Jan 9, 2026
* Fix feistel bijection

* Add loads of tests

* Review comments

---------

Co-authored-by: Michael Schellenberger Costa <[email protected]>
(cherry picked from commit 762e20e)
wmaxey pushed a commit that referenced this pull request Jan 9, 2026
…#7147)

* Clean up hierarchy (#7023)

(cherry picked from commit d3db7e0)

* Make c2h vector comparisons `constexpr` (#7009)

(cherry picked from commit 4215bd8)

* Disable cudax with msvc in CI for now (#7139)

(cherry picked from commit 7ed26fc)

* libcu++: silence msvc+nvcc12.9 warning plaguing c.parallel. (#7144)

(cherry picked from commit b98e950)

* Fixes for shuffle_iterator (#7130)

* Fix feistel bijection

* Add loads of tests

* Review comments

---------

Co-authored-by: Michael Schellenberger Costa <[email protected]>
(cherry picked from commit 762e20e)

* Fix uniformity test

---------

Co-authored-by: pciolkosz <[email protected]>
Co-authored-by: Michał Dominiak <[email protected]>
Co-authored-by: Rory Mitchell <[email protected]>
Co-authored-by: Michael Schellenberger Costa <[email protected]>
leofang added a commit to leofang/cupy that referenced this pull request Jan 11, 2026

Projects

Status: Done


4 participants