cub, c.parallel: {lower,upper}_bound #7007

griwes · 2025-12-18T06:13:56Z

Description

This PR adds {lower,upper}_bound device algorithms to both CUB and c.parallel.

In the case of CUB, the implementation is straightforward and directly follows the current implementation in Thrust (which I have cleaned up as a drive-by change).
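For readers less familiar with the semantics, here is a minimal host-side sketch of what a vectorized {lower,upper}_bound computes: for every search value, an insertion position into a sorted range. It runs sequentially with std::lower_bound and std::upper_bound, and it assumes the result is reported as an index. The names lower_bound_all and upper_bound_all are hypothetical and are not the CUB API added by this PR.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: for each value, the index of the first element of the
// sorted range `data` that is not less than that value.
std::vector<std::size_t> lower_bound_all(const std::vector<int>& data, const std::vector<int>& values)
{
  std::vector<std::size_t> out;
  out.reserve(values.size());
  for (int v : values)
  {
    auto it = std::lower_bound(data.begin(), data.end(), v);
    out.push_back(static_cast<std::size_t>(it - data.begin()));
  }
  return out;
}

// Hypothetical helper: for each value, the index of the first element of the
// sorted range `data` that is greater than that value.
std::vector<std::size_t> upper_bound_all(const std::vector<int>& data, const std::vector<int>& values)
{
  std::vector<std::size_t> out;
  out.reserve(values.size());
  for (int v : values)
  {
    auto it = std::upper_bound(data.begin(), data.end(), v);
    out.push_back(static_cast<std::size_t>(it - data.begin()));
  }
  return out;
}
```

The two modes differ only for values that are present in the range: lower_bound points at the first equal element, upper_bound points one past the last equal element.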

In the case of c.parallel, I needed a kernel pointer, but because of how CUB's for_each passes in kernel arguments, repeating the slight madness of the current for operator construction for the for_each algorithm itself felt beyond annoying. So instead of reusing the kernels available in CUB, I adapted the static-block-size for_each kernel to accept all the necessary arguments as separate kernel arguments, construct the for operator expected by CUB inside the kernel, and finally invoke the CUB for_each agent with that operator. It's a manually constructed kernel, but it reuses both the agent code and the binary search helper types from CUB.
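The shape of that approach can be sketched on the host as a rough analogy. Everything below is a stand-in and not code from this PR: search_op plays the role of the for operator, for_each_agent plays the role of the CUB for_each agent, and search_entry_point plays the role of the hand-written kernel that receives each argument separately and builds the operator itself.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical operator: performs one lower_bound search per index i and
// stores the resulting position in out[i].
struct search_op
{
  const int* data;
  std::size_t num_items;
  const int* values;
  std::size_t* out;

  void operator()(std::size_t i) const
  {
    const int* pos = std::lower_bound(data, data + num_items, values[i]);
    out[i]         = static_cast<std::size_t>(pos - data);
  }
};

// Stand-in for a generic for_each agent: applies op to every index in [0, n).
template <class Op>
void for_each_agent(std::size_t n, Op op)
{
  for (std::size_t i = 0; i < n; ++i)
  {
    op(i);
  }
}

// Stand-in for the manually written kernel: all inputs arrive as separate
// arguments, the operator is constructed inside, then handed to the agent.
void search_entry_point(
  const int* data, std::size_t num_items, const int* values, std::size_t num_values, std::size_t* out)
{
  search_op op{data, num_items, values, out};
  for_each_agent(num_values, op);
}
```

The point of the pattern is that the caller never has to assemble the operator object; it only passes plain arguments, while the generic iteration code is still shared.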

Resolves #6695

Checklist

New or existing tests cover these changes.
The documentation is up to date with these changes.

shwina · 2025-12-18T17:58:23Z

c/parallel/src/binary_search.cu

+  const unsigned int thread_count = 256;
+  const size_t items_per_block    = 512;


Are these essentially hardcoded somewhere in CUB as well?

griwes · 2025-12-19

Yes. There are no tunings for for, and this just follows that.

I'm planning to start working on a warp-level binary search algorithm in the new year, and then build a device-wide one on top of it as a replacement for the current approach. We'll do actual tunings then.
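As a rough illustration of what the two hardcoded constants imply, the sketch below assumes that each block processes items_per_block consecutive items and that the grid is sized by rounding up. The helper blocks_needed is hypothetical; the actual launch logic is not shown in this thread.

```cpp
#include <cstddef>

constexpr unsigned int thread_count = 256;
constexpr std::size_t items_per_block = 512;

// With these values, each thread would handle two items per block.
constexpr std::size_t items_per_thread = items_per_block / thread_count;

// Hypothetical helper: number of blocks needed to cover num_items, rounding
// up. Written with a remainder check so it cannot overflow for large sizes.
constexpr std::size_t blocks_needed(std::size_t num_items)
{
  return num_items / items_per_block + (num_items % items_per_block != 0 ? 1 : 0);
}
```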

shwina

The C side looks good to me. Thanks!

NaderAlAwar · 2026-01-06T14:56:41Z

c/parallel/src/binary_search.cu

+  {
+    pushed = try_push_context();
+    auto exec_status =
+      Invoke(d_data, num_items, d_values, num_values, d_out, op, build.cc, (CUfunction) build.kernel, stream);


Suggestion: can we use static_cast here? If not, maybe reinterpret_cast.

griwes · 2026-01-09

This mirrors what for does. I can clean it up a little in both places later, but I'd rather not multiply the drive-by fixes here.
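For context on the suggestion, which named cast can replace a C-style cast depends on the source type, and the type of build.kernel is not shown in this thread. The sketch below uses hypothetical opaque handle types (kernel_handle, function_handle) rather than the real driver types: static_cast is enough when converting from void*, while a conversion between two unrelated pointer types needs reinterpret_cast.

```cpp
// Hypothetical opaque handle types; these are stand-ins, not the CUDA driver
// API types used in the PR.
struct kernel_st;
struct function_st;
using kernel_handle   = kernel_st*;
using function_handle = function_st*;

// From void*, static_cast is sufficient.
inline function_handle from_void(void* p)
{
  return static_cast<function_handle>(p);
}

// Between unrelated object pointer types, static_cast does not compile;
// reinterpret_cast is the named cast that expresses the C-style conversion.
inline function_handle from_kernel(kernel_handle k)
{
  return reinterpret_cast<function_handle>(k);
}
```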

github-actions · 2026-01-12T15:11:42Z

🥳 CI Workflow Results

🟩 Finished in 6h 50m: Pass: 100%/133 | Total: 6d 05h | Max: 5h 37m | Hits: 71%/177855

griwes added 3 commits December 4, 2025 19:38

cub: add DeviceFind::{Lower,Upper}Bound.

2391c39

c.parallel: add binary_search with {lower,upper}_bound modes.

83e8b0d

Merge remote-tracking branch 'origin/main' into feature/cub-binary-search

f535044

griwes requested review from a team as code owners December 18, 2025 06:13

github-project-automation bot added this to CCCL Dec 18, 2025

griwes requested review from NaderAlAwar and alliepiper December 18, 2025 06:13

github-project-automation bot moved this to Todo in CCCL Dec 18, 2025

griwes requested a review from elstehle December 18, 2025 06:13

cccl-authenticator-app bot moved this from Todo to In Review in CCCL Dec 18, 2025

...whoops.

a192645

griwes mentioned this pull request Dec 18, 2025

Provide lower_bound in cuda.compute #6688

Open

griwes added 2 commits December 18, 2025 02:02

...whoops, fix part 2.

1e6c658

Maybe fix the MSVC warning?

d035343

griwes mentioned this pull request Dec 18, 2025

c.parallel: refactor storage_t handling in jit templates #7010

Open

Back out the changes to iterator jit templates.

3112932

shwina reviewed Dec 18, 2025

griwes added 4 commits December 18, 2025 22:22

Fix C++17 aggregate init.

abe521d

...make the large problem size tests actually use the large size...

eeb6478

Merge remote-tracking branch 'origin/main' into feature/cub-binary-search

993cd94

Widen the type of the size argument in tests.

bd300a3

bernhardmgruber reviewed Dec 19, 2025

cub/cub/device/device_merge_sort.cuh (outdated, resolved)

cub/cub/detail/binary_search_helpers.cuh (outdated, resolved)

cub/cub/device/device_find.cuh (resolved)

docs/cub/Doxyfile (outdated, resolved)

shwina approved these changes Dec 19, 2025

oleksandr-pavlyk reviewed Dec 19, 2025

c/parallel/src/binary_search.cu (outdated, resolved)

griwes added 2 commits December 19, 2025 14:49

Address review comments.

39e927e

Merge remote-tracking branch 'origin/main' into feature/cub-binary-search

e831978

griwes enabled auto-merge (squash) December 19, 2025 22:51

Merge branch 'main' into feature/cub-binary-search

cef7127

NaderAlAwar approved these changes Jan 6, 2026

Merge branch 'main' into feature/cub-binary-search

fb58fcd

griwes added 2 commits January 9, 2026 01:13

Merge branch 'main' into feature/cub-binary-search

3adde54

Merge branch 'main' into feature/cub-binary-search

e90603a

griwes disabled auto-merge January 9, 2026 10:25

griwes added 2 commits January 12, 2026 00:01

cub: move DeviceFind::FindIf to device_find.cuh.

3605e54

Merge remote-tracking branch 'origin/main' into feature/cub-binary-search

58e212c

griwes requested a review from a team as a code owner January 12, 2026 08:02

griwes enabled auto-merge (squash) January 12, 2026 08:03

...fix a bad sed.

f555336

bernhardmgruber approved these changes Jan 13, 2026

griwes merged commit d12fbd9 into NVIDIA:main Jan 13, 2026
286 of 289 checks passed

github-project-automation bot moved this from In Review to Done in CCCL Jan 13, 2026

griwes deleted the feature/cub-binary-search branch January 13, 2026 20:26
