Fix BlockScan accumulator type handling #6443

Aminsed · 2025-11-02T17:27:36Z

Summary

Keep ThreadReduce accumulator types pinned to the block value type across BlockScan and BlockReduce.
Apply the same accumulator fix to the raking specialization so all paths use the intended type.
Add a regression test that exercises BlockScan with a functor returning a wider type.

Motivation

#5668 shows that BlockScan widens the accumulator when the scan functor returns a wider type than the block value. That implicit widening breaks user code that relies on the original type and can even hit deleted overloads.

Explanation

ThreadReduce was deducing its accumulator type from the functor instead of the block value T. The patch explicitly instantiates ThreadReduce with AccumT = T everywhere BlockScan and BlockReduce dispatch through it, including the raking specialization. The new unit test exercises an operator that returns long long for int inputs and verifies the accumulator remains int.
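
To make the difference concrete, here is a minimal host-only sketch of the two behaviors described above. It does not use CUB: `reduce_deduced`, `reduce_pinned`, and `WideningPlus` are hypothetical names invented for illustration, and the real ThreadReduce internals may differ. The first helper takes its accumulator type from the functor's return type, the second keeps the accumulator at a type chosen by the caller.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>

// Hypothetical functor that returns a wider type (long long) than its int inputs.
struct WideningPlus
{
  template <typename A, typename B>
  long long operator()(A a, B b) const
  {
    return static_cast<long long>(a) + static_cast<long long>(b);
  }
};

// Variant A: the accumulator type is deduced from what the functor returns.
template <typename T, std::size_t N, typename Op>
auto reduce_deduced(const T (&in)[N], Op op)
{
  using AccumT = std::decay_t<decltype(op(in[0], in[0]))>;
  AccumT acc   = in[0];
  for (std::size_t i = 1; i < N; ++i)
  {
    acc = op(acc, in[i]);
  }
  return acc;
}

// Variant B: the caller pins the accumulator type; each step is cast back to it.
template <typename AccumT, typename T, std::size_t N, typename Op>
AccumT reduce_pinned(const T (&in)[N], Op op)
{
  AccumT acc = static_cast<AccumT>(in[0]);
  for (std::size_t i = 1; i < N; ++i)
  {
    acc = static_cast<AccumT>(op(acc, in[i]));
  }
  return acc;
}

const int kInputs[4] = {1, 2, 3, 4};

// With a widening functor, variant A silently yields long long ...
static_assert(std::is_same<decltype(reduce_deduced(kInputs, WideningPlus{})), long long>::value,
              "deduced accumulator follows the functor");
// ... while variant B stays at the requested int.
static_assert(std::is_same<decltype(reduce_pinned<int>(kInputs, WideningPlus{})), int>::value,
              "pinned accumulator stays int");
```

Both helpers compute the same sum for these inputs; only the resulting type differs, which is the property the regression test in this PR is described as checking.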

Rationale

Minimal surface area: the change touches only the ThreadReduce call sites; public APIs and template parameters stay the same.
Consistent behavior: every BlockScan reduction path now uses the same accumulator type, avoiding divergent code paths.
Regression coverage: the new Catch2 test guards against future regressions triggered by operators that return a wider type than their inputs.

Testing

pre-commit run --files cub/cub/block/block_scan.cuh cub/cub/block/block_reduce.cuh cub/cub/block/specializations/block_reduce_raking_commutative_only.cuh cub/test/catch2_test_block_scan.cu

copy-pr-bot · 2025-11-02T17:27:40Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

fbusato · 2025-11-03T20:38:08Z

cub/cub/block/block_reduce.cuh

    // Reduce partials
-    T partial = cub::ThreadReduce(inputs, reduction_op);
+    T partial =
+      cub::ThreadReduce<::cuda::std::remove_reference_t<decltype(inputs)>, ReductionOp, T, T>(inputs, reduction_op);


This looks like a partial solution. It could regress for small integer types, for example a reduction/scan over int8_t. It is better to perform the computation in 32 bits and cast back at the end.

Aminsed · Nov 4, 2025

Good catch. I dropped the explicit T and taught ThreadReduce to keep its __accumulator_t promotion, so int8_t still widens to 32-bit.
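
The reviewer's suggestion (compute in 32 bits, cast back at the end) can be sketched in plain C++ as follows. This is an illustration only: `promoted_accum_t`, `reduce_promoted`, and `Plus` are hypothetical names, and the sketch does not reproduce how `__accumulator_t` or ThreadReduce are actually implemented in CCCL.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical trait: integral types narrower than 32 bits accumulate in a
// 32-bit integer of the same signedness; every other type accumulates as-is.
template <typename T>
using promoted_accum_t =
  std::conditional_t<std::is_integral<T>::value && (sizeof(T) < 4),
                     std::conditional_t<std::is_signed<T>::value, std::int32_t, std::uint32_t>,
                     T>;

// Simple homogeneous addition functor used by the sketch.
struct Plus
{
  template <typename A>
  A operator()(A a, A b) const
  {
    return a + b;
  }
};

// Reduce in the promoted type and cast back to T once, at the end.
template <typename T, std::size_t N, typename Op>
T reduce_promoted(const T (&in)[N], Op op)
{
  using AccumT = promoted_accum_t<T>;
  AccumT acc   = static_cast<AccumT>(in[0]);
  for (std::size_t i = 1; i < N; ++i)
  {
    acc = static_cast<AccumT>(op(acc, static_cast<AccumT>(in[i])));
  }
  return static_cast<T>(acc);
}

static_assert(std::is_same<promoted_accum_t<std::int8_t>, std::int32_t>::value, "int8_t widens to 32-bit");
static_assert(std::is_same<promoted_accum_t<int>, int>::value, "int is left alone");
static_assert(std::is_same<promoted_accum_t<float>, float>::value, "non-integral types are left alone");
```

The result type seen by the caller is still T, so this kind of promotion is compatible with the goal of not exposing a wider type to user code.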

fbusato · 2025-11-03T20:38:51Z

cub/cub/block/block_reduce.cuh

  {
    // Reduce partials
-    T partial = cub::ThreadReduce(inputs, ::cuda::std::plus<>{});
+    T partial = cub::ThreadReduce<::cuda::std::remove_reference_t<decltype(inputs)>, ::cuda::std::plus<>, T, T>(


Nit: please isolate the first template parameter with a using alias to improve readability.

Aminsed · Nov 4, 2025

@fbusato Thanks! I reverted that spot to plain ThreadReduce(inputs, …), so there’s nothing left to alias. Let me know if you’d still like a using helper there.
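
For reference, the kind of using helper discussed in this thread might look like the sketch below. It is host-only and uses a hypothetical `thread_reduce` stand-in whose template-parameter order (input type, operator, value type, accumulator type) is assumed from the call shown in the diff; `sum_items` is likewise invented for illustration.

```cpp
#include <cstddef>
#include <functional>
#include <type_traits>

// Hypothetical stand-in; ValueT is unused here and only mirrors the assumed parameter order.
template <typename InputT, typename Op, typename ValueT, typename AccumT>
AccumT thread_reduce(const InputT& in, Op op)
{
  AccumT acc = static_cast<AccumT>(in[0]);
  for (std::size_t i = 1; i < std::extent<InputT>::value; ++i)
  {
    acc = static_cast<AccumT>(op(acc, static_cast<AccumT>(in[i])));
  }
  return acc;
}

template <typename T, std::size_t N>
T sum_items(const T (&inputs)[N])
{
  // Name the verbose first template argument once ...
  using InputT = std::remove_reference_t<decltype(inputs)>;
  // ... so the explicit instantiation fits on one readable line.
  return thread_reduce<InputT, std::plus<>, T, T>(inputs, std::plus<>{});
}
```

Since the call site was reverted to the plain two-argument form, the alias is only relevant if the explicit template arguments are reintroduced.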

Aminsed requested review from a team as code owners November 2, 2025 17:27

github-project-automation bot added this to CCCL Nov 2, 2025

Aminsed requested review from fbusato and pciolkosz November 2, 2025 17:27

github-project-automation bot moved this to Todo in CCCL Nov 2, 2025

cccl-authenticator-app bot moved this from Todo to In Review in CCCL Nov 2, 2025

Aminsed force-pushed the fix-blockscan-accum branch from bc27d15 to c122fad Compare November 2, 2025 17:31

fbusato requested changes Nov 3, 2025

github-project-automation bot moved this from In Review to In Progress in CCCL Nov 3, 2025

Aminsed added 2 commits November 3, 2025 21:50

Fix BlockScan accumulator type handling

f20c98a

Fix ThreadReduce accumulator dispatch

0bcd084

Aminsed force-pushed the fix-blockscan-accum branch from 85c2484 to 0bcd084 Compare November 4, 2025 02:52
