Unify kernel dispatch paths for device reduce between CUB and c.parallel. (NVIDIA#2591)
griwes authored and pciolkosz committed Oct 25, 2024
1 parent 3848b74 commit 6ba0c8f
Showing 11 changed files with 488 additions and 301 deletions.
2 changes: 2 additions & 0 deletions c/parallel/CMakeLists.txt
@@ -37,6 +37,8 @@ target_link_libraries(cccl.c.parallel PRIVATE
CUDA::nvJitLink
CUDA::cuda_driver
cccl.compiler_interface_cpp20
CUB::CUB
Thrust::Thrust
)
target_compile_definitions(cccl.c.parallel PUBLIC CCCL_C_EXPERIMENTAL=1)
target_compile_definitions(cccl.c.parallel PRIVATE NVRTC_GET_TYPE_NAME=1)
1 change: 1 addition & 0 deletions c/parallel/include/cccl/c/reduce.h
@@ -24,6 +24,7 @@ struct cccl_device_reduce_build_result_t
void* cubin;
size_t cubin_size;
CUlibrary library;
unsigned long long accumulator_size;
CUkernel single_tile_kernel;
CUkernel single_tile_second_kernel;
CUkernel reduction_kernel;