CUB 1.13.0 (NVIDIA HPC SDK 21.7)
CUB 1.13.0 is the major release accompanying the NVIDIA HPC SDK 21.7 release.
Notable new features include support for striped data arrangements in block load/store utilities, bfloat16 radix sort support, and fewer restrictions on offset iterators in segmented device algorithms. Several bugs in cub::BlockShuffle, cub::BlockDiscontinuity, and cub::DeviceHistogram have been addressed. The amount of code generated in cub::DeviceScan has been greatly reduced, leading to significant compile-time improvements when targeting multiple PTX architectures.
This release also includes several user-contributed documentation fixes that will be reflected in CUB's online documentation in the coming weeks.
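To illustrate what "fewer restrictions on offset iterators" means in practice, here is a minimal host-side sketch of a segmented sum whose begin and end offsets use two unrelated iterator types. It is plain C++ and does not call CUB; the function name `segmented_sum` and its parameter order are hypothetical and are not the signature of cub::DeviceSegmentedReduce.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side analogue of a segmented reduction.
// BeginOffsetIt and EndOffsetIt are independent template parameters,
// so the two offset sequences may have different iterator and value types.
template <typename InputIt, typename BeginOffsetIt, typename EndOffsetIt, typename T>
std::vector<T> segmented_sum(InputIt in,
                             std::size_t num_segments,
                             BeginOffsetIt begin_offsets,
                             EndOffsetIt end_offsets,
                             T init) {
    std::vector<T> out;
    out.reserve(num_segments);
    for (std::size_t s = 0; s < num_segments; ++s) {
        // Segment s covers the half-open range [first, last) of the input.
        const auto first = static_cast<std::size_t>(begin_offsets[s]);
        const auto last  = static_cast<std::size_t>(end_offsets[s]);
        assert(first <= last);
        T acc = init;
        for (std::size_t i = first; i < last; ++i) {
            acc = acc + in[i];
        }
        out.push_back(acc);
    }
    return out;
}
```

For example, the begin offsets could come from a raw `const int*` while the end offsets come from a `std::vector<long long>` iterator; the sketch accepts both because each offset is only read and converted to an index.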
Breaking Changes
- #320: Deprecated cub::TexRefInputIterator<T, UNIQUE_ID>. Use cub::TexObjInputIterator<T> as a replacement.
New Features
- #274: Add BLOCK_LOAD_STRIPED and BLOCK_STORE_STRIPED functionality to cub::BlockLoadAlgorithm and cub::BlockStoreAlgorithm. Thanks to Matthew Nicely (@mnicely) for this contribution.
- #291: cub::DeviceSegmentedRadixSort and cub::DeviceSegmentedReduce now support different types for begin/end offset iterators. Thanks to Sergey Pavlov (@psvvsp) for this contribution.
- #306: Add bfloat16 support to cub::DeviceRadixSort. Thanks to Xiang Gao (@zasdfgbnm) for this contribution.
- #320: Introduce a new CUB_IGNORE_DEPRECATED_API macro that disables deprecation warnings on Thrust and CUB APIs.
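As background for the striped load/store support in #274, the sketch below emulates on the host how items in a tile map to threads under a blocked versus a striped arrangement. It is an illustration only and does not use CUB; the helper names (`blocked_index`, `striped_index`, `emulate_striped_load`) are hypothetical, and the index formulas assume the usual definitions of the two arrangements (consecutive items per thread for blocked, a stride of the block size for striped).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Blocked arrangement: thread `tid` owns a run of consecutive items.
inline std::size_t blocked_index(std::size_t tid, std::size_t item,
                                 std::size_t items_per_thread) {
    return tid * items_per_thread + item;
}

// Striped arrangement: consecutive items go to consecutive threads,
// so a thread's items are `block_threads` apart in the tile.
inline std::size_t striped_index(std::size_t tid, std::size_t item,
                                 std::size_t block_threads) {
    return item * block_threads + tid;
}

// Host-side emulation: the per-thread "registers" after a striped load.
inline std::vector<std::vector<int>> emulate_striped_load(
        const std::vector<int>& tile,
        std::size_t block_threads,
        std::size_t items_per_thread) {
    assert(tile.size() == block_threads * items_per_thread);
    std::vector<std::vector<int>> regs(block_threads,
                                       std::vector<int>(items_per_thread));
    for (std::size_t tid = 0; tid < block_threads; ++tid) {
        for (std::size_t item = 0; item < items_per_thread; ++item) {
            regs[tid][item] = tile[striped_index(tid, item, block_threads)];
        }
    }
    return regs;
}
```

With a tile of eight values 0 through 7, four threads, and two items per thread, the emulated thread 0 holds items 0 and 4 under the striped arrangement, whereas it would hold items 0 and 1 under the blocked one.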
Bug Fixes
- #277: Fixed sanitizer warnings in RadixSortScanBinsKernels. Thanks to Andy Adinets (@canonizer) for this contribution.
- #287: cub::DeviceHistogram now correctly handles cases where OffsetT is not an int. Thanks to Dominique LaSalle (@nv-dlasalle) for this contribution.
- #311: Fixed several bugs and added tests for the cub::BlockShuffle collective operations.
- #312: Eliminate unnecessary kernel instantiations when compiling cub::DeviceScan. Thanks to Elias Stehle (@elstehle) for this contribution.
- #319: Fixed out-of-bounds memory access on debugging builds of cub::BlockDiscontinuity::FlagHeadsAndTails.
- #320: Fixed harmless missing return statement warning in unreachable cub::TexObjInputIterator code path.
Other Enhancements
- Several documentation fixes are included in this release.
- #275: Fixed comments describing the cub::If and cub::Equals utilities. Thanks to Rukshan Jayasekara (@rukshan99) for this contribution.
- #290: Documented that cub::DeviceSegmentedReduce will produce consistent results run-to-run on the same device for pseudo-associative reduction operators. Thanks to Himanshu (@himanshu007-creator) for this contribution.
- #298: CONTRIBUTING.md now refers to Thrust's build instructions for developer builds, which is the preferred way to build the CUB test harness. Thanks to Xiang Gao (@zasdfgbnm) for contributing.
- #301: Expand cub::DeviceScan documentation to include in-place support and add tests. Thanks to Xiang Gao (@zasdfgbnm) for this contribution.
- #307: Expand cub::DeviceRadixSort and cub::BlockRadixSort documentation to clarify stability, in-place support, and type-specific bitwise transformations. Thanks to Himanshu (@himanshu007-creator) for contributing.
- #316: Move WARP_TIME_SLICING documentation to the correct location. Thanks to Peter Han (@Peter9606) for this contribution.
- #321: Update URLs from deprecated github.com to preferred github.io. Thanks to Lilo Huang (@lilohuang) for this contribution.
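The "type-specific bitwise transformations" mentioned in #307 can be pictured with a small host-side sketch. Radix sorts compare keys as unsigned integers, so floating-point bit patterns are commonly remapped so that unsigned order matches numeric order. The code below applies that common technique to 16-bit bfloat16 patterns; it is an assumption that this resembles what the CUB documentation describes, and the helper names (`to_bf16_bits`, `twiddle_in`, `twiddle_out`) are hypothetical rather than CUB API.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: use the upper 16 bits of a float32 as a bfloat16
// bit pattern (plain truncation, no rounding).
inline std::uint16_t to_bf16_bits(float f) {
    std::uint32_t u;
    static_assert(sizeof u == sizeof f, "float is assumed to be 32 bits");
    std::memcpy(&u, &f, sizeof u);
    return static_cast<std::uint16_t>(u >> 16);
}

// Map a bfloat16 bit pattern to a key whose unsigned order matches the
// numeric order: flip every bit of negatives, flip only the sign bit of
// non-negatives.
inline std::uint16_t twiddle_in(std::uint16_t bits) {
    const std::uint16_t sign = 0x8000u;
    const std::uint16_t mask = (bits & sign) ? std::uint16_t{0xFFFFu} : sign;
    return static_cast<std::uint16_t>(bits ^ mask);
}

// Inverse mapping: recover the original bit pattern from a sorted key.
inline std::uint16_t twiddle_out(std::uint16_t key) {
    const std::uint16_t sign = 0x8000u;
    const std::uint16_t mask = (key & sign) ? sign : std::uint16_t{0xFFFFu};
    return static_cast<std::uint16_t>(key ^ mask);
}
```

After `twiddle_in`, the keys for -2.0, -1.0, 0.0, 1.0, and 2.0 are strictly increasing as unsigned integers, and `twiddle_out` restores each original pattern once sorting is done.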