This repository has been archived by the owner on Mar 21, 2024. It is now read-only.
CUB 1.9.9 (CUDA 11.0)
Summary
CUB 1.9.9 is the release accompanying the CUDA Toolkit 11.0 release. It introduces CMake support, version macros, platform detection machinery, and support for NVC++, which uses Thrust (and thus CUB) to implement GPU-accelerated C++17 Parallel Algorithms. Additionally, the scan dispatch layer was refactored and modernized. C++03, C++11, GCC < 5, Clang < 6, and MSVC < 2017 are now deprecated. Starting with the upcoming 1.10.0 release, C++03 support will be dropped entirely.
Breaking Changes
- Thrust now checks that it is compatible with the version of CUB found in your include path, generating an error if it is not. If you are using your own version of CUB, it may be too old. It is recommended to simply delete your own version of CUB and use the version of CUB that comes with Thrust.
- C++03 and C++11 are deprecated. Using these dialects will generate a compile-time warning. These warnings can be suppressed by defining `CUB_IGNORE_DEPRECATED_CPP_DIALECT` (to suppress C++03 and C++11 deprecation warnings) or `CUB_IGNORE_DEPRECATED_CPP11` (to suppress C++11 deprecation warnings). Suppression is only a short-term solution. We will be dropping support for C++03 in the 1.10.0 release and C++11 in the near future.
- GCC < 5, Clang < 6, and MSVC < 2017 are deprecated. Using these compilers will generate a compile-time warning. These warnings can be suppressed by defining `CUB_IGNORE_DEPRECATED_COMPILER`. Suppression is only a short-term solution. We will be dropping support for these compilers in the near future.
New Features
- CMake support. Thanks to Francis Lemaire for this contribution.
- Refactored and modernized scan dispatch layer. Thanks to Francis Lemaire for this contribution.
- Policy hooks for device-wide reduce, scan, and radix sort facilities to simplify tuning and allow users to provide custom policies. Thanks to Francis Lemaire for this contribution.
- `<cub/version.cuh>`: `CUB_VERSION`, `CUB_VERSION_MAJOR`, `CUB_VERSION_MINOR`, `CUB_VERSION_SUBMINOR`, and `CUB_PATCH_NUMBER`.
- Platform detection machinery:
  - `<cub/util_cpp_dialect.cuh>`: Detects the C++ standard dialect.
  - `<cub/util_compiler.cuh>`: Host and device compiler detection.
  - `<cub/util_deprecated.cuh>`: `CUB_DEPRECATED`.
  - `<cub/config.cuh>`: Includes `<cub/util_arch.cuh>`, `<cub/util_compiler.cuh>`, `<cub/util_cpp_dialect.cuh>`, `<cub/util_deprecated.cuh>`, `<cub/util_macro.cuh>`, and `<cub/util_namespace.cuh>`.
- `cub::DeviceCount` and `cub::DeviceCountUncached`, caching abstractions for `cudaGetDeviceCount`.
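As an illustration of how the version macros listed above might be used, the following sketch gates a build on a minimum CUB version and splits a packed version number into its parts. It assumes the packing that Thrust documents for `THRUST_VERSION` (major = value / 100000, minor = value / 100 % 1000, subminor = value % 100) also applies to `CUB_VERSION`; the fallback value `100909` and the names `VersionParts` and `decode_version` are hypothetical and exist only so the sketch compiles without CUB on the include path.

```cpp
// With CUB available, include <cub/version.cuh> here to get the real CUB_VERSION.
// The fallback below is a hypothetical stand-in (assumed encoding of 1.9.9).
#ifndef CUB_VERSION
#  define CUB_VERSION 100909
#endif

// Hypothetical helper type; not part of CUB.
struct VersionParts {
  int major_version;
  int minor_version;
  int subminor_version;
};

// Splits a packed version number, assuming the layout MMMmmmss
// (the scheme Thrust documents for THRUST_VERSION).
constexpr VersionParts decode_version(int packed) {
  return VersionParts{packed / 100000, packed / 100 % 1000, packed % 100};
}

// Refuse to build against anything older than 1.9.9.
static_assert(CUB_VERSION >= 100909, "CUB 1.9.9 or newer is required");
```

Under these assumptions, `decode_version(CUB_VERSION)` can be used to report the detected version in a diagnostic, while the `static_assert` rejects an outdated copy of CUB at compile time.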
Other Enhancements
- Lazily initialize the per-device CUDA attribute caches, because CUDA context creation is expensive and adds up with large CUDA binaries on machines with many GPUs. Thanks to the NVIDIA PyTorch team for bringing this to our attention.
- Make `cub::SwitchDevice` avoid setting/resetting the device if the current device is the same as the target device.
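The device-switching optimization above can be pictured as a scoped guard that records whether a switch is actually needed. The sketch below is not CUB's implementation: the `mock` namespace is a hypothetical stand-in for the CUDA runtime calls `cudaGetDevice` and `cudaSetDevice`, and `ScopedDeviceSwitch` and `count_set_calls_for_switch` are illustrative names, so the example compiles without a GPU.

```cpp
// Hypothetical stand-in for the CUDA runtime; counts how often the device is set.
namespace mock {
int current_device = 0;
int set_calls = 0;

int get_device() { return current_device; }

void set_device(int device) {
  current_device = device;
  ++set_calls;
}
}  // namespace mock

// Illustrative RAII guard: switches to `target` only if it differs from the
// current device, and restores the old device only if a switch happened.
class ScopedDeviceSwitch {
public:
  explicit ScopedDeviceSwitch(int target)
      : old_device_(mock::get_device()), needs_reset_(old_device_ != target) {
    if (needs_reset_) {
      mock::set_device(target);
    }
  }

  ~ScopedDeviceSwitch() {
    if (needs_reset_) {
      mock::set_device(old_device_);
    }
  }

  ScopedDeviceSwitch(const ScopedDeviceSwitch&) = delete;
  ScopedDeviceSwitch& operator=(const ScopedDeviceSwitch&) = delete;

private:
  int old_device_;
  bool needs_reset_;
};

// Returns how many set_device calls one guarded scope costs.
int count_set_calls_for_switch(int start, int target) {
  mock::current_device = start;
  mock::set_calls = 0;
  {
    ScopedDeviceSwitch guard(target);
  }
  return mock::set_calls;
}
```

In this sketch a guard whose target equals the current device costs no set calls, while a real switch costs two (one to switch, one to restore).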
Bug Fixes
- Add an explicit failure parameter to the CAS in the CUB attribute cache to work around a GCC 4.8 bug.
- Revert a change in reductions that changed the signedness of the `lane_id` variable to suppress a warning, as this introduces a bug in optimized device code.
- Fix initialization in `cub::ExclusiveSum`. Thanks to Conor Hoekstra for this contribution.
- Fix initialization of the `std::array` in the CUB attribute cache.
- Fix `-Wsign-compare` warnings. Thanks to Elias Stehle for this contribution.
- Fix `test_block_reduce.cu` to build without parameters. Thanks to Francis Lemaire for this contribution.
- Add missing includes to `grid_even_share.cuh`. Thanks to Francis Lemaire for this contribution.
- Add missing includes to `thread_search.cuh`. Thanks to Francis Lemaire for this contribution.
- Add missing includes to `cub.cuh`. Thanks to Felix Kallenborn for this contribution.
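For readers unfamiliar with the compare-and-swap fix mentioned in the list above, the following sketch shows a lazily initialized cache entry whose CAS names both the success and the failure memory order explicitly. It is an assumption-laden illustration, not CUB's attribute cache: `LazyAttribute`, `get_or_compute`, and the state constants are hypothetical names, and the choice of memory orders is one reasonable option rather than a statement of what CUB uses.

```cpp
#include <atomic>

// Hypothetical entry states for a lazily filled cache slot.
enum : int { kEmpty = 0, kInitializing = 1, kReady = 2 };

// Hypothetical cache entry; the value is computed on first use only.
struct LazyAttribute {
  std::atomic<int> state{kEmpty};
  int value = 0;
};

// Returns the cached value, computing it at most once.
template <class Compute>
int get_or_compute(LazyAttribute& entry, Compute compute) {
  int expected = kEmpty;
  // Both the success and the failure memory order are spelled out, which is
  // the form of the call that the release notes describe as the workaround.
  if (entry.state.compare_exchange_strong(expected, kInitializing,
                                          std::memory_order_acq_rel,
                                          std::memory_order_acquire)) {
    // This caller won the race and performs the expensive query.
    entry.value = compute();
    entry.state.store(kReady, std::memory_order_release);
  } else {
    // Another caller is (or was) initializing; wait until the value is ready.
    while (entry.state.load(std::memory_order_acquire) != kReady) {
    }
  }
  return entry.value;
}
```

A second call on the same entry skips the computation and returns the stored value, which is the behavior a lazily initialized per-device cache relies on.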