CUB 1.0.1
Summary
CUB 1.0.1 adds cub::DeviceRadixSort and cub::DeviceScan. Numerous other performance and correctness fixes are included.
Breaking Changes
- New collective interface idiom (specialize/construct/invoke).
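The three steps of the idiom can be illustrated with a host-only sketch. `MockBlockReduce`, its `TempStorage` layout, and `reduce_example` are hypothetical stand-ins written for this illustration; they are not CUB code, and the loop over simulated threads replaces what a real kernel would do cooperatively in shared memory. In a CUDA kernel the same shape would typically be a typedef of a `cub::BlockReduce` specialization, a `__shared__` `TempStorage` instance, and a call such as `BlockReduce(temp_storage).Sum(thread_data)`.

```cpp
#include <cstddef>

// Hypothetical host-side stand-in for a block-wide collective (not CUB code).
template <typename T, int BLOCK_THREADS>
class MockBlockReduce {
    static_assert(BLOCK_THREADS > 0, "need at least one simulated thread");

public:
    // Scratch space owned by the caller (shared memory in a real kernel).
    struct TempStorage {
        T partials[BLOCK_THREADS];
    };

    // Step 2, construct: bind the collective to caller-provided scratch space.
    explicit MockBlockReduce(TempStorage& temp) : temp_(temp) {}

    // Step 3, invoke: reduce one item per simulated thread.
    T Sum(const T (&thread_items)[BLOCK_THREADS]) {
        for (int t = 0; t < BLOCK_THREADS; ++t) {
            temp_.partials[t] = thread_items[t];
        }
        // Pairwise tree reduction over the scratch array.
        for (int stride = 1; stride < BLOCK_THREADS; stride *= 2) {
            for (int t = 0; t + stride < BLOCK_THREADS; t += 2 * stride) {
                temp_.partials[t] += temp_.partials[t + stride];
            }
        }
        return temp_.partials[0];
    }

private:
    TempStorage& temp_;
};

int reduce_example() {
    // Step 1, specialize: fix the data type and the block size.
    typedef MockBlockReduce<int, 8> BlockReduce;

    BlockReduce::TempStorage temp_storage;
    const int items[8] = {1, 2, 3, 4, 5, 6, 7, 8};

    // Steps 2 and 3 in one expression: construct, then invoke.
    return BlockReduce(temp_storage).Sum(items);
}
```

Keeping the scratch space outside the collective lets the caller decide where it lives and whether it is reused by another collective later in the same kernel.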
New Features
- cub::DeviceRadixSort. Implements short-circuiting for homogeneous digit passes.
- cub::DeviceScan. Implements single-pass "adaptive-lookback" strategy.
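The short-circuiting idea for radix sort can be sketched on the CPU. The function below is a hypothetical illustration, not the CUB implementation: it assumes 8-bit digits over 32-bit keys and treats a pass as homogeneous when every key falls into the same histogram bin, in which case the scatter step would leave the order unchanged and can be skipped. How cub::DeviceRadixSort detects this on the GPU is not described in these notes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU sketch: least-significant-digit radix sort that skips any
// digit pass in which all keys share the same digit value.
// Returns the number of passes that actually scattered keys.
int radix_sort_short_circuit(std::vector<std::uint32_t>& keys) {
    constexpr int RADIX_BITS = 8;  // assumed digit width
    constexpr std::size_t BINS = std::size_t(1) << RADIX_BITS;

    std::vector<std::uint32_t> alt(keys.size());
    int scatter_passes = 0;

    for (int shift = 0; shift < 32; shift += RADIX_BITS) {
        // Histogram of the current digit.
        std::size_t hist[BINS] = {0};
        for (std::uint32_t key : keys) {
            ++hist[(key >> shift) & (BINS - 1)];
        }

        // Homogeneous pass: one bin holds every key, so ordering cannot change.
        bool homogeneous = false;
        for (std::size_t b = 0; b < BINS; ++b) {
            if (hist[b] == keys.size()) {
                homogeneous = true;
                break;
            }
        }
        if (homogeneous) {
            continue;  // short-circuit: skip the scatter for this digit
        }

        // Exclusive prefix sum turns counts into starting offsets.
        std::size_t offset = 0;
        for (std::size_t b = 0; b < BINS; ++b) {
            std::size_t count = hist[b];
            hist[b] = offset;
            offset += count;
        }

        // Stable scatter into the alternate buffer.
        for (std::uint32_t key : keys) {
            alt[hist[(key >> shift) & (BINS - 1)]++] = key;
        }
        keys.swap(alt);
        ++scatter_passes;
    }

    assert(scatter_passes <= 32 / RADIX_BITS);
    return scatter_passes;
}
```

For keys that all fit in 16 bits, the two upper digit passes are homogeneous (every digit is zero), so only two of the four passes move data.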
Other Enhancements
- Significantly improved documentation (with example code snippets).
- More extensive regression test suite for aggressively testing collective variants.
- Allow non-trivially-constructed types (previously unions had prevented aliasing temporary storage of those types).
- Improved support for SM3x SHFL (collective ops now use SHFL for types larger than 32 bits).
- Better code generation for 64-bit addressing within cub::BlockLoad/cub::BlockStore.
- cub::DeviceHistogram now supports histograms of arbitrary bins.
- Updates to accommodate CUDA 5.5 dynamic parallelism.
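The note on non-trivially-constructed types can be made concrete with a small host-only sketch. It assumes one common way around the union restriction: wrap each type in a raw, suitably aligned byte buffer, which is trivially constructible and can therefore sit in a union, and construct the real object on demand with placement new. `RawStorage`, `Tracked`, `SharedScratch`, and `aliasing_example` are hypothetical names for this illustration; the notes do not say how CUB itself implements the change.

```cpp
#include <cassert>
#include <new>

// Hypothetical wrapper: raw bytes with the size and alignment of T.
// It has no constructor of its own, so it may be a union member even when T
// has a non-trivial constructor.
template <typename T>
struct RawStorage {
    alignas(T) unsigned char bytes[sizeof(T)];

    // View the bytes as a T. Only valid after a T has been constructed here.
    T& Alias() { return *reinterpret_cast<T*>(bytes); }
};

// A type with a user-provided (non-trivial) constructor.
struct Tracked {
    int value;
    explicit Tracked(int v) : value(v) {}
};

// Two kinds of temporary storage aliased onto the same memory.
union SharedScratch {
    RawStorage<Tracked> tracked;
    RawStorage<double> number;
};

int aliasing_example() {
    SharedScratch scratch;

    // First use of the scratch space: construct a Tracked in place.
    Tracked* t = new (scratch.tracked.bytes) Tracked(42);
    int seen = scratch.tracked.Alias().value;
    t->~Tracked();  // end its lifetime before reusing the memory

    // Second use: the same bytes now hold a double.
    double* d = new (scratch.number.bytes) double(0.5);
    assert(*d == 0.5);

    return seen;
}
```

The wrapper trades automatic construction for explicit lifetime management, which is why the sketch calls the destructor by hand before the memory is reused.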
Bug Fixes
- Workarounds for SM10 codegen issues in uncommonly-used cub::WarpScan/cub::WarpReduce specializations.