This repository has been archived by the owner on Mar 21, 2024. It is now read-only.
CUB 1.2.0
Summary
CUB 1.2.0 adds cub::DeviceReduce::ReduceByKey, cub::DeviceReduce::RunLengthEncode, and support for CUDA 6.0.
New Features
- cub::DeviceReduce::ReduceByKey.
- cub::DeviceReduce::RunLengthEncode.
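As a rough illustration of what these two primitives compute, here is a sequential host-side C++ sketch. It is not CUB's implementation or API: the function names `reduce_by_key_reference` and `run_length_encode_reference` are hypothetical, and the sketch assumes that a segment is a run of consecutive equal keys, with each run collapsing to a single output item.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical host-side reference (not CUB code): each run of consecutive
// equal keys collapses to one (key, aggregate) pair.
template <typename Key, typename Value, typename ReductionOp>
std::vector<std::pair<Key, Value>> reduce_by_key_reference(
    const std::vector<Key>& keys,
    const std::vector<Value>& values,
    ReductionOp op)
{
    assert(keys.size() == values.size());
    std::vector<std::pair<Key, Value>> out;
    for (std::size_t i = 0; i < keys.size(); ++i) {
        if (!out.empty() && out.back().first == keys[i]) {
            // Same run as the previous item: fold the value into the aggregate.
            out.back().second = op(out.back().second, values[i]);
        } else {
            // Key changed: start a new run.
            out.emplace_back(keys[i], values[i]);
        }
    }
    return out;
}

// Run-length encoding expressed as reduce-by-key over a vector of ones,
// summed within each run.
template <typename Key>
std::vector<std::pair<Key, int>> run_length_encode_reference(
    const std::vector<Key>& keys)
{
    std::vector<int> ones(keys.size(), 1);
    return reduce_by_key_reference(keys, ones,
                                   [](int a, int b) { return a + b; });
}
```

A reference like this can be useful for checking device results in a unit test: run the device-wide primitive, copy the output back, and compare it element by element against the host result.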
Other Enhancements
- Improved cub::DeviceScan, cub::DeviceSelect, and cub::DevicePartition performance.
- Documentation and testing:
  - Added performance-portability plots for many device-wide primitives.
  - Explained iterator (in)compatibilities with CUDA 5.0 (and older) and Thrust 1.6 (and older).
- Revised the operation of temporary tile status bookkeeping for cub::DeviceScan (and similar) to be safe for current code run on future platforms (now uses proper fences).
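The role a fence plays in status bookkeeping of this kind can be sketched on the host with C++ atomics. This is an analogy under stated assumptions, not CUB's code: `TileDescriptor`, `publish_tile`, and `wait_for_tile` are hypothetical names, and the notes do not say how CUB lays out or publishes its tile status. The point is only the ordering: the payload must become visible before the "ready" flag does.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical tile descriptor (illustrative only, not CUB's layout).
struct TileDescriptor {
    int partial = 0;             // payload: the tile's aggregate
    std::atomic<int> status{0};  // 0 = not ready, 1 = ready
};

// Producer: write the payload first, then publish the flag with release
// ordering so the payload write cannot be reordered after the flag write.
inline void publish_tile(TileDescriptor& t, int aggregate)
{
    t.partial = aggregate;
    t.status.store(1, std::memory_order_release);
}

// Consumer: spin on the flag with acquire ordering; once it reads 1, the
// payload written before the release store is guaranteed to be visible.
inline int wait_for_tile(const TileDescriptor& t)
{
    while (t.status.load(std::memory_order_acquire) != 1) {
        // Busy-wait until the producer publishes.
    }
    return t.partial;
}
```

In CUDA device code, the analogous ordering is typically obtained by placing a memory fence such as `__threadfence()` between the payload write and the flag write. Without it, code may appear to work on current hardware yet break on a platform that reorders the two writes.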
Bug Fixes
- Fix cub::DeviceScan bug where Windows alignment disagreements between host and device regarding user-defined data types would corrupt tile status.
- Fix cub::BlockScan bug where certain exclusive scans on custom data types for the BLOCK_SCAN_WARP_SCANS variant would return incorrect results for the first thread in the block.
- Added workaround to make cub::TexRefInputIteratorT work with CUDA 6.0.
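To make the cub::BlockScan fix concrete, the following host-side C++ sketch shows what an exclusive scan over a user-defined type is expected to produce, including the first output element. It is a sequential reference, not CUB code: `MaxPair`, `KeepLargerValue`, and `exclusive_scan_reference` are hypothetical names, and it assumes the scan is seeded with an explicit initial value. The notes do not say which overloads were affected.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical user-defined type: a value tagged with the index it came from.
struct MaxPair {
    int key;
    int value;
};

// Hypothetical associative scan operator: keep the item with the larger
// value, preferring the earlier item on ties.
struct KeepLargerValue {
    MaxPair operator()(const MaxPair& a, const MaxPair& b) const
    {
        return b.value > a.value ? b : a;
    }
};

// Sequential reference for an exclusive scan: output i combines the initial
// value with inputs 0..i-1, so output 0 is exactly the initial value.
template <typename T, typename ScanOp>
std::vector<T> exclusive_scan_reference(const std::vector<T>& in,
                                        T initial,
                                        ScanOp op)
{
    std::vector<T> out;
    out.reserve(in.size());
    T running = initial;
    for (const T& x : in) {
        out.push_back(running);    // emit the prefix that excludes x
        running = op(running, x);  // then fold x into the running prefix
    }
    return out;
}
```

Comparing each thread's scan output against a reference like this, and checking index 0 explicitly, is one way to catch a regression of the kind described in the fix above.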