This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

CUB 1.2.0

@brycelelbach brycelelbach released this 19 May 07:32

Summary

CUB 1.2.0 adds cub::DeviceReduce::ReduceByKey, cub::DeviceReduce::RunLengthEncode, and support for CUDA 6.0.

New Features

  • cub::DeviceReduce::ReduceByKey.
  • cub::DeviceReduce::RunLengthEncode.
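
To illustrate what these two primitives compute, here is a minimal host-side reference sketch in plain C++. It is not CUB code and does not run on the GPU. The function names reduce_by_key_reference and run_length_encode_reference are hypothetical, chosen for this example only. The sketch assumes the usual semantics: each run of consecutive equal keys collapses to one output entry, and run-length encoding is the special case where every value is 1 and the reduction is addition.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Host-side reference (hypothetical helper, not part of CUB):
// each run of consecutive equal keys collapses to one (key, reduced value) pair.
template <typename Key, typename Value, typename ReductionOp>
std::vector<std::pair<Key, Value>> reduce_by_key_reference(
    const std::vector<Key>& keys,
    const std::vector<Value>& values,
    ReductionOp op)
{
    assert(keys.size() == values.size());
    std::vector<std::pair<Key, Value>> out;
    for (std::size_t i = 0; i < keys.size(); ++i) {
        if (!out.empty() && out.back().first == keys[i]) {
            // Same run as the previous item: fold the value into the aggregate.
            out.back().second = op(out.back().second, values[i]);
        } else {
            // A new run starts here.
            out.emplace_back(keys[i], values[i]);
        }
    }
    return out;
}

// Host-side reference (hypothetical helper, not part of CUB):
// run-length encoding expressed as reduce-by-key over a sequence of ones.
template <typename Key>
std::vector<std::pair<Key, int>> run_length_encode_reference(
    const std::vector<Key>& keys)
{
    std::vector<int> ones(keys.size(), 1);
    return reduce_by_key_reference(keys, ones,
                                   [](int a, int b) { return a + b; });
}
```

For keys {0, 2, 2, 9, 5, 5, 5, 8} and values {0, 7, 1, 6, 2, 5, 3, 4} with a sum reduction, the reference yields unique keys {0, 2, 9, 5, 8} with aggregates {0, 8, 6, 10, 4}, and the run lengths are {1, 2, 1, 3, 1}. The device-wide versions write the unique keys, the aggregates, and the number of runs to separate device outputs instead of returning pairs.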

Other Enhancements

  • Improved cub::DeviceScan, cub::DeviceSelect, cub::DevicePartition performance.
  • Documentation and testing:
    • Added performance-portability plots for many device-wide primitives.
    • Explained iterator (in)compatibilities with CUDA 5.0 (and older) and Thrust 1.6 (and older).
  • Revised the temporary tile status bookkeeping for cub::DeviceScan (and similar) to use proper fences, so that current code remains safe when run on future platforms.

Bug Fixes

  • Fix cub::DeviceScan bug where, on Windows, host and device could disagree on the alignment of user-defined data types, corrupting tile status.
  • Fix cub::BlockScan bug where certain exclusive scans on custom data types for the BLOCK_SCAN_WARP_SCANS variant would return incorrect results for the first thread in the block.
  • Add workaround to make cub::TexRefInputIteratorT work with CUDA 6.0.
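
As background for the cub::BlockScan fix above, the following host-side sketch shows what an exclusive scan over a custom data type is expected to return, in particular for the first element. It is plain sequential C++, not CUB code. The Sample type, the Combine functor, and exclusive_scan_reference are hypothetical names used only for this illustration, and the notes do not say which BlockScan overload was affected.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical user-defined type: a running sum together with an item count.
struct Sample {
    int sum;
    int count;
};

// Hypothetical associative scan operator for Sample.
struct Combine {
    Sample operator()(const Sample& a, const Sample& b) const {
        return Sample{a.sum + b.sum, a.count + b.count};
    }
};

// Sequential reference for an exclusive scan seeded with an identity value:
// out[0] is the identity, and out[i] combines in[0] .. in[i-1].
template <typename T, typename ScanOp>
std::vector<T> exclusive_scan_reference(const std::vector<T>& in,
                                        T identity,
                                        ScanOp op)
{
    std::vector<T> out;
    out.reserve(in.size());
    T running = identity;
    for (std::size_t i = 0; i < in.size(); ++i) {
        out.push_back(running);        // result excludes in[i] itself
        running = op(running, in[i]);  // fold in[i] for the next position
    }
    return out;
}
```

With an identity of Sample{0, 0}, the first output is exactly that identity regardless of the input. In a block-wide scan this is the value that the first thread should receive, which is the case the fix addresses.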