This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

CUB 1.0.1

brycelelbach released this 19 May 07:31

Summary

CUB 1.0.1 adds cub::DeviceRadixSort and cub::DeviceScan. Numerous other performance and correctness fixes are included.

Breaking Changes

  • New collective interface idiom (specialize/construct/invoke).
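
The three steps of the idiom can be illustrated with a small host-side sketch. Everything here is an assumption made for illustration: BlockReduceSketch, its nested TempStorage, and SumEightItems are hypothetical names, and the "threads" are simulated by an array on the CPU. It is not CUB's API, only a model of the pattern: specialize a collective type, construct it over caller-owned temporary storage, then invoke an operation.

```cpp
#include <cstddef>

// Hypothetical host-side stand-in for a block-wide collective.
// It models the specialize/construct/invoke pattern only; it is not CUB.
template <typename T, int BLOCK_THREADS>
class BlockReduceSketch {
public:
    // Scratch space the collective needs. The caller owns and allocates it.
    struct TempStorage {
        T partials[BLOCK_THREADS];
    };

    // Construct: bind the collective to the caller-provided storage.
    explicit BlockReduceSketch(TempStorage &storage) : storage_(storage) {}

    // Invoke: reduce one item per simulated thread.
    T Sum(const T (&thread_items)[BLOCK_THREADS]) {
        for (int i = 0; i < BLOCK_THREADS; ++i)
            storage_.partials[i] = thread_items[i];
        // Tree reduction over the scratch array.
        for (int stride = 1; stride < BLOCK_THREADS; stride *= 2)
            for (int i = 0; i + stride < BLOCK_THREADS; i += 2 * stride)
                storage_.partials[i] += storage_.partials[i + stride];
        return storage_.partials[0];
    }

private:
    TempStorage &storage_;
};

// Walks through the three steps for a simulated block of 8 threads.
int SumEightItems(const int (&items)[8]) {
    typedef BlockReduceSketch<int, 8> BlockReduce;  // 1. specialize
    BlockReduce::TempStorage temp_storage;          // caller-owned scratch
    return BlockReduce(temp_storage).Sum(items);    // 2. construct, 3. invoke
}
```

Keeping the temporary storage outside the collective is the point of the pattern: the caller decides where the scratch space lives and can reuse it between collectives.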

New Features

  • cub::DeviceRadixSort. Implements short-circuiting for homogeneous digit passes.
  • cub::DeviceScan. Implements single-pass "adaptive-lookback" strategy.
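
To make the short-circuiting idea concrete, here is a minimal sequential sketch, not the device implementation. It assumes an LSD radix sort over 32-bit keys with 8-bit digits; the function name RadixSortSkippingUniformDigits is hypothetical. When every key has the same digit in a pass, the scatter would leave the order unchanged, so the pass is skipped.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Sorts 32-bit keys with an LSD radix sort, 8 bits per digit pass.
// Returns the number of passes that actually moved data.
int RadixSortSkippingUniformDigits(std::vector<std::uint32_t> &keys) {
    const int kRadixBits = 8;
    const std::size_t kBins = std::size_t(1) << kRadixBits;
    std::vector<std::uint32_t> scratch(keys.size());
    int scatter_passes = 0;

    for (int shift = 0; shift < 32; shift += kRadixBits) {
        // Histogram the current digit of every key.
        std::vector<std::size_t> counts(kBins, 0);
        for (std::size_t i = 0; i < keys.size(); ++i)
            ++counts[(keys[i] >> shift) & (kBins - 1)];

        // Short-circuit: if one bin holds every key, the digit is
        // homogeneous and a stable scatter would not change the order.
        bool homogeneous = false;
        for (std::size_t b = 0; b < kBins; ++b)
            if (counts[b] == keys.size()) { homogeneous = true; break; }
        if (homogeneous) continue;

        // Exclusive prefix sum turns counts into output offsets.
        std::size_t running = 0;
        for (std::size_t b = 0; b < kBins; ++b) {
            const std::size_t count = counts[b];
            counts[b] = running;
            running += count;
        }

        // Stable scatter into the scratch buffer, then swap buffers.
        for (std::size_t i = 0; i < keys.size(); ++i) {
            const std::size_t digit = (keys[i] >> shift) & (kBins - 1);
            scratch[counts[digit]++] = keys[i];
        }
        keys.swap(scratch);
        ++scatter_passes;
    }
    return scatter_passes;
}
```

In this sketch, keys that all fit in 16 bits need only two scatter passes, because the two upper digits are zero for every key.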

Other Enhancements

  • Significantly improved documentation (with example code snippets).
  • More extensive regression test suite for aggressively testing collective variants.
  • Allow non-trivially-constructed types (previously unions had prevented aliasing temporary storage of those types).
  • Improved support for SM3x SHFL (collective ops now use SHFL for types larger than 32 bits).
  • Better code generation for 64-bit addressing within cub::BlockLoad/cub::BlockStore.
  • cub::DeviceHistogram now supports histograms with an arbitrary number of bins.
  • Updates to accommodate CUDA 5.5 dynamic parallelism.
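
The note above about non-trivially-constructed types can be sketched as follows. A C++03 union cannot have a member with a non-trivial constructor, so temporary storage for such a type cannot be aliased through the union directly. One common workaround, shown here as an assumption and not necessarily how CUB does it, is to put raw, suitably aligned bytes in the union and construct the object in place. RawStorage, SharedScratch, Accumulator, and UseScratchForBothPhases are hypothetical names, and the sketch uses C++11 alignas for brevity.

```cpp
#include <new>

// A type with user-provided constructors, hence not trivially constructible.
struct Accumulator {
    int total;
    Accumulator() : total(0) {}
    explicit Accumulator(int t) : total(t) {}
};

// Hypothetical wrapper: reserves correctly sized and aligned bytes for T
// without ever running T's constructor on its own.
template <typename T>
struct RawStorage {
    alignas(T) unsigned char bytes[sizeof(T)];
};

// RawStorage<T> is trivially constructible for any T, so two phases of an
// algorithm can share the same bytes through a plain union.
union SharedScratch {
    RawStorage<Accumulator> reduce_phase;
    RawStorage<double> scan_phase;
};

int UseScratchForBothPhases() {
    SharedScratch scratch;

    // Phase 1: construct an Accumulator in place and use it.
    Accumulator *acc = new (scratch.reduce_phase.bytes) Accumulator(41);
    acc->total += 1;
    int result = acc->total;
    acc->~Accumulator();  // end its lifetime before reusing the bytes

    // Phase 2: reuse the same bytes for a different type.
    double *value = new (scratch.scan_phase.bytes) double(2.5);
    result += static_cast<int>(*value * 2);
    return result;
}
```

The union only ever sees byte arrays, so the element type is free to have constructors; the cost is that the caller manages object lifetime explicitly.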

Bug Fixes

  • Workarounds for SM10 codegen issues in uncommonly-used cub::WarpScan/cub::WarpReduce specializations.