Releases: ROCm/rocPRIM

rocPRIM-2.10.12 for ROCm 5.0.2

04 Mar 17:53
76ebab2

rocPRIM code for ROCm 5.0.2 is unchanged from rocPRIM for ROCm 5.0.1. The library was rebuilt for the updated ROCm 5.0.2 stack.

rocPRIM-2.10.12 for ROCm 5.0.1

16 Feb 22:14
76ebab2

rocPRIM code for ROCm 5.0.1 is unchanged from rocPRIM for ROCm 5.0.0. The library was rebuilt for the updated ROCm 5.0.1 stack.

rocPRIM-2.10.12 for ROCm 5.0.0

09 Feb 20:28
76ebab2

Fixed

  • Enable bfloat16 tests and reduce threshold for bfloat16
  • Fix device scan size_limit feature
  • Non-optimized builds no longer trigger local memory limit errors

Added

  • Added scan size limit feature
  • Added reduce size limit feature
  • Added transform size limit feature
  • Add block_load_striped and block_store_striped
  • Add gather_to_blocked to gather values from other threads into a blocked arrangement
  • The block sizes for device merge sort's initial block sort and for its merge steps are now configured separately in its kernel config
    • The block sort step supports multiple items per thread
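The names block_load_striped, block_store_striped and gather_to_blocked refer to two ways of distributing a block's items across threads. The following is a minimal host-side sketch of the index mapping only, not rocPRIM code: the helper names (blocked_index, striped_index, load_blocked, load_striped) are hypothetical and exist purely for illustration. It assumes a blocked arrangement gives each thread a run of consecutive items, while a striped arrangement gives consecutive items to consecutive threads.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Blocked arrangement: thread t owns items_per_thread consecutive items.
std::size_t blocked_index(std::size_t thread_id, std::size_t item, std::size_t items_per_thread)
{
    return thread_id * items_per_thread + item;
}

// Striped arrangement: the items of one thread are block_size apart.
std::size_t striped_index(std::size_t thread_id, std::size_t item, std::size_t block_size)
{
    return item * block_size + thread_id;
}

// Host-side model: per_thread[t][i] is the i-th item held by thread t
// after reading `input` in a blocked arrangement.
std::vector<std::vector<int>> load_blocked(const std::vector<int>& input,
                                           std::size_t block_size,
                                           std::size_t items_per_thread)
{
    assert(input.size() == block_size * items_per_thread);
    std::vector<std::vector<int>> per_thread(block_size, std::vector<int>(items_per_thread));
    for (std::size_t t = 0; t < block_size; ++t)
        for (std::size_t i = 0; i < items_per_thread; ++i)
            per_thread[t][i] = input[blocked_index(t, i, items_per_thread)];
    return per_thread;
}

// Same model, but reading `input` in a striped arrangement.
std::vector<std::vector<int>> load_striped(const std::vector<int>& input,
                                           std::size_t block_size,
                                           std::size_t items_per_thread)
{
    assert(input.size() == block_size * items_per_thread);
    std::vector<std::vector<int>> per_thread(block_size, std::vector<int>(items_per_thread));
    for (std::size_t t = 0; t < block_size; ++t)
        for (std::size_t i = 0; i < items_per_thread; ++i)
            per_thread[t][i] = input[striped_index(t, i, block_size)];
    return per_thread;
}
```

With 4 threads, 2 items per thread and the input 0..7, thread 1 holds {2, 3} in the blocked model and {1, 5} in the striped model.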

Changed

  • size_limit for scan, reduce and transform can now be set in the config struct instead of a parameter
  • device_scan and device_segmented_scan: inclusive_scan now uses the input type as the accumulator type; exclusive_scan uses the type of the initial value.
    • This particularly changes the behaviour of scans with a small input type and a larger output type (e.g. short input, int output),
    • and of scans with low-precision input and high-precision output (e.g. float input, double output).
  • Reverted the old Fiji workaround, because the underlying issue has been fixed on the compiler side
  • Update README cmake minimum version number
  • Block sort supports multiple items per thread
    • Currently, only power-of-two block sizes and power-of-two items per thread are supported, and only for full blocks
  • Bumped the minimum required version of CMake to 3.16
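The accumulator-type change for inclusive_scan listed above can be illustrated with a small host-side model. This is a sketch under stated assumptions, not rocPRIM's implementation: inclusive_scan_model is a hypothetical helper, and it uses unsigned 16-bit input with 32-bit output (instead of the short/int pair mentioned above) so that the wraparound is well defined in C++.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical host-side model of an inclusive scan (running sum) in which
// the accumulator type Acc is chosen independently of the input type In and
// the output type Out.
template <typename Acc, typename Out, typename In>
std::vector<Out> inclusive_scan_model(const std::vector<In>& input)
{
    std::vector<Out> output;
    output.reserve(input.size());
    Acc running = Acc{};
    for (const In& value : input)
    {
        // The partial sum is stored in Acc, so it is truncated to Acc's range
        // before it is converted to the output type.
        running = static_cast<Acc>(running + static_cast<Acc>(value));
        output.push_back(static_cast<Out>(running));
    }
    return output;
}
```

For the input {40000, 40000} stored as std::uint16_t, accumulating in the input type (Acc = std::uint16_t) yields {40000, 14464} because the second partial sum wraps modulo 65536, whereas accumulating in the output type (Acc = std::uint32_t) yields {40000, 80000}. Callers that relied on the wider accumulation may need to convert their input to the wider type before scanning.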

Known issues

  • Unit tests may soft hang on MI200 when running in hipMallocManaged mode.
  • device_segmented_radix_sort and device_scan unit tests are failing for HIP on Windows
  • ReduceEmptyInput causes random failures with bfloat16

rocPRIM-2.10.11 for ROCm 4.5.2

10 Dec 19:18
fa2d3b8

rocPRIM code for ROCm 4.5.2 is unchanged from rocPRIM for ROCm 4.5.0. The library was rebuilt for the updated ROCm 4.5.2 stack.

rocPRIM-2.10.11 for ROCm 4.5.0

27 Oct 21:23
fa2d3b8

Added

  • Code coverage tools build option
  • Address sanitizer build option
  • gfx1030 support added.
  • Experimental HIP-CPU support; build using GCC/Clang/MSVC on Windows/Linux. This is a work in progress, and many algorithms are still known to fail.
  • Initial HIP on Windows support. See README for instructions on how to build and install.
  • bfloat16 support added.

Optimizations

  • Added single tile radix sort for smaller sizes.
  • Improved performance for radix sort for larger element sizes.

Changed

  • Package renamed to rocprim-dev for .deb, and to rocprim-devel for .rpm. As rocPRIM is a header-only library, there is no associated runtime package, so for compatibility this development package provides the package rocprim. This provides entry is introduced as already deprecated and will be removed in a future ROCm release.

Deprecated

  • The warp_size() function is now deprecated; please switch to host_warp_size() and device_warp_size() for host and device references respectively.

rocPRIM-2.10.10 for ROCm 4.3.1

27 Aug 17:40
80cdfaf

No changes made for ROCm 4.3.1.

rocPRIM-2.10.10 for ROCm 4.3.0

30 Jul 22:51
80cdfaf

Fixed

  • Bugfix & minor performance improvement for merge_sort when input and output storage are the same.

Added

  • gfx90a support added.

Deprecated

  • The warp_size() function is now deprecated; please switch to host_warp_size() and device_warp_size() for host and device references respectively.

rocPRIM-2.10.9 for ROCm 4.2.0

10 May 23:17

Fixed

  • Size zero inputs are now properly handled with newer ROCm builds that no longer allow zero-size kernel grid/block dimensions
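The usual way to handle size-zero inputs is to return before a kernel launch with an empty grid is attempted. The following is a minimal host-side sketch of that guard, not rocPRIM's actual code: launch_plan and plan_launch are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical description of a kernel launch decision.
struct launch_plan
{
    bool        launch;    // false when no kernel should be launched at all
    std::size_t grid_size; // number of blocks when launch is true
};

// Computes the grid size for `size` items with `items_per_block` items
// handled by each block, skipping the launch entirely for empty inputs.
launch_plan plan_launch(std::size_t size, std::size_t items_per_block)
{
    assert(items_per_block > 0);
    if (size == 0)
    {
        // A zero-sized grid is rejected by the runtime, so do not launch.
        return {false, 0};
    }
    // Round up so that a partially filled last block is still launched.
    return {true, (size + items_per_block - 1) / items_per_block};
}
```

With 256 items per block, an empty input produces no launch, 1 item produces a grid of 1 block, and 257 items produce a grid of 2 blocks.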

Changed

  • Minimum cmake version required is now 3.10.2

Known issues

  • Device scan unit test currently failing due to LLVM bug.

rocPRIM-2.10.8 for ROCm 4.1.0

23 Mar 01:18

Fixed

  • Texture cache iteration support has been re-enabled.
  • Benchmark builds have been re-enabled.
  • Unique operator no longer called on invalid elements.

Known issues

  • Device scan unit test currently failing due to LLVM bug.

Known Issues

  • None

rocPRIM-2.10.6 for ROCm 4.0.0

18 Dec 15:22
9d47868

New Features

  • No new features

Known Issues

  • None