v2024.02.0
This release contains several RAJA improvements and submodule updates.
Please download the RAJA-v2024.02.0.tar.gz file below. The others, generated by GitHub, may not work for you due to RAJA's dependencies on git submodules.
Notable changes include:
New features / API changes:
- BREAKING CHANGE (ALMOST): The `loop_exec` and associated policies such as `loop_atomic`, `loop_reduce`, etc. were deprecated in the v2023.06.0 release (please see the release notes for that version for details). Users should replace these with `seq_exec` and associated policies for sequential CPU execution. The code behavior will be identical to what you observed with `loop_exec`, etc. However, due to a request from some users with special circumstances, the `loop_*` policies still exist in this release as type aliases to their `seq_*` analogues. The `loop_*` policies will be removed in a future release.
- BREAKING CHANGE: RAJA TBB back-end support has been removed. It was not feature complete and the TBB API has changed so that the code no longer compiles with newer Intel compilers. Since we know of no project that depends on it, we have removed it.
- An `IndexLayout` concept was added, which allows accessing elements of a RAJA `View` via a collection of indices and using a different indexing strategy along different dimensions of a multi-dimensional `View`. Please see the RAJA User Guide for more information.
- Add support for SYCL reductions using the new RAJA reduction API.
- Add support for new reduction API for all back-ends in RAJA::launch.
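For readers wondering why existing `loop_exec` code keeps compiling in this release, the following is a minimal standalone sketch of the type-alias pattern described above. Everything in the `alias_sketch` namespace is a hypothetical stand-in written for illustration; it does not reproduce RAJA's actual definitions. In real RAJA code the migration amounts to changing, for example, `RAJA::forall<RAJA::loop_exec>` to `RAJA::forall<RAJA::seq_exec>`.

```cpp
#include <type_traits>

// Hypothetical stand-ins for illustration only; NOT RAJA's real types.
namespace alias_sketch {

struct seq_exec {};    // sequential execution policy
struct seq_atomic {};  // sequential atomic policy
struct seq_reduce {};  // sequential reduction policy

// Deprecated names kept alive as aliases, so old code still compiles
// and resolves to exactly the same types as the new names.
using loop_exec = seq_exec;
using loop_atomic = seq_atomic;
using loop_reduce = seq_reduce;

static_assert(std::is_same<loop_exec, seq_exec>::value,
              "old and new names denote the same policy type");

// Toy forall that only understands the sequential policy.
template <typename Policy, typename Body>
void forall(int begin, int end, Body&& body) {
  static_assert(std::is_same<Policy, seq_exec>::value,
                "this sketch only implements sequential execution");
  for (int i = begin; i < end; ++i) {
    body(i);
  }
}

// Sums 0..n-1 using the deprecated policy name.
inline int sum_with_old_name(int n) {
  int sum = 0;
  forall<loop_exec>(0, n, [&](int i) { sum += i; });
  return sum;
}

// Same loop written with the replacement policy name.
inline int sum_with_new_name(int n) {
  int sum = 0;
  forall<seq_exec>(0, n, [&](int i) { sum += i; });
  return sum;
}

}  // namespace alias_sketch
```

Because an alias introduces no new type, both functions instantiate the same `forall` specialization, which is consistent with the note that behavior is identical after the rename.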
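The idea behind using a different indexing strategy along each dimension, as mentioned for `IndexLayout`, can be illustrated with a small standalone sketch. The types `Direct`, `ViaList`, and `Layout2D` below are hypothetical names invented for this example and are not RAJA's API; the RAJA User Guide documents the actual interface. The sketch assumes a row-major 3 x 4 array in which the first dimension is indexed directly and the second goes through a list of selected columns.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical illustration only; these are NOT RAJA types.
namespace layout_sketch {

// Strategy 1: use the index as given.
struct Direct {
  std::size_t operator()(std::size_t i) const { return i; }
};

// Strategy 2: translate the index through a caller-supplied list.
struct ViaList {
  const std::size_t* entries;
  std::size_t operator()(std::size_t i) const { return entries[i]; }
};

// Row-major 2D layout with an independent strategy per dimension.
template <typename Dim0, typename Dim1>
struct Layout2D {
  std::size_t row_length;
  Dim0 dim0;
  Dim1 dim1;
  std::size_t operator()(std::size_t i, std::size_t j) const {
    return dim0(i) * row_length + dim1(j);
  }
};

// Builds a 3 x 4 array with data[r][c] = 10 * r + c and reads element
// (i, j), where j selects from the column list {3, 1}.
inline int lookup(std::size_t i, std::size_t j) {
  const std::size_t rows = 3;
  const std::size_t cols = 4;
  std::vector<int> data(rows * cols);
  for (std::size_t r = 0; r < rows; ++r) {
    for (std::size_t c = 0; c < cols; ++c) {
      data[r * cols + c] = static_cast<int>(10 * r + c);
    }
  }
  const std::size_t picked_columns[] = {3, 1};
  Layout2D<Direct, ViaList> layout{cols, Direct{}, ViaList{picked_columns}};
  return data[layout(i, j)];
}

}  // namespace layout_sketch
```

Here `lookup(2, 0)` reads row 2 and the first picked column (column 3), so the caller addresses a compact 3 x 2 index space while the underlying storage stays untouched.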
Build changes/improvements:
- Update BLT submodule to v0.6.1 and incorporate its new macros for managing TPL targets in CMake.
- Update camp submodule to v2024.02.0, which contains changes to support ROCm 6.x compilers.
- Update desul submodule to afbd448.
- Replace internal use of HIP and CUDA platform macros with their newer versions to support the latest compilers.
Bug fixes/improvements:
- Change internal memory allocation for HIP to use coarse-grained pinned memory, which improves performance because it can be cached on a device.
- Fix compilation error resulting from incorrect namespacing of OpenMP execution policy.
- Several fixes to internal implementation of Reducers and Operators.