This release of Umpire contains the following build requirement changes:
- Umpire now requires C++17 or later.
- Added `DeviceIpcAllocator` for GPU memory sharing between processes.
- Added HIP support to `AllocationAdvisor`.
- Added HIP support for `get_device_mem_usage` function.
- Extended Fortran API support:
  - Added `AlignedAllocator` to Fortran API.
  - Added `SizeLimiter` to Fortran API.
  - Added other strategies to Fortran API.
  - Added logical allocation routines to Fortran API.
  - Extended Fortran allocation support to 7D.
  - Added 5D Fortran allocation routines.
- Allow both shared memory implementations to be used at the same time.
- Fixed a communicator leak in `HostMpi3SharedMemoryResource`.
- Fixed handling of the CUDA language standard.
- Fixed HIP constant memory implementation.
- Added a ResourceAwarePool, which allows memory from the same pool to be used (1) across multiple device streams and (2) in a single memory space environment without the potential to cause data races.
- Added MPI3 Shared Memory Allocators, which use MPI3 capabilities for shared memory.
- Added a `NamingShim` strategy to allow users to allocate IPC shared memory without providing a name.
- Fixed a minor memory leak when using IPC Shared Memory Allocators.
- A `get_total_bytes_allocated` function was implemented, which returns the total number of bytes allocated with Umpire allocators.
- The `NamedAllocationStrategy` can now be used with IPC Shared Memory Allocators.
- Additional documentation for Shared Memory Allocators was created and reorganized.
- Additional documentation on requirements for Windows builds was added to the cmake.
This release of Umpire contains new build requirements including:
- CMake version 3.23 or later is required.
- Umpire uses Fortran_FORMAT to avoid compilation errors when using LLVM flang.
- SYCL and Intel builds were added to Umpire's Dockerfile, in addition to other builds in the GitHub workflow, for better testing.
- Made fmt header-only by default, allowing users to define their preferred fmt target using the UMPIRE_FMT_TARGET variable.
- Add `_hwm` variant of pool heuristic functions. These reallocate to the high watermark value after coalescing, and can reduce memory overhead.
- Support 2023 OneAPI release.
- Add HIP support to the DeviceAllocator.
- Prefix fmt macros to avoid conflicts with other libraries including fmt.
- Replace deprecated random_shuffle usage with shuffle for C++17.
- Make exported include directories relative to install prefix.
- New HIP Advise operations have been added for setting and unsetting the `READ_MOSTLY`, `PREFERRED_LOCATION`, and `ACCESSED_BY` advice. For HIP versions >= 5, operations to set and unset `COARSE_GRAIN` have also been added.
- `getCurrentSize` and `getTotalSize` methods were added to the `DeviceAllocator` API.
- New event tracking has been added that can stream events to JSON (for replays) and SQLite.
- `UMAP` allocation resource has been added.
- Using `try_get` for async device operations.
- Fixed build problem for Fortran builds by properly installing the Umpire module files.
- Fixed build problem on certain configurations where the `std::filesystem` check was returning incorrect results.
- Instead of throwing an exception, the `is_device_allocator` helper function now returns `false` if the `DeviceAllocator` object has not yet been instantiated.
- Fixed `FixedMallocPool` on Windows, allowing the `QuickPool` allocator to be safely copied.
- Add `UMPIRE_DISABLE_ALLOCATIONMAP_DEBUG` (default is `OFF`) option that allows users to prevent the AllocationMap from dumping all records when an allocation is not found in a debug build.
- Using `UMPIRE_ENABLE_MPI` instead of `ENABLE_MPI` for Umpire-specific MPI capabilities.
- Using `UMPIRE_ENABLE_IPC_SHARED_MEMORY` instead of `ENABLE_IPC_SHARED_MEMORY` to indicate this is an Umpire-specific implementation feature.
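For readers making the same `random_shuffle` to `shuffle` migration in their own code, the change can be sketched as follows. This is a minimal standalone illustration, not code taken from Umpire: the helper name `shuffled_indices` and its parameters are hypothetical. The key difference is that `std::shuffle` requires an explicit random number generator, whereas `std::random_shuffle` (removed in C++17) used an implicit one.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper (not part of Umpire): returns the indices 0..n-1 in a
// pseudo-random order determined by `seed`.
std::vector<int> shuffled_indices(int n, unsigned seed)
{
  assert(n >= 0);
  std::vector<int> indices(static_cast<std::size_t>(n));
  std::iota(indices.begin(), indices.end(), 0);

  // Pre-C++17 code would have called std::random_shuffle(begin, end) here.
  // std::shuffle takes an explicit uniform random bit generator instead.
  std::mt19937 gen(seed);
  std::shuffle(indices.begin(), indices.end(), gen);
  return indices;
}
```

Seeding the generator explicitly also makes the resulting order reproducible for a given standard library implementation, which can be useful in tests.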
This is a patch release of v2022.03 that fixes reported build errors by setting UMPIRE_ENABLE_DOCS back to OFF by default, since building the documentation requires additional tools.
This release of Umpire contains new build requirements including:
- C++14 is now required to build Umpire
- CMake version 3.14 or later is required
- The CMake object library for `C/FORTRAN` interface has been reorganized. (NOTE: This is a breaking change since the include paths are now different.)
- Added a `getDeviceAllocator` function that allows users to get a `DeviceAllocator` object from the kernel without explicitly passing the allocator to the kernel first.
- Added a `reset` function to the `DeviceAllocator` so that old data can be rewritten.
- Expose `PREFETCH` operations registered with the `MemoryOperationRegistry` with a new `ResourceManager::prefetch` method.
The following functions previously marked as deprecated have now been removed:
- `DynamicPoolMap` and `DynamicPool` aliases removed
- `registerAllocator` and `isAllocatorRegistered` removed
- Fixed a cmake install config issue so that now users can find a package of Umpire with a version constraint.
- Fix `ResourceManager::isAllocator` to work for resources
- Fix comparison operators for `TypedAllocator`s
- Fix host and device Allocator ID overlap
- Remove null and zero-byte pool from list of valid allocators
- The `UMPIRE_ENABLE_DEVICE_ALLOCATOR` option was added to control whether or not the DeviceAllocator class is included in the library. The default is "Off".
- `C/FORTRAN` API is now auto generated
- The `umpire-config.cmake` package is now relocatable
- Use `blt` namespace for hip targets
- Umpire `CMakeList` options now have `UMPIRE_` prefixes and are now dependent upon corresponding `BLT` options.
- Removed hardcoded `-Xcompiler -mno-float128` for GCC 8+ with CUDA on PowerPC.
- Build Doxygen documentation on ReadTheDocs.
- Add CI job with interprocess shared memory and CUDA
- Add CI containers to allow for gcc{7,8,9}, clang{11,12}, and nvcc{10,11}
- Add CI to check pools work with `DEVICE_CONST` memory
Added documentation on allocator (in)accessibility as well as getAllocator usage.
Added a Release function to FixedPool and corresponding gtest in strategy_tests
Installed thirdparty exports in CMake configuration file
Replay will now display high water mark statistics per allocator.
Initial support for IPC Shared Memory via a "SHARED" resource allocator. IPC Shared memory is initially available on the Host resource and will default to the value of ENABLE_MPI.
Added get_communicator_for_allocator to get an MPI Communicator for the scope of a shared allocator.
Added Allocator::getStrategyName() to get name of the strategy used.
Added getActualHighwatermark to all pool strategies, returns the high water value of getActualSize.
Added umpire::mark_event() to mark an event during Umpire lifecycle
Added asynchronous memset and reallocate operations for CUDA and HIP.
Added support for named allocations.
DynamicPoolMap marked deprecated. QuickPool should be used instead.
Refactored pool coalesce heuristic API to return either 0 or the minimum pool size to allocate when a coalesce is to be performed. No functional change yet.
All asynchronous operations now return a camp::resources::EventProxy to avoid the overhead of creating Events when they are unused.
Removed all internal tracking, allocations are only tracked at the Allocator level.
- Fixed bug where zero-byte allocations from Umpire were sometimes incorrectly reported as not being Umpire allocations
- Memory Resource header and source files for HIP
- Unified Memory support for HIP, including testing and benchmarking (temp support for Fortran).
- Added a getParent functionality for retrieving the memory resource of an allocator.
- Changed enumeration names from all upper case to all lower case in order to avoid name collisions.
- Fixed up broken source links in tutorial documentation.
- registerAllocator is deprecated, addAlias should be used instead.
- Moved backend-specific resource code out of ResourceManager and into resource::MemoryResourceRegistry.
- Fixed accounting for number of releasable bytes in QuickPool that was causing coalesce operations to not work properly.
- Added workaround for incorrect nvcc compiler warning: "warning: missing return statement at end of non-void function" occurring in one of Umpire's header files.
- Fixed DynamicPoolMap deallocate to make coalesce check O(1) again.
- Initialize m_default_allocator to HOST if not set explicitly.
- QuickPool available via the C & Fortran APIs.
- Resources are now created on-demand when accessed for the first time.
- Peer access is no longer automatically enabled for CUDA and HIP.
- Added cmake check to determine if the build subsystem is capable of ASAN.
- Fixed ASAN poisoning to limit it to what the user originally requested and not the rounded amount.
- Improved resilience of primary pool destructors so that giving back previously allocated blocks to a device that has already been cleaned up will no longer throw an error, but will instead be logged and ignored.
- Fixed Umpire builds with MPI enabled
- Added missing wrapUmpire.hpp to installation directory
- Added a FILE memory resource that allocates memory using mmap'd files. This can be used to allocate memory from the burst buffers on machines like Sierra and Lassen.
- All pools now have an "alignment" parameter that can be provided to the constructor.
- MemoryResourceTraits now includes a `resource` member that can be used to identify the underlying resource for any Allocator.
- Bundled tpl cxxopts has been replaced by CLI11 (only used when ENABLE_TOOLS=On)
- Fixed memory leaks in DynamicPoolList and QuickPool.
- Fixed reallocate operation when called on an allocation from a pool.
- Added support for multiple GPU devices, detected and registered as "DEVICE_N" where N is the device number.
- Added support for capturing function backtraces with allocations.
- Added `AlignedAllocator` to provide aligned allocations for host memory.
- Fixed builds using `-stdlib=c++`
- Switched to camp::Platform: Platform::cpu is now Platform::host
- Fixes a bug when calling reallocate with size 0.
- Replay tool now supports replaying reallocate operations.
- ENABLE_DEVICE_CONST CMake option to control whether device constant memory is enabled. It is now disabled by default.
- DeviceAllocator that provides a pool for allocations inside GPU kernels.
- Added "unset" operations for removing CUDA memory advice.
- Extended C/Fortran API with more allocation strategies.
- NamedAllocator that allows creating a new allocator that passes allocations through to underlying strategy
- UMPIRE_VERSION_X are now defined as macros, rather than constexpr variables
- Fixed reallocate to properly handle case where size == 0
- AllocationStrategy constructor parameters re-ordered for consistency
- Added symbol `umpire_ver_1_detected` to help detect version mismatches when linking multiple libraries that all use Umpire.
- Re-introduced pool algorithm used in pre-1.0.0 releases as `DynamicPoolList`, and renamed current strategy to `DynamicPoolMap`. `DynamicPool` is now an alias to `DynamicPoolMap`.
- Fix signature of C function `umpire_resourcemanager_make_allocator_pool` to take `size_t` not `int`.
- Restored `getActualSize` for all `Allocator` types
- Fixed a bug in DynamicPool where memory could be leaked when allocating a new block using the "minimum size" for an allocation smaller than the block.
- Umpire is MPI-aware (outputs rank information to logs and replays) when configured with the option ENABLE_MPI=On. In this configuration, umpire::initialize(MPI_Comm comm) must be called.
- AllocationStrategies may be wrapped with multiple extra layers. To "unwrap" an Allocator to a specific strategy, the umpire::util::unwrap_allocator method can be used, for example:
  `auto dynamic_pool = umpire::util::unwrap_allocator<umpire::strategy::DynamicPool>(allocator);`
  This will impact users who have been using DynamicPool::coalesce. The cookbook recipe has been updated accordingly, and the previous code snippet can be used.
- Umpire now directs log and replay output to files, one per process. The filenames can be controlled by the environment variable UMPIRE_OUTPUT_BASENAME.
- ENABLE_CUDA now set to Off by default.
- Allocations for 0 bytes now always return a valid pointer that cannot be read or written. These pointers can be deallocated.
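
The idea behind unwrapping a layered allocation strategy, mentioned in the notes above, can be illustrated with a small standalone sketch. This is not Umpire's implementation: the types `Strategy`, `Pool`, and `Tracker`, and the function `unwrap`, are hypothetical stand-ins, and the sketch assumes that each wrapper layer can report the strategy it forwards to.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-ins for illustration only; these are not Umpire classes.
struct Strategy {
  virtual ~Strategy() = default;
  virtual void* allocate(std::size_t bytes) = 0;
  virtual void deallocate(void* ptr) = 0;
  // Wrappers return the strategy they forward to; concrete strategies
  // return nullptr.
  virtual Strategy* inner() { return nullptr; }
};

// A concrete strategy with an operation that the base interface lacks.
struct Pool : Strategy {
  int coalesce_calls = 0;
  void* allocate(std::size_t bytes) override { return std::malloc(bytes); }
  void deallocate(void* ptr) override { std::free(ptr); }
  void coalesce() { ++coalesce_calls; }
};

// A pass-through layer, standing in for any extra wrapper around a strategy.
struct Tracker : Strategy {
  explicit Tracker(Strategy* wrapped) : m_wrapped(wrapped) {}
  void* allocate(std::size_t bytes) override { return m_wrapped->allocate(bytes); }
  void deallocate(void* ptr) override { m_wrapped->deallocate(ptr); }
  Strategy* inner() override { return m_wrapped; }
  Strategy* m_wrapped;
};

// Walk through the wrapper layers until a strategy of type T is found.
// Returns nullptr if no layer has that type.
template <typename T>
T* unwrap(Strategy* s)
{
  while (s != nullptr) {
    if (auto* typed = dynamic_cast<T*>(s)) {
      return typed;
    }
    s = s->inner();
  }
  return nullptr;
}
```

With this sketch, a caller holding only the outermost layer could write `unwrap<Pool>(&outer)->coalesce()` after checking the result for `nullptr`, which mirrors why code that calls a strategy-specific method such as `coalesce` needs to unwrap first.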