|
11 | 11 | ### Features:
|
12 | 12 | ### Bugfixes:
|
13 | 13 |
|
| 14 | +## 1.19.0 (June 18, 2025) |
| 15 | +### Features: |
| 16 | +#### UCP |
| 17 | +* Enabled multi-GPU support within a single process |
| 18 | +* Added dynamic selection between strong and weak fences in RMA flush operations |
| 19 | +* Improved endpoint reconfiguration capabilities |
| 20 | +* Added All2All lane selection for multi-NIC-GPU systems |
| 21 | +* Improved rkey debug info when config cache limit is reached |
| 22 | +* Improved UCP protocol selection based on available memory types |
| 23 | +* Removed dummy memory key from irrelevant transports (TCP, CMA and CUDA) |
| 24 | +* Improved RNDV performance with device-local staging buffers |
| 25 | +* Enabled error handling for RMA get_offload protocols |
| 26 | +#### UCT |
| 27 | +* Defined uct_rkey_unpack_v2 API to support passing sys-dev |
| 28 | +#### RDMA CORE (IB, ROCE, etc.) |
| 29 | +* Added SRD transport support in EFA with reordering, AM, and control operations |
| 30 | +* Removed XGVMI BF2 support (umem) |
| 31 | +* Removed device memory indirect key |
| 32 | +* Fixed VFS objects for DCIs and pools |
| 33 | +* Added routing table cache to the reachability check |
| 34 | +* Fixed strict order usage in IB auxiliary rkeys |
| 35 | +* Improved various init logging messages |
| 36 | +#### CUDA |
| 37 | +* Added multi-context support for remote key unpacking to CUDA IPC |
| 38 | +* Added context switching aware resource management to CUDA IPC |
| 39 | +* Used buffer ID to detect VA recycling in CUDA IPC |
| 40 | +* Added support for allocating CUDA memory on specific system devices |
| 41 | +* Added multi-device support in CUDA copy |
| 42 | +* Improved protocol lane selection for GPU memory operations |
| 43 | +* Relaxed CUDA context requirements in CUDA copy |
| 44 | +* Added deadlock prevention in CUDA copy |
| 45 | +* Added support for address range detection for VMM |
| 46 | +* Enabled memory attributes query after switching CUDA GPU |
| 47 | +* Added multi-GPU send tests for CUDA transports |
| 48 | +* Removed host-to-host performance estimation from CUDA copy transport |
| 49 | +* Replaced cuCtxCreate with cuDevicePrimaryCtxRetain |
| 50 | +* Improved various init logging messages |
| 51 | +#### ROCM |
| 52 | +* Added control parameters for IPC handle cache and signal pool size |
| 53 | +* Optimized ROCm memory type detection with caching |
| 54 | +#### UCS |
| 55 | +* Removed compilation warnings |
| 56 | +#### Tools |
| 57 | +* Added name filter option (-F 'str') to ucx_info for config and feature dumps |
| 58 | +* Improved ucx_info input validation |
| 59 | +### Bugfixes: |
| 60 | +#### UCP |
| 61 | +* Made UCX_TLS=^ib disable all transports including auxiliary |
| 62 | +* Fixed send request status handling |
| 63 | +* Fixed performance degradation in RNDV by optimizing md cache updates |
| 64 | +* Fixed protocol selection when first lane is filtered out by fragment size |
| 65 | +* Fixed rkey selection by using memory registration flag |
| 66 | +#### UCT |
| 67 | +#### RDMA CORE (IB, ROCE, etc.) |
| 68 | +* Improved reliability of DC transport by adding DCI validation and separating connection logic |
| 69 | +* Fixed segfault in DC fence operation |
| 70 | +#### GPU (CUDA, ROCM) |
| 71 | +* Updated ROCm configuration for ROCm 6.3 compatibility |
| 72 | +* Fixed system device detection for CUDA async memory operations |
| 73 | +* Fixed legacy type detection during CUDA IPC mpack |
| 74 | +* Fixed CUDA IPC RMA operations by using correct context for local buffers |
| 75 | +#### UCS |
| 76 | +* Used UCS function for counting leading zeros on x86 architecture |
| 77 | +* Fixed a compilation warning |
| 78 | +#### Shared Memory |
| 79 | +* Fixed FIFO availability check for sm transport |
| 80 | +#### Documentation |
| 81 | +* Fixed open-mpi clone instruction |
| 82 | +#### Build |
| 83 | +* Fixed enum-int-mismatch warnings with GCC 15 |
| 84 | + |
14 | 85 | ## 1.18.0 (January 17, 2025)
|
15 | 86 | ### Features:
|
16 | 87 | #### UCP
|