UCT/CUDA: Runtime CUDA >= 12.3 to enable VMM #10396
Conversation
Force-pushed 1ce967f to 68a5f51
We have tests for different CUDA versions, which include CUDA memory hooks (for example, Test Cuda Docker ubuntu18_cuda_12_0). Can we add a test that would have caught the new API usage?
@yosefe, do we need this before release?
I think it is difficult, because we would need to build with a later driver version and run with an older driver version. For instance, when I run this container on rock, we are only running the later driver version, and I don't think we can easily switch driver versions, since the driver has to match the kernel module, as per my understanding.
}
#else
    unsigned value = 1;
    (void)ctx_set_flags_func;
why needed? maybe we could just remove #if HAVE_CUDA_FABRIC now, since we don't use cuCtxSetFlags() directly?
fixed
restored as it is needed by CU_CTX_SYNC_MEMOPS
{
    static ucs_status_t status = UCS_ERR_LAST;

#if CUDA_VERSION >= 12000
why needed?
The cuGetProcAddress() prototype changed at CUDA >= 12.0, and we know that cuCtxSetFlags() also appeared after 12.0, so there is no need to use the older cuGetProcAddress() prototype for the check.
@@ -823,6 +834,37 @@ static uct_md_ops_t md_ops = {
    .detect_memory_type = uct_cuda_copy_md_detect_memory_type
};

static ucs_status_t uct_cuda_copy_md_check_is_ctx_set_flags_supported(void)
To simplify the code, we could have this function call the needed function pointer, and move the global var inside it. Something like ucs_status_t uct_cuda_copy_set_ctx_flags(unsigned flags), and have it return UCS_ERR_UNSUPPORTED if the func pointer is not found.
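The single-entry-point shape suggested here might look roughly like the sketch below. This is an illustrative mock, not the real UCX code: the status codes, the lookup table and the function name are stand-ins, and in UCX the symbol would be resolved with cuGetProcAddress() rather than a local table.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative stand-ins for UCS status codes (not the real values). */
enum { UCS_OK = 0, UCS_ERR_UNSUPPORTED = -1 };

typedef int (*ctx_set_flags_fn)(unsigned flags);

/* Fake "driver" export table: remove the entry below to simulate an older
 * driver that lacks cuCtxSetFlags(). */
static int fake_cu_ctx_set_flags(unsigned flags)
{
    (void)flags;
    return UCS_OK;
}

static void *driver_lookup(const char *symbol)
{
    return (strcmp(symbol, "cuCtxSetFlags") == 0) ?
           (void*)fake_cu_ctx_set_flags : NULL;
}

/* Single entry point in the spirit of the suggestion: resolve the symbol
 * once, cache the result, and report UCS_ERR_UNSUPPORTED when the running
 * driver does not export it. */
int uct_cuda_copy_set_ctx_flags(unsigned flags)
{
    static ctx_set_flags_fn fn;
    static int resolved = 0;

    if (!resolved) {
        fn       = (ctx_set_flags_fn)driver_lookup("cuCtxSetFlags");
        resolved = 1;
    }

    return (fn == NULL) ? UCS_ERR_UNSUPPORTED : fn(flags);
}
```

Resolving once and caching keeps the lookup off the hot path, and callers can treat UCS_ERR_UNSUPPORTED as the signal to disable fabric allocations.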
I thought about it but went for a two-step approach, as we need to:
- disable fabric at init time
- set the flag with md and address as parameters, in case we cannot use cuCtxSetFlags()
Force-pushed 7acee45 to ff4313c
@yosefe, pls review
}

ucs_diag("disabled fabric memory allocations");
md->config.enable_fabric = UCS_NO;
Looks like it affects only cuda_copy memory allocations, but what happens if we get fabric memory from a user buffer and then don't actually set sync memops for it?
We could return UNSUPPORTED from uct_cuda_copy_sync_memops and, in that case, return an error from CUDA memory detection.
This should now be handled, right?
Force-pushed da07d62 to 8657d54
CUdriverProcAddressQueryResult sym_status;
CUresult cu_err;
ucs_status_t status;
uct_cuda_cuCtxSetFlags_t cuda_cuCtxSetFlags_func =
- initialized vars should be first
- should be static??
thanks, missed the static
@@ -553,8 +554,7 @@ static void uct_cuda_copy_sync_memops(uct_cuda_copy_md_t *md,

    if (is_vmm) {
        ucs_fatal("failed to set sync_memops on CUDA VMM without "
-                 "cuCtxSetFlags() (address=%p)",
-                 address);
+                 "cuCtxSetFlags() (address=%p)", address);
Thinking of it again, it should be a warning, since a failure in the cuPointerSetAttribute() call is also a warning.
So when is_vmm == 1, you want to call cuPointerSetAttribute() and let it fail, right?
moved to ucs_warn
hmm right, actually we can return from the function after ucs_warn, and not call cuPointerSetAttribute at all
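A minimal sketch of the control flow being agreed on here (hypothetical names and a counter added for illustration; the real code calls the CUDA driver API): warn and return early for VMM instead of falling through to cuPointerSetAttribute().

```c
#include <stdio.h>

/* Counter only so the behavior is observable in this mock. */
static int warnings;

/* Sketch: if cuCtxSetFlags() is available it handles sync memops for the
 * whole context; otherwise, for VMM memory, warn and return early, because
 * cuPointerSetAttribute() cannot work on VMM allocations. Only legacy
 * memory falls through to the per-pointer attribute call. */
static void sync_memops(void *address, int is_vmm, int have_ctx_set_flags)
{
    if (have_ctx_set_flags) {
        /* cuCtxSetFlags(CU_CTX_SYNC_MEMOPS) would be called here */
        return;
    }
    if (is_vmm) {
        warnings++;
        fprintf(stderr, "failed to set sync_memops on CUDA VMM without "
                "cuCtxSetFlags() (address=%p)\n", address);
        return; /* do not fall through to cuPointerSetAttribute() */
    }
    /* legacy memory: cuPointerSetAttribute(..., address) would run here */
}
```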
}

if (is_vmm) {
    ucs_warn("failed to set sync_memops on CUDA VMM without "
@tvegas1 Current changes look good to me, but @yosefe brought up an issue where the library is built with a >=12.3-compatible driver version while the system where that library gets used has a driver version < 12.1. On such a system, VMM/mallocAsync allocations are allowed (as VMM and mallocAsync are supported on driver versions < 12.1), but there would be a need to report an error or fail even if UCX isn't compiled with HAVE_CUDA_FABRIC (driver version >= 12.3). The condition met here is when UCX is built with a >=12.3 driver.
Agree, that's the case where we have VMM independently allocated but still don't have HAVE_FABRIC set; in this case I will move is_vmm out of the #ifdef and fatal if is_vmm == 1.
I don't think it will help - if HAVE_FABRIC is not set, we will never know in UCX that it is VMM memory and will assume it is legacy memory. Then we can only hope that cuPointerSetAttribute() would fail.
Yes, also enabled VMM detection, because:
- cuMemRelease >= 10.2
- cuMemRetainAllocationHandle >= 11.0
- cuMemGetAllocationPropertiesFromHandle >= 10.2

assuming we build UCX with CUDA >= 11 anyway.
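The detection sequence built from those three driver calls can be mocked like this. The real calls are cuMemRetainAllocationHandle(), cuMemGetAllocationPropertiesFromHandle() and cuMemRelease(); the stubs below only illustrate the control flow and are not the CUDA API. The key points are that retaining a handle succeeds only for VMM allocations, and that the retained handle must be released on every path.

```c
#include <stddef.h>

/* Mock allocation record standing in for driver-side state. */
typedef struct {
    int is_vmm;
    int location_type;
} mock_alloc_t;

/* Stand-in for cuMemRetainAllocationHandle(): succeeds only for VMM. */
static int mock_retain_handle(mock_alloc_t **handle, void *address)
{
    mock_alloc_t *alloc = (mock_alloc_t*)address;
    if (!alloc->is_vmm) {
        return -1; /* the real call fails for non-VMM memory */
    }
    *handle = alloc;
    return 0;
}

/* Stand-in for cuMemGetAllocationPropertiesFromHandle(). */
static int mock_get_properties(int *location_type, mock_alloc_t *handle)
{
    *location_type = handle->location_type;
    return 0;
}

/* Stand-in for cuMemRelease(): drops the reference taken by retain. */
static void mock_release(mock_alloc_t *handle)
{
    (void)handle;
}

/* Returns 1 if address is a VMM allocation, filling location_type. */
static int detect_vmm(void *address, int *location_type)
{
    mock_alloc_t *handle;
    int is_vmm = 0;

    if (mock_retain_handle(&handle, address) == 0) {
        if (mock_get_properties(location_type, handle) == 0) {
            is_vmm = 1;
        }
        mock_release(handle); /* released on every successful-retain path */
    }
    return is_vmm;
}
```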
let me know if you read it differently: https://rocm.docs.amd.com/projects/HIPIFY/en/latest/tables/CUDA_Driver_API_functions_supported_by_HIP.html
Double-checked the actual function prototypes against the online documentation of older CUDA releases.
CI failure on CUDA 11.0: it seems that although cuMemRetainAllocationHandle is found in the headers in /usr/local/cuda-11/include/, it is not available in the link-time CUDA stub library.
@@ -469,6 +470,7 @@ static int uct_cuda_copy_detect_vmm(void *address,
                                    ucs_memory_type_t *vmm_mem_type,
                                    CUdevice *cuda_device)
{
#if HAVE_CUMEMRETAINALLOCATIONHANDLE
minor: usually prefer #ifdef over #if
fixed
actually restored back to #if's, as it seems to cause an issue when #define HAVE_DECL_CU_MEM_LOCATION_TYPE_HOST 0
let's do it only for HAVE_CUMEMRETAINALLOCATIONHANDLE
fixed
Force-pushed 68e1fc5 to 10bf49e
@tvegas1 can you pls port to v1.18.x
What?
Do not use cuCtxSetFlags() if the CUDA driver does not support it.

Why?
An unresolved symbol for cuCtxSetFlags on CUDA driver < 12.1 causes a crash.

How?
Assumptions:
- cuCtxSetFlags is only needed for VMM, which has UCX support starting from CUDA driver >= 12.3
- cuCtxSetFlags is not strictly needed for malloc async

Testing
Locally tested; needs final testing on a platform with actual older drivers.