UCT/CUDA: Runtime CUDA >= 12.3 to enable VMM #10396
Conversation
Force-pushed 1ce967f to 68a5f51
We have tests for different CUDA versions, which include CUDA memory hooks (for example, Test Cuda Docker ubuntu18_cuda_12_0). Can we add a test that would have caught the new API usage?
@yosefe, do we need this before release?
I think it is difficult, because we would need to build with a later driver version and run with an older driver version. For instance, when I run this container on rock, we are only running the later driver version, and I don't think we can easily switch driver versions, since the driver has to match the kernel module, as per my understanding.
}
#else
    unsigned value = 1;
    (void)ctx_set_flags_func;
why needed? maybe we could just remove #if HAVE_CUDA_FABRIC now, since we don't use cuCtxSetFlags() directly?
fixed
restored as it is needed by CU_CTX_SYNC_MEMOPS
{
    static ucs_status_t status = UCS_ERR_LAST;

#if CUDA_VERSION >= 12000
why needed?
The cuGetProcAddress() prototype changed at CUDA >= 12.0, and we know that cuCtxSetFlags() also appeared after 12.0, so there is no need to use the older cuGetProcAddress() prototype for the check.
@@ -823,6 +834,37 @@ static uct_md_ops_t md_ops = {
    .detect_memory_type = uct_cuda_copy_md_detect_memory_type
};

static ucs_status_t uct_cuda_copy_md_check_is_ctx_set_flags_supported(void)
To simplify the code, we could have this function call the needed function pointer, and move the global var inside it. Something like ucs_status_t uct_cuda_copy_set_ctx_flags(unsigned flags), and have it return UCS_ERR_UNSUPPORTED if the func pointer is not found.
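The single-entry-point shape suggested here might look roughly like the sketch below. This is an illustrative mock, not the real UCX code: the status codes, the lookup table and the function name are stand-ins, and in UCX the symbol would be resolved with cuGetProcAddress() rather than a local table.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative stand-ins for UCS status codes (not the real values). */
enum { UCS_OK = 0, UCS_ERR_UNSUPPORTED = -1 };

typedef int (*ctx_set_flags_fn)(unsigned flags);

/* Fake "driver" export table: remove the entry below to simulate an older
 * driver that lacks cuCtxSetFlags(). */
static int fake_cu_ctx_set_flags(unsigned flags)
{
    (void)flags;
    return UCS_OK;
}

static void *driver_lookup(const char *symbol)
{
    return (strcmp(symbol, "cuCtxSetFlags") == 0) ?
           (void*)fake_cu_ctx_set_flags : NULL;
}

/* Single entry point in the spirit of the suggestion: resolve the symbol
 * once, cache the result, and report UCS_ERR_UNSUPPORTED when the running
 * driver does not export it. */
int uct_cuda_copy_set_ctx_flags(unsigned flags)
{
    static ctx_set_flags_fn fn;
    static int resolved = 0;

    if (!resolved) {
        fn       = (ctx_set_flags_fn)driver_lookup("cuCtxSetFlags");
        resolved = 1;
    }

    return (fn == NULL) ? UCS_ERR_UNSUPPORTED : fn(flags);
}
```

Resolving once and caching keeps the lookup off the hot path, and callers can treat UCS_ERR_UNSUPPORTED as the signal to disable fabric allocations.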
I thought about it but went for a two-step approach, as we need to:
- disable fabric at init time
- set the flag with md and address as parameters, in case we cannot use cuCtxSetFlags()
Force-pushed 7acee45 to ff4313c
@yosefe, pls review
}

ucs_diag("disabled fabric memory allocations");
md->config.enable_fabric = UCS_NO;
Looks like it affects only cuda_copy memory allocations, but what happens if we get fabric memory from a user buffer and then don't actually set sync memops for it?
We could return UNSUPPORTED from uct_cuda_copy_sync_memops and, in that case, return an error from CUDA memory detection.
This should now be handled, right?
Force-pushed da07d62 to 8657d54
CUdriverProcAddressQueryResult sym_status;
CUresult cu_err;
ucs_status_t status;
uct_cuda_cuCtxSetFlags_t cuda_cuCtxSetFlags_func =
- initialized vars should be first
- should be static??
thanks, missed the static
@@ -553,8 +554,7 @@ static void uct_cuda_copy_sync_memops(uct_cuda_copy_md_t *md,

    if (is_vmm) {
        ucs_fatal("failed to set sync_memops on CUDA VMM without "
-                 "cuCtxSetFlags() (address=%p)",
-                 address);
+                 "cuCtxSetFlags() (address=%p)", address);
Thinking of it again, it should be a warning, since a failure in the cuPointerSetAttribute() call is also a warning.
So when is_vmm == 1, you want to call cuPointerSetAttribute() and let it fail, right?
moved to ucs_warn
hmm right, actually we can return from the function after ucs_warn, and not call cuPointerSetAttribute at all
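A minimal sketch of the control flow being agreed on here (hypothetical names and a counter added for illustration; the real code calls the CUDA driver API): warn and return early for VMM instead of falling through to cuPointerSetAttribute().

```c
#include <stdio.h>

/* Counter only so the behavior is observable in this mock. */
static int warnings;

/* Sketch: if cuCtxSetFlags() is available it handles sync memops for the
 * whole context; otherwise, for VMM memory, warn and return early, because
 * cuPointerSetAttribute() cannot work on VMM allocations. Only legacy
 * memory falls through to the per-pointer attribute call. */
static void sync_memops(void *address, int is_vmm, int have_ctx_set_flags)
{
    if (have_ctx_set_flags) {
        /* cuCtxSetFlags(CU_CTX_SYNC_MEMOPS) would be called here */
        return;
    }
    if (is_vmm) {
        warnings++;
        fprintf(stderr, "failed to set sync_memops on CUDA VMM without "
                "cuCtxSetFlags() (address=%p)\n", address);
        return; /* do not fall through to cuPointerSetAttribute() */
    }
    /* legacy memory: cuPointerSetAttribute(..., address) would run here */
}
```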
}

if (is_vmm) {
    ucs_warn("failed to set sync_memops on CUDA VMM without "
@tvegas1 Current changes look good to me, but @yosefe brought up an issue where the library is built with a >=12.3-compatible driver version while the system where that library gets used has a driver version < 12.1. On such a system, VMM/mallocAsync allocations are allowed (as VMM and mallocAsync are supported on driver versions < 12.1), but there would be a need to report an error or fail even if UCX isn't compiled with HAVE_CUDA_FABRIC (driver version >= 12.3). The condition met here is when UCX is built with a >=12.3 driver.
Agree, that's the case where we have VMM independently allocated but still don't have HAVE_FABRIC set; in this case I will move is_vmm out of the #ifdef and fatal if is_vmm == 1.
I don't think it will help - if HAVE_FABRIC is not set, we will never know in UCX that it is VMM memory and will assume it is legacy memory. Then we can only hope that cuPointerSetAttribute() would fail.
Yes, also enabled VMM detection, because:
- cuMemRelease >= 10.2
- cuMemRetainAllocationHandle >= 11.0
- cuMemGetAllocationPropertiesFromHandle >= 10.2

assuming we build UCX with CUDA >= 11 anyway.
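The detection sequence built from those three driver calls can be mocked like this. The real calls are cuMemRetainAllocationHandle(), cuMemGetAllocationPropertiesFromHandle() and cuMemRelease(); the stubs below only illustrate the control flow and are not the CUDA API. The key points are that retaining a handle succeeds only for VMM allocations, and that the retained handle must be released on every path.

```c
#include <stddef.h>

/* Mock allocation record standing in for driver-side state. */
typedef struct {
    int is_vmm;
    int location_type;
} mock_alloc_t;

/* Stand-in for cuMemRetainAllocationHandle(): succeeds only for VMM. */
static int mock_retain_handle(mock_alloc_t **handle, void *address)
{
    mock_alloc_t *alloc = (mock_alloc_t*)address;
    if (!alloc->is_vmm) {
        return -1; /* the real call fails for non-VMM memory */
    }
    *handle = alloc;
    return 0;
}

/* Stand-in for cuMemGetAllocationPropertiesFromHandle(). */
static int mock_get_properties(int *location_type, mock_alloc_t *handle)
{
    *location_type = handle->location_type;
    return 0;
}

/* Stand-in for cuMemRelease(): drops the reference taken by retain. */
static void mock_release(mock_alloc_t *handle)
{
    (void)handle;
}

/* Returns 1 if address is a VMM allocation, filling location_type. */
static int detect_vmm(void *address, int *location_type)
{
    mock_alloc_t *handle;
    int is_vmm = 0;

    if (mock_retain_handle(&handle, address) == 0) {
        if (mock_get_properties(location_type, handle) == 0) {
            is_vmm = 1;
        }
        mock_release(handle); /* released on every successful-retain path */
    }
    return is_vmm;
}
```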
let me know if you read it differently: https://rocm.docs.amd.com/projects/HIPIFY/en/latest/tables/CUDA_Driver_API_functions_supported_by_HIP.html
Double-checked the actual function prototypes against the online documentation of older CUDA releases.
CI failure on CUDA 11.0: it seems that although cuMemRetainAllocationHandle is found in the headers in /usr/local/cuda-11/include/, it is not available in the link-time CUDA stub library.
@@ -469,6 +470,7 @@ static int uct_cuda_copy_detect_vmm(void *address,
                                    ucs_memory_type_t *vmm_mem_type,
                                    CUdevice *cuda_device)
{
#if HAVE_CUMEMRETAINALLOCATIONHANDLE
minor: usually prefer #ifdef over #if
fixed
actually restored back to #if's, as it seems to cause an issue when #define HAVE_DECL_CU_MEM_LOCATION_TYPE_HOST 0
let's do it only for HAVE_CUMEMRETAINALLOCATIONHANDLE
fixed
Force-pushed 68e1fc5 to 10bf49e
@tvegas1 can you pls port to v1.18.x
What?
Do not use cuCtxSetFlags() if the CUDA driver does not support it.

Why?
An unresolved symbol for cuCtxSetFlags on CUDA driver < 12.1 causes a crash.

How?
Assumptions:
- cuCtxSetFlags is only needed for VMM, which has UCX support starting from CUDA driver >= 12.3
- cuCtxSetFlags is not strictly needed for malloc async

Testing
Locally tested; needs final testing on a platform with actual older drivers.