UCT/CUDA/CUDA_COPY: Enabled memory attributes query after switching CUDA GPU. #10388

Merged
merged 1 commit into openucx:master from topic/gtest-switch-gpu
Mar 1, 2025


rakhmets
Contributor

@rakhmets rakhmets commented Dec 17, 2024

What?

Enabled memory attributes query by the cuda_cpy memory domain after switching the CUDA GPU.
Added a test.

Without the changes in the cuda_cpy memory domain, the test fails with the following error:

cuda_copy_md.c:649  UCX  ERROR cuMemGetAddressRange(0x7f8cd9e00000) error: named symbol not found
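
For illustration only, the failure mode can be modelled without a GPU. The sketch below is a toy model, not UCX or CUDA code: `toy_driver`, `toy_alloc`, `toy_owner_of`, `toy_get_address_range`, and `toy_query_any_ctx` are all hypothetical names. It assumes that an address-range lookup succeeds only while the context that owns the allocation is current, which is consistent with the error above, and it shows one possible remedy: temporarily making the owning context current for the duration of the query. Whether the actual change in cuda_copy_md.c works this way is not stated in this thread.

```cpp
#include <cstdint>
#include <map>

// Toy model only: none of these names exist in CUDA or UCX.
struct toy_driver {
    int current_ctx;                      // index of the "current" context
    std::uintptr_t next_addr;             // next fake address to hand out
    std::map<std::uintptr_t, int> owner;  // allocation address -> owning context

    toy_driver() : current_ctx(0), next_addr(0x1000) {}

    // Allocate in whichever context is current and remember the owner.
    std::uintptr_t toy_alloc() {
        std::uintptr_t addr = next_addr;
        next_addr += 0x1000;
        owner[addr] = current_ctx;
        return addr;
    }

    // Context-independent lookup of the owner; returns -1 for unknown addresses.
    int toy_owner_of(std::uintptr_t addr) const {
        std::map<std::uintptr_t, int>::const_iterator it = owner.find(addr);
        return (it == owner.end()) ? -1 : it->second;
    }

    // Assumed behaviour: fails unless the owning context is the current one.
    bool toy_get_address_range(std::uintptr_t addr) const {
        int ctx = toy_owner_of(addr);
        return (ctx >= 0) && (ctx == current_ctx);
    }

    // Possible remedy: switch to the owning context, query, then restore.
    bool toy_query_any_ctx(std::uintptr_t addr) {
        int ctx = toy_owner_of(addr);
        if (ctx < 0) {
            return false;
        }
        int saved   = current_ctx;
        current_ctx = ctx;
        bool ok     = toy_get_address_range(addr);
        current_ctx = saved;
        return ok;
    }
};
```

In this model, allocating in context 0 and then setting `current_ctx = 1` makes `toy_get_address_range` fail, while `toy_query_any_ctx` succeeds and leaves context 1 current afterwards.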

@rakhmets rakhmets force-pushed the topic/gtest-switch-gpu branch from d31bb61 to 1adcb5e Compare December 17, 2024 18:09
@rakhmets rakhmets marked this pull request as ready for review January 15, 2025 17:50
@brminich
Contributor

brminich commented Feb 7, 2025

/azp run UCX PR

Azure Pipelines successfully started running 1 pipeline(s).

@rakhmets rakhmets force-pushed the topic/gtest-switch-gpu branch from 4adf57b to 6d38f73 Compare February 11, 2025 19:36
@rakhmets rakhmets requested review from tvegas1 and brminich February 12, 2025 11:33
@rakhmets rakhmets changed the title from "TEST/GTEST: Added cuda gpu switching testing." to "UCT/CUDA/CUDA_COPY: Enabled switching CUDA GPUs between memory allocation and memory mapping." Feb 12, 2025
brminich
brminich previously approved these changes Feb 12, 2025
@rakhmets rakhmets added the WIP-DNM Work in progress / Do not review label Feb 13, 2025
@rakhmets rakhmets removed the WIP-DNM Work in progress / Do not review label Feb 13, 2025
@rakhmets rakhmets changed the title from "UCT/CUDA/CUDA_COPY: Enabled switching CUDA GPUs between memory allocation and memory mapping." to "UCT/CUDA/CUDA_COPY: Enabled memory attributes query after switching CUDA GPU." Feb 13, 2025
brminich
brminich previously approved these changes Feb 13, 2025
tvegas1
tvegas1 previously approved these changes Feb 13, 2025
iyastreb
iyastreb previously approved these changes Feb 14, 2025
@rakhmets rakhmets dismissed stale reviews from iyastreb, tvegas1, and brminich via 20bdcbb February 14, 2025 11:22
iyastreb
iyastreb previously approved these changes Feb 14, 2025
brminich
brminich previously approved these changes Feb 14, 2025
tvegas1
tvegas1 previously approved these changes Feb 14, 2025
@rakhmets rakhmets dismissed stale reviews from tvegas1, brminich, and iyastreb via 578d42a February 14, 2025 16:46
Comment on lines 52 to 57
uct_md_mem_attr_t mem_attr = {};
mem_attr.field_mask = UCT_MD_MEM_ATTR_FIELD_MEM_TYPE;
EXPECT_EQ(uct_md_mem_query(m_md.get(), mem.address, size, &mem_attr),
          UCS_OK);
EXPECT_EQ(mem_attr.mem_type, UCS_MEMORY_TYPE_CUDA);
EXPECT_EQ(uct_mem_free(&mem), UCS_OK);
Contributor

Should we check different allocation/memory types, such as VMM, managed, etc.? This can be done with several test functions (cases).
To test these cases, I would allocate the memory here using the CUDA API directly rather than through UCT.

Contributor Author

Added test cases for different memory types.


void cuda_fabric_mem_buffer::destroy()
{
switch (m_state) {
Contributor

Maybe we don't need to save the state as a member variable: when a cuXX API call fails in the constructor, roll back the state (using goto) and skip the test after the rollback. Then we can probably remove the init/destroy wrappers and the catching of the skip exception.
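
A minimal sketch of the goto-based rollback suggested above, under stated assumptions: `toy_resources`, `toy_setup`, and the step names "handle", "address", and "mapping" are hypothetical stand-ins for the cuXX calls and do not come from the PR. Each failure jumps to a label that releases, in reverse order, only what was acquired so far, so no state member is needed to remember how far setup got.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a sequence of cuXX calls; it records every
// acquire/release so the rollback order can be inspected.
struct toy_resources {
    std::vector<std::string> log;
    int fail_at; // number of the acquire step that fails (0 = none)

    explicit toy_resources(int fail_at_) : fail_at(fail_at_) {}

    bool acquire(int step, const char *name) {
        if (step == fail_at) {
            return false;
        }
        log.push_back(std::string("acquire ") + name);
        return true;
    }

    void release(const char *name) {
        log.push_back(std::string("release ") + name);
    }
};

// Returns true when all steps succeed. On failure, everything acquired so
// far is released in reverse order and false is returned, so the caller can
// skip the test.
bool toy_setup(toy_resources &res)
{
    if (!res.acquire(1, "handle")) {
        goto err;
    }
    if (!res.acquire(2, "address")) {
        goto err_release_handle;
    }
    if (!res.acquire(3, "mapping")) {
        goto err_release_address;
    }
    return true;

err_release_address:
    res.release("address");
err_release_handle:
    res.release("handle");
err:
    return false;
}
```

With `toy_resources res(3)`, `toy_setup(res)` returns false after logging the two successful acquires followed by the release of "address" and then "handle". A test fixture could call such a function and skip the test when it returns false.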

brminich
brminich previously approved these changes Feb 26, 2025
yosefe
yosefe previously approved these changes Feb 26, 2025
@rakhmets rakhmets dismissed stale reviews from yosefe and brminich via 112596d February 26, 2025 10:54
tvegas1
tvegas1 previously approved these changes Feb 26, 2025
@rakhmets
Contributor Author

@tvegas1 addressed your comment. Deleted by mistake.

@rakhmets rakhmets force-pushed the topic/gtest-switch-gpu branch from aa78963 to 39f917a Compare February 27, 2025 16:27
@rakhmets rakhmets merged commit 0fc6685 into openucx:master Mar 1, 2025
151 checks passed
5 participants