When testing ROCm D2D transfers with UCX_TLS=rc, how does setting UCX_IB_GPU_DIRECT_RDMA=0 affect the osu_bw test results? #10077
Comments
The rc transports can use the GPUDirect RDMA feature.
You’re right, but what puzzles me is that when I set UCX_IB_GPU_DIRECT_RDMA=0, my test results are the same as when UCX_IB_GPU_DIRECT_RDMA=1. Do you know why this happens?
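One way to put a number on "the same" is to parse the two osu_bw outputs and compare them size by size. The following is a minimal sketch, assuming the usual osu_bw text layout (comment lines starting with `#`, followed by rows of message size and bandwidth in MB/s). The helper names `parse_osu_bw` and `max_relative_diff` are hypothetical and not part of UCX or the OSU benchmarks.

```python
def parse_osu_bw(text):
    """Parse osu_bw output into {message_size_bytes: bandwidth_MB_per_s}.

    Assumes header/comment lines start with '#' and data rows hold
    exactly two whitespace-separated columns: size and bandwidth.
    """
    results = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) != 2:
            continue
        try:
            results[int(parts[0])] = float(parts[1])
        except ValueError:
            # Skip anything that is not a numeric data row.
            continue
    return results


def max_relative_diff(run_a, run_b):
    """Largest relative bandwidth difference over the sizes both runs share."""
    diffs = []
    for size in set(run_a) & set(run_b):
        larger = max(run_a[size], run_b[size])
        if larger > 0:
            diffs.append(abs(run_a[size] - run_b[size]) / larger)
    return max(diffs, default=0.0)
```

Feeding it the output captured with UCX_IB_GPU_DIRECT_RDMA=0 and with UCX_IB_GPU_DIRECT_RDMA=1 gives a single figure; a value within normal run-to-run noise would suggest that both runs take the same data path.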
I would not set |
I am not entirely sure what generation of IB hardware you are using, but the bandwidth values that you show are very low; most likely the data is being funneled through CPU memory in your case. I would recommend that you a) first try only one HCA at a time (ideally the one closest to the GPU that you are using), and b) double-check that ACS is disabled on your system, since it might prevent direct GPU-to-HCA communication. You should not have to worry about the UCX_IB_GPU_DIRECT_RDMA setting; we usually achieve full line bandwidth without setting that value.
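For the ACS check in b), a rough sketch is shown below. It assumes that `lspci -vvv` (run as root) prints an `ACSCtl:` line for each ACS-capable PCIe bridge and marks an enabled control with a trailing `+`. The helper name `acs_enabled_flags` is hypothetical.

```python
def acs_enabled_flags(lspci_text):
    """Return the ACS control flags marked '+' (enabled) in `lspci -vvv` output.

    An empty list means no ACSCtl line in the given text shows an
    enabled control. Flags are collected across all devices, so this
    only answers "is anything enabled?", not "on which bridge?".
    """
    enabled = []
    for line in lspci_text.splitlines():
        line = line.strip()
        if not line.startswith("ACSCtl:"):
            continue
        flags = line[len("ACSCtl:"):].split()
        # 'SrcValid+' means enabled, 'SrcValid-' means disabled.
        enabled.extend(flag[:-1] for flag in flags if flag.endswith("+"))
    return enabled
```

The text could come from `subprocess.run(["lspci", "-vvv"], capture_output=True, text=True).stdout`. For the single-HCA test in a), UCX_NET_DEVICES can restrict UCX to one device; the device name to use depends on your system.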
Also, are you using the Mellanox OFED driver on your system, or the standard Linux RDMA packages? I would recommend MOFED for easier interaction with the GPUs.
When using UCX_TLS=rc to test ROCm D2D transfers, setting UCX_IB_GPU_DIRECT_RDMA=0 doesn't affect the osu_bw test results. Is this because rc doesn't use GPUDirect RDMA technology, or is it because GPUDirect RDMA is enabled by default when using rc?