-
Notifications
You must be signed in to change notification settings - Fork 488
UCT/GDA: Use wqe_idx for rc_gda progress #10928
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
aadb4d2
to
fa5d4d4
Compare
if (req != nullptr) { | ||
comp = &req->comp; | ||
req->device_ep = device_ep; | ||
uct_device_completion_init(comp); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can also remove the TODO comment
src/uct/api/device/uct_device_impl.h
Outdated
#include <uct/ib/mlx5/gdaki/gdaki.cuh> | ||
|
||
union uct_device_completion { | ||
uct_rc_gda_completion_t rc; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rc -> rc_gda
src/uct/ib/mlx5/gdaki/gdaki.cuh
Outdated
0xffffff); | ||
uct_rc_mlx5_gda_qedump("WQE", wqe_ptr, 64); | ||
uct_rc_mlx5_gda_qedump("CQE", cqe64, 64); | ||
ep->wqe_error = ep->sq_wqe_pi; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need some "atomic write" so it will be visible to all threads?
src/uct/ib/mlx5/gdaki/gdaki.cuh
Outdated
if (ep->wqe_error && comp->wqe_idx < ep->wqe_error) { | ||
return UCS_ERR_IO_ERROR; | ||
} | ||
|
||
status = (ucs_status_t)__shfl_sync(0xffffffff, status, 0); | ||
__syncwarp(); | ||
return status; | ||
} else if (level == UCS_DEVICE_LEVEL_THREAD) { | ||
return uct_rc_mlx5_gda_progress_thread(ep); | ||
} else { | ||
return UCS_ERR_UNSUPPORTED; | ||
if (ep->sq_wqe_pi <= comp->wqe_idx) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- why need to check ep->wqe_error separately?
IMOcomp->wqe_idx < ep->wqe_error
is enough - do we need some kind of atomic read/write for ep->wqe_error?
- maybe rename ep->wqe_error to ep->wqe_error_pi?
- if another thread updated ep->sq_wqe_pi but didn't yet update ep->wqe_error, we might think some tl_comp is success, while in fact it is failure.
src/uct/ib/mlx5/gdaki/gdaki_dev.h
Outdated
uint64_t cqe_ci; | ||
int sq_lock; | ||
|
||
uint64_t wqe_error; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe move it after sq_wqe_pi?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested with many threads, no hangs and perf is better than before:
-O 32 -T 1 7.7us
-O 32 -T 32 159us
-O 32 -T 64 153us
-O 16 -T 128 372us
-O 8 -T 256 1169us
-O 4 -T 512 1768us
This PR was hanging in cuda kernel perftest. So may be there is a real issue. |
No description provided.