-
Notifications
You must be signed in to change notification settings - Fork 488
UCP/API: Work with offsets and channel id. #10920
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
UCP/API: Work with offsets and channel id. #10920
Conversation
src/ucp/api/device/ucp_device_impl.h
Outdated
{ | ||
return ucs_device_atomic64_write( | ||
reinterpret_cast<uint64_t*>(counter_ptr), value); | ||
return ucs_device_atomic64_write(reinterpret_cast<uint64_t*>(counter_ptr), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is void function, return is not needed
UCP_DEVICE_MEM_LIST_ELEM_FIELD_LENGTH; | ||
elems[i].memh = perf.ucp.send_memh; | ||
elems[i].rkey = perf.ucp.rkey; | ||
elems[i].local_addr = (char*)perf.send_buffer + offset; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: UCS_PTR_BYTE_OFFSET
uct_elem = (uct_device_mem_element_t*)UCS_PTR_BYTE_OFFSET(mem_list_h + 1, | ||
elem_offset); | ||
uct_elem = (uct_device_mem_element_t*) | ||
UCS_PTR_BYTE_OFFSET(mem_list_h->uct_mem_elements, elem_offset); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe save uct_mem_elements per lane, and remove the field uct_mem_element_size ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I'm missing something, how we can calculate offset to specific element without saving the size? type of element is unknown to UCP.
BTW: to support multi lanes we do need to add uct_mem_elements
per lane. Not sure I will do it in this PR, it's already large.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm right
src/ucp/api/device/ucp_device_impl.h
Outdated
ucp_device_mem_list_handle_h mem_list_h, unsigned mem_list_index, | ||
const void *address, uint64_t remote_address, size_t length, | ||
uint64_t flags, ucp_device_request_t *req) | ||
ucp_device_mem_list_handle_h mem_list_h, unsigned channel_id, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO channel_id param should be between "length" and "flags"
/** | ||
* Array of local addresses for the device transfer operations. | ||
*/ | ||
void **local_addr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
local_addrs
/** | ||
* Array of remote addresses for the device transfer operations. | ||
*/ | ||
uint64_t *remote_addr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remote_addrs
* Array of lengths of the local buffers in bytes. | ||
*/ | ||
size_t *length; | ||
} ucp_mem_elements; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO we can remove this wrapping struct
src/ucp/core/ucp_device.c
Outdated
|
||
handle_size += sizeof(*handle.ucp_mem_elements.local_addr); | ||
handle_size += sizeof(*handle.ucp_mem_elements.remote_addr); | ||
handle_size += sizeof(*handle.ucp_mem_elements.length); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- space line after
- maybe combine to single calculation
ucp_mem_type_unpack(ep->worker, &local_addresses[i], &local_addr, | ||
sizeof(local_addresses[i]), mem_type); | ||
ucp_mem_type_unpack(ep->worker, &remote_addresses[i], &remote_addr, | ||
sizeof(remote_addresses[i]), mem_type); | ||
ucp_mem_type_unpack(ep->worker, &lengths[i], &length, | ||
sizeof(lengths[i]), mem_type); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we "unpack" all these in single/several calls, and not for each element?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remote_addrs array follow local_addrs array, so only way I see to do it not for each element is to allocate some tmp array, copy to it and then use unpack.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, can be next pr
elems[i].local_addr = (char*)perf.send_buffer + offset; | ||
elems[i].remote_addr = perf.ucp.remote_addr + offset; | ||
elems[i].length = (i == count - 1) ? ONESIDED_SIGNAL_SIZE : | ||
perf.params.msg_size_list[i]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
next pr: align
local_offsets[i] = 0; | ||
remote_offsets[i] = 0; | ||
lengths[i] = (i == count - 1) ? ONESIDED_SIGNAL_SIZE : | ||
perf.params.msg_size_list[i]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
next pr: align
|
||
void init_counters(const ucx_perf_context_t &perf) | ||
{ | ||
m_params.length = ucx_perf_get_message_size(&perf.params); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
next pr: align
src/ucp/api/device/ucp_device_impl.h
Outdated
* for the memory transfer. The addresses and length must be valid for the used | ||
* @a mem_list entry. | ||
* for the memory transfer. The @a local_offset and @a remote_offset parameters | ||
* specify byte offsets within the selected memory list entry.The @a length, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* specify byte offsets within the selected memory list entry.The @a length, | |
* specify byte offsets within the selected memory list entry. The @a length, |
src/ucp/api/device/ucp_device_impl.h
Outdated
* @tparam level Level of cooperation of the transfer. | ||
* @param [in] req Request containing operations in progress. | ||
* | ||
* @param [in] channel_id Channel ID to progress. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
next pr: align
/** | ||
* Array of local addresses for the device transfer operations. | ||
*/ | ||
void **local_addr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't be more logical and cache friendly to have an array of struct?
So that the fields of certain element are stored together
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can do that, it will also help to unpack the fields when creating the handle.
But it will require changing UCT API as well...(it expects array of local & remote addresses). I didn't want to do it now because it's urgent.
What?
Change UCP device API to work with offsets instead of address.
Add channel_id to API to support use of multiple QPs per destination.
Why?
User requirements.