Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UCT/GGA: basic iface and ep infrastructure #9845

Open
wants to merge 14 commits into
base: master
Choose a base branch
from

Conversation

evgeny-leksikov
Copy link
Contributor

What

Implement and test basic GGA iface and ep infrastructure

Why ?

Continue #9811

@yosefe
Copy link
Contributor

yosefe commented May 2, 2024

@iyastreb can you pls review?

@@ -72,6 +72,7 @@ enum {
UCT_IB_MLX5_CMD_OP_2ERR_QP = 0x507,
UCT_IB_MLX5_CMD_OP_2RST_QP = 0x50a,
UCT_IB_MLX5_CMD_OP_QUERY_QP = 0x50b,
UCT_IB_MLX5_CMD_OP_INIT2INIT_QP = 0x50e,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these changes covered by unit/integration tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, this is part of GGA ep_create flow

@@ -29,6 +29,46 @@ typedef struct uct_gga_mlx5_iface_config {
uct_rc_mlx5_iface_common_config_t rc_mlx5_common;
} uct_gga_mlx5_iface_config_t;

typedef struct uct_gga_mlx5_dma_opaque {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: here and below, struct name is redundant, typedef should be sufficient

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is "UCX code style"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, in many places in UCX I see only typedef, but up to you

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't need names for internal structs (unless it's a forward declaration)

src/uct/ib/rc/accel/rc_mlx5_common.c Outdated Show resolved Hide resolved
src/uct/ib/rc/accel/rc_mlx5_ep.c Outdated Show resolved Hide resolved
src/uct/ib/rc/accel/gga_mlx5.c Outdated Show resolved Hide resolved
return UCS_ERR_NO_MEMORY;
}

ep->dma_opaque.mr = ibv_reg_mr(md->super.pd, ep->dma_opaque.buf,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we reuse here uct_ib_reg_mr?
uct_ib_reg_mr handles EAGAIN error by repeating the call, should it be the similar in this case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we reuse here uct_ib_reg_mr?

IMO it's overcomplicated in this case

uct_ib_reg_mr handles EAGAIN error by repeating the call, should it be the similar in this case?

no, that's only for ODP case

src/uct/ib/rc/accel/gga_mlx5.c Outdated Show resolved Hide resolved
iface_attr->device_addr_len += ucs_offsetof(uct_gga_mlx5_dev_addr_t,
ib_addr);
iface_attr->ep_addr_len = sizeof(uct_gga_mlx5_ep_address_t);
iface_attr->cap.flags = /*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leftover?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nope, placeholder for a next patch

uct_rc_ep_cleanup_qp(&self->super.super, &cleanup_ctx->super,
self->super.tx.wq.super.qp_num,
outstanding - wqe_count);
cleanup_ctx = ucs_malloc(sizeof(*cleanup_ctx), "mlx5_qp_cleanup_ctx");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unusual to allocate memory during cleanup. Where this memory is released?
Maybe allocation on stack would suffice?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is async cleanup context, see IBV_EVENT_QP_LAST_WQE_REACHED

uct_gga_mlx5_iface_qp_cleanup_ctx_t *gga_ctx =
ucs_derived_of(ctx, uct_gga_mlx5_iface_qp_cleanup_ctx_t);

ibv_dereg_mr(gga_ctx->dma_opaque.mr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe uct_ib_dereg_mr?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

src/uct/ib/rc/accel/gga_mlx5.c Outdated Show resolved Hide resolved
src/uct/ib/rc/accel/gga_mlx5.c Outdated Show resolved Hide resolved
src/uct/ib/rc/accel/gga_mlx5.c Outdated Show resolved Hide resolved
src/uct/ib/rc/accel/gga_mlx5.c Outdated Show resolved Hide resolved
iyastreb
iyastreb previously approved these changes May 3, 2024
@@ -29,6 +29,46 @@ typedef struct uct_gga_mlx5_iface_config {
uct_rc_mlx5_iface_common_config_t rc_mlx5_common;
} uct_gga_mlx5_iface_config_t;

typedef struct uct_gga_mlx5_dma_opaque {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't need names for internal structs (unless it's a forward declaration)

src/uct/ib/rc/accel/gga_mlx5.c Outdated Show resolved Hide resolved
src/uct/ib/rc/accel/gga_mlx5.c Outdated Show resolved Hide resolved
src/uct/ib/rc/accel/gga_mlx5.c Outdated Show resolved Hide resolved
src/uct/ib/rc/accel/gga_mlx5.c Outdated Show resolved Hide resolved
src/uct/ib/rc/accel/gga_mlx5.c Outdated Show resolved Hide resolved
src/uct/ib/rc/accel/gga_mlx5.c Outdated Show resolved Hide resolved
src/uct/ib/rc/accel/gga_mlx5.c Outdated Show resolved Hide resolved
src/uct/ib/rc/accel/gga_mlx5.c Outdated Show resolved Hide resolved
src/uct/ib/rc/accel/rc_mlx5_iface.c Outdated Show resolved Hide resolved
src/uct/ib/base/ib_iface.c Outdated Show resolved Hide resolved
UCT_IB_MLX5DV_SET(qpc, qpc, rq_type, !!attr->super.srq_num);
UCT_IB_MLX5DV_SET(qpc, qpc, rq_type,
attr->super.srq_num ? 1 /* SRQ */ :
max_rx ? 0 /* RQ */ :
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where do we have a case of "RQ?


uct_ib_pack_uint24(gga_addr->qp_num, ep->super.tx.wq.super.qp_num);
if (uct_rc_iface_flush_rkey_enabled(&iface->super)) {
gga_addr->flags = UCT_GGA_MLX5_EP_ADDRESS_FLAG_FLUSH_RKEY;
gga_addr->flush_rkey = (md->flush_rkey >> 16);
gga_addr->flush_rkey = md->flush_rkey;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

flush_rkey in the address could be 24 bit since its lower 8 bits are always 0
see

ucs_assert((md->super.flush_rkey & 0xff) == 0);

Comment on lines 310 to 311
iface = ucs_derived_of(tl_iface, uct_base_iface_t);
md = ucs_derived_of(iface->md, uct_ib_mlx5_md_t);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

init inline

@@ -718,6 +718,10 @@ uct_ib_iface_roce_dscp(uct_ib_iface_t *iface)
return iface->config.traffic_class >> 2;
}

int uct_ib_iface_dev_addr_is_reachable(uct_ib_iface_t *iface,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls move this along with other function prototypes in this file

}

static ucs_status_t
uct_gga_mlx5_iface_init_rx(uct_rc_iface_t *rc_iface,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we really need this wrapper?

return UCS_OK;
}

static void uct_gga_mlx5_iface_qp_cleanup(uct_rc_iface_qp_cleanup_ctx_t *ctx)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need cleanup mechanism for GGA. Cleanup is about waiting for LAST_WQE_REACHED event, which is not relevant since we don't have SEND/RECV operations. I'd suggest to not add it in this PR, unless we see some failure. Can test error handling and add it later if needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I avoid LAST_WQE_REACHED flow adding async parameter which is false for GGA, this approach adds some overhead on ep_destroy but re-uses dtors of base classed without significant change, otherwise need to refactor them deeper

@@ -210,7 +210,8 @@ void uct_rc_mlx5_iface_common_prepost_recvs(uct_rc_mlx5_iface_common_t *iface)
{
/* prepost recvs only if quota available (recvs were not preposted
* before) */
if (iface->super.rx.srq.quota == 0) {
if ((iface->super.rx.srq.quota == 0) ||
(iface->rx.srq.type == UCT_IB_MLX5_OBJ_TYPE_NULL)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the check iface->super.rx.srq.quota == 0 is not enough? can we set quota to 0 for gga?

@@ -526,6 +527,7 @@ void uct_rc_mlx5_iface_fill_attr(uct_rc_mlx5_iface_common_t *iface,
srq->verbs.srq);
break;
case UCT_IB_MLX5_OBJ_TYPE_DEVX:
case UCT_IB_MLX5_OBJ_TYPE_NULL:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why UCT_IB_MLX5_OBJ_TYPE_NULL is not an empty operation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good question, I don't know why SRQ obj type defines QP attrs for TX too :)

ucs_mpool_params_reset(&mp_params);
mp_params.elem_size = init_attr->fc_req_size;
mp_params.elem_size = ucs_max(init_attr->fc_req_size,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why needed to change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because fc_req_size == 0 for GGA

@@ -237,6 +237,10 @@ test_uct_ib_with_specific_port::test_uct_ib_with_specific_port() {
}

void test_uct_ib_with_specific_port::init() {
if (has_transport("gga_mlx5")) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove the skip

@evgeny-leksikov
Copy link
Contributor Author

/azp run

Copy link

Azure Pipelines successfully started running 4 pipeline(s).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants