Description
libpsm2 looks for sysfs entries under the path /sys/class/infiniband/hfi1_x. With rdma-core v24.0, the device is renamed according to its device type, PCI bus and device, a la "predictable interface names". This is described at https://patchwork.kernel.org/cover/10870443/ .
On my host, the sysfs path for hfi1_0 is /sys/class/infiniband/opap129s. Thus, libpsm2 fails to find the hfi1_0 sysfs entry in hfi_sysfs_port_open.
The behavior can be observed by executing fi_info on a Debian sid/bullseye host with libfabric-bin and libpsm2-2 installed. The psm2 providers will not be listed in the output. Debug output indicates that no active psm2 device is found.
$ FI_LOG_LEVEL=debug fi_info
...
libfabric:psm2:core:psmx2_init_lib():236<info> PSM2 header version = (2, 1)
libfabric:psm2:core:psmx2_init_lib():238<info> PSM2 library version = (2, 1)
libfabric:psm2:core:psmx2_init_lib():241<info> PSM2 multi-ep feature enabled.
libfabric:psm2:core:psmx2_update_hfi_info():338<warn> Failed to read number of free contexts from HFI unit 0
libfabric:psm2:core:psmx2_update_hfi_info():379<info> hfi1 units: total 1, active 0; hfi1 contexts: total 0, free 0
libfabric:psm2:core:psmx2_update_hfi_info():390<info> Tx/Rx contexts: 0 in total, 0 available.
libfabric:psm2:core:psmx2_getinfo():436<info> no PSM2 device is active.
libfabric:core:core:fi_getinfo_():751<warn> fi_getinfo: provider psm2 returned -61 (No data available)
...
I have found two orthogonal workarounds for this problem:
- Use HFI_SYSFS_PATH, e.g.
  HFI_SYSFS_PATH=/sys/class/infiniband/opap129s fi_info
  The "129" portion of the HFI_SYSFS_PATH value needs to be set according to the PCI bus of the HFI card.
- Or, modify /lib/udev/rules.d/60-rdma-persistent-naming.rules to contain
  ACTION=="add", SUBSYSTEM=="infiniband", PROGRAM="rdma_rename %k NAME_KERNEL"
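For the HFI_SYSFS_PATH workaround, the renamed device directory does not have to be typed by hand. The following is a minimal sketch, not something libpsm2 or rdma-core ships: `find_hfi1_sysfs` is a hypothetical helper name. It assumes each entry under /sys/class/infiniband has a `device/driver` symlink whose target's last path component is the bound kernel driver, and that this driver is `hfi1` for the HFI card.

```shell
#!/bin/sh
# Sketch: print the sysfs directory of the first InfiniBand-class device
# bound to the hfi1 driver, whatever name rdma-core gave it.
# find_hfi1_sysfs is a hypothetical helper; the optional first argument
# overrides the class directory (useful for testing).
find_hfi1_sysfs() {
    class_dir="${1:-/sys/class/infiniband}"
    for dev in "$class_dir"/*; do
        # Assumption: device/driver is a symlink to the bound driver's directory.
        [ -L "$dev/device/driver" ] || continue
        driver=$(basename "$(readlink "$dev/device/driver")")
        if [ "$driver" = "hfi1" ]; then
            printf '%s\n' "$dev"
            return 0
        fi
    done
    # No hfi1-bound device found.
    return 1
}
```

With the function defined, the workaround would become `HFI_SYSFS_PATH=$(find_hfi1_sysfs) fi_info`. On a host with several HFI cards, only the first match is returned.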
While these workarounds exist, libpsm2 should handle the new default RDMA device naming scheme itself. opa_sysfs.c:sysfs_init() looks like the place to start.