Honors Computer Science Bachelor's Thesis by Emerson Ford at the University of Utah.
- `ansible`: an Ansible playbook to set up hosts for testing various RDMA-in-container solutions.
- `bg_presentation`: background presentation on the topic of this thesis.
- `data`: raw data gathered for each RDMA-in-container solution test.
- `Freeflow`: submodule to a fork of Freeflow, which required a few alterations to get working and includes some QoL changes to make testing faster.
- `paper`: actual document in LaTeX for this thesis.
This assumes you're using Cloudlab and have SSH keys configured on your Cloudlab account. You should also have `ansible-playbook` and Python 3.10 installed.
- Provision a Cloudlab experiment with the roce-cluster profile.
  - Change "Node type to use" to `d6515`.
- After the hosts have booted, upgrade them to Ubuntu 21.10. This is pretty much just running `sudo do-release-upgrade` on the hosts.
- Change the two hostnames of the `no_mlnx_ofed` group to your Cloudlab hostnames in `ansible/hosts.yaml`. Change `ansible_user` under `vars` to your Cloudlab username.
- Run `ansible-playbook --ssh-common-args "-o StrictHostKeyChecking=no" --inventory-file="./hosts.yaml" --limit no_mlnx_ofed site.yml` while in the `ansible` directory.
- Run the commands listed in `data/softroce_*/metadata*` while in the `test_scripts` directory. Take care to replace the `--host1` and `--host2` flags to match your Cloudlab hostnames, and `--user` to match your Cloudlab username. The first argument (`/opt/homebrew/.../Python`) should be replaced with the path to your Python 3.10 binary.
  - Run the host (i.e. non-SoftRoCE) versions of the test first.
  - Then, for running the SoftRoCE version of `run_basic_tests` and `run_cpu_tests`, a SoftRoCE NIC must be manually configured before running the tests. SSH into both hosts and run `sudo modprobe -rv mlx5_ib && sudo reboot now`. After they reboot, run `sudo rdma link | grep rxe0 || sudo rdma link add rxe0 type rxe netdev enp65s0f0np0 && sudo devlink dev param set pci/0000:41:00.0 name enable_roce value false cmode driverinit && sudo devlink dev reload pci/0000:41:00.0`. Then run the test commands.
- The data should appear in `data/raw`. You can generate graphs based on the data you just produced by setting `MODE = "softroce"` and rerunning all cells in the `*.ipynb` Jupyter notebooks (run `jupyter-notebook` while in the `data` dir).
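The substitutions described for the recorded `metadata*` commands can also be scripted. The following is a minimal sketch and is not part of the repository: `rewrite_command` is a hypothetical helper, and it assumes each recorded command is a single shell-style line in which `--host1`, `--host2`, and `--user` are each followed by their value as a separate argument. Check the format of your `metadata*` files before relying on it.

```python
import shlex
import sys


def rewrite_command(recorded, host1, host2, user, python=sys.executable):
    """Return `recorded` with the interpreter, hosts, and user swapped in.

    Hypothetical helper: assumes flags look like `--host1 VALUE`
    (space-separated), which may not match every metadata file.
    """
    args = shlex.split(recorded)
    if not args:
        raise ValueError("empty command")
    # The first argument is the recorded Python path; use the local one.
    args[0] = python
    replacements = {"--host1": host1, "--host2": host2, "--user": user}
    # Iterate over a copy so the value following each flag can be replaced.
    for i, arg in enumerate(args[:-1]):
        if arg in replacements:
            args[i + 1] = replacements[arg]
    return shlex.join(args)
```

By default the interpreter is replaced with the one running the script, so invoking it with your Python 3.10 binary fills in the first argument automatically.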
- Ubuntu 21.10 / Linux Kernel >5.13 is required to avoid certain kernel panics when using SoftRoCE.
- Provision a Cloudlab experiment with the roce-cluster profile.
  - Change "Node type to use" to `d6515`.
- Change the two hostnames of the `connectx5` group to your Cloudlab hostnames in `ansible/hosts.yaml`. Change `ansible_user` under `vars:` to your Cloudlab username.
- Run `ansible-playbook --ssh-common-args "-o StrictHostKeyChecking=no" --inventory-file="./hosts.yaml" --limit connectx5 site.yml` while in the `ansible` directory.
- Run the commands listed in `data/shared_hca_*/metadata*` while in the `test_scripts` directory. Take care to replace the `--host1` and `--host2` flags to match your Cloudlab hostnames, and `--user` to match your Cloudlab username. The first argument (`/opt/homebrew/.../Python`) should be replaced with the path to your Python 3.10 binary.
  - For running the Shared HCA version of `run_basic_tests` and `run_cpu_tests`, a Docker macvlan network must first be created with `docker network ls | grep mynet || docker network create -d macvlan --subnet=192.168.1.0/24 -o parent=ens3f0 -o macvlan_mode=private mynet`.
- The data should appear in `data/raw`. You can generate graphs based on the data you just produced by setting `MODE = "shared_hca"` and rerunning all cells in the `*.ipynb` Jupyter notebooks (run `jupyter-notebook` while in the `data` dir).
- The RDMA GID table is namespaced inside of the container, thus the majority of GID entries are `0000:0000:0000:0000:0000:0000:0000:0000`, and the ones namespaced into the container's namespace are populated. Despite this, `ib_[read|write|send]_[bw|lat]` do not select the proper GID and will error with `Failed to modify QP XXXX to RTR` and `Unable to Connect the HCA's through the link`. You can force `ib_[read|write|send]_[bw|lat]` to use the correct GID entry with the `-x` flag. See `/sys/class/infiniband/*/ports/*/gids` for GID entry values and `/sys/class/infiniband/*/ports/*/gid_attrs/types` for the corresponding type (RoCE v1, RoCE v2, etc).
  - You can use `rdma_cm` queue pairs to avoid this with the `-R` flag. However, using RDMA connection manager queue pairs results in 100% CPU utilization on the `ib_[read|write|send]_[bw|lat]` server (which should have around a 0% CPU util for read/write operations), thus their use can result in incorrect CPU usage readings.
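To find which GID index to pass to `-x`, the two sysfs locations mentioned above can be scanned for populated entries. The following is a rough sketch, not part of the repository: `populated_gids` is a hypothetical helper, and it assumes the usual sysfs layout in which each file under `gids/` is named after its index and has a same-named file under `gid_attrs/types/`. Run it inside the container so it sees the container's namespaced table.

```python
from pathlib import Path

ZERO_GID = "0000:0000:0000:0000:0000:0000:0000:0000"


def populated_gids(sysfs_root="/sys/class/infiniband"):
    """Yield (device, port, index, gid, type) for each non-zero GID entry."""
    root = Path(sysfs_root)
    for gid_file in sorted(root.glob("*/ports/*/gids/*")):
        try:
            gid = gid_file.read_text().strip()
        except OSError:
            continue  # unreadable entry; skip it
        if not gid or gid == ZERO_GID:
            continue
        port_dir = gid_file.parent.parent
        type_file = port_dir / "gid_attrs" / "types" / gid_file.name
        try:
            gid_type = type_file.read_text().strip()
        except OSError:
            gid_type = "unknown"  # type not readable for this entry
        device = port_dir.parent.parent.name
        yield (device, port_dir.name, int(gid_file.name), gid, gid_type)
```

Printing the tuples from `populated_gids()` shows the index and type of each populated GID on the device and port you are testing, from which you can choose the index to pass to `-x`.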
- Provision a Cloudlab experiment with the roce-cluster profile.
  - Change "Node type to use" to `d6515`.
- Change the two hostnames of the `connectx5` group to your Cloudlab hostnames in `ansible/hosts.yaml`. Change `ansible_user` under `vars:` to your Cloudlab username.
- Run `ansible-playbook --ssh-common-args "-o StrictHostKeyChecking=no" --inventory-file="./hosts.yaml" --limit connectx5 site.yml` while in the `ansible` directory.
- Run the commands listed in `data/sriov_basic_tests/metadata_host` and `data/sriov_cpu_tests/metadata_host` while in the `test_scripts` directory. Take care to replace the `--host1` and `--host2` flags to match your Cloudlab hostnames, and `--user` to match your Cloudlab username. The first argument (`/opt/homebrew/.../Python`) should be replaced with the path to your Python 3.10 binary.
- Run `ansible-playbook --ssh-common-args "-o StrictHostKeyChecking=no" --inventory-file="./hosts.yaml" --limit connectx5 --tags sriov site.yml` while in the `ansible` directory. This will provision the first SRIOV virtual function on both hosts.
- Run the commands listed in `data/sriov_basic_tests/metadata_sriov` and `data/sriov_cpu_tests/metadata_sriov` while in the `test_scripts` directory. Take care to replace the `--host1` and `--host2` flags to match your Cloudlab hostnames, and `--user` to match your Cloudlab username. The first argument (`/opt/homebrew/.../Python`) should be replaced with the path to your Python 3.10 binary.
- Run the commands listed in `data/sriov_multi_dev/metadata` while in the `test_scripts` directory. Take care to replace the `--host1` and `--host2` flags to match your Cloudlab hostnames, and `--user` to match your Cloudlab username. The first argument (`/opt/homebrew/.../Python`) should be replaced with the path to your Python 3.10 binary.
- The data should appear in `data/raw`. You can generate graphs based on the data you just produced by setting `MODE = "sriov"` and rerunning all cells in the `*.ipynb` Jupyter notebooks (run `jupyter-notebook` while in the `data` dir).
- SRIOV virtual function instantiation is really finicky. Sometimes it behaves and sometimes it doesn't. If your `basic_tests` or `cpu_tests` don't work, reboot the host and rerun `ansible-playbook --ssh-common-args "-o StrictHostKeyChecking=no" --inventory-file="./hosts.yaml" --limit connectx5 --tags sriov site.yml`. Then rerun your tests.
  - The `multi_sriov_tests.py` script tries to handle the finickiness of SRIOV virtual functions, but after more than 20 attempts to get them to cooperate, it will fail the test.
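The bounded-retry idea behind that limit can be illustrated generically. The sketch below is not how `multi_sriov_tests.py` is implemented; `retry` is a hypothetical helper, the default of 20 attempts only mirrors the limit mentioned above, and the one-second delay is an assumption.

```python
import time


def retry(action, attempts=20, delay_s=1.0, sleep=time.sleep):
    """Call `action` until it returns a truthy value or attempts run out.

    Returns the number of the attempt that succeeded and raises
    RuntimeError if none did. `sleep` is injectable to ease testing.
    """
    for attempt in range(1, attempts + 1):
        if action():
            return attempt
        if attempt < attempts:
            sleep(delay_s)  # give the device time to settle before retrying
    raise RuntimeError(f"gave up after {attempts} attempts")
```

Here `action` would be whatever check decides that the virtual functions came up correctly; that check is left to the caller.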
- Provision a Cloudlab experiment with the roce-cluster profile.
  - Change "Node type to use" to `c6220`.
- Change the two hostnames of the `connectx3` group to your Cloudlab hostnames in `ansible/hosts.yaml`. Change `ansible_user` under `vars:` to your Cloudlab username.
- Run `ansible-playbook --ssh-common-args "-o StrictHostKeyChecking=no" --inventory-file="./hosts.yaml" --limit connectx3 site.yml` while in the `ansible` directory.
- Run the commands listed in `data/freeflow_*/metadata*` while in the `test_scripts` directory. Take care to replace the `--host1` and `--host2` flags to match your Cloudlab hostnames, and `--user` to match your Cloudlab username. The first argument (`/opt/homebrew/.../Python`) should be replaced with the path to your Python 3.10 binary.
- The data should appear in `data/raw`. You can generate graphs based on the data you just produced by setting `MODE = "freeflow"` and rerunning all cells in the `*.ipynb` Jupyter notebooks (run `jupyter-notebook` while in the `data` dir).
- RDMA's rkey generation is deterministic (see the ReDMArk paper), particularly on mlx4 NICs. Freeflow assumes unique rkeys per host as part of its rkey mapping scheme, which breaks with this deterministic generation. I added a patch to my fork of Freeflow to circumvent this, but if you run into `Failed status 10: wr_id 0 syndrom 0x88` errors, this is likely why.
- Freeflow expects page-aligned memory, hence the use of `LD_PRELOAD=./align_malloc.so`.
- Freeflow only supports mlx4 driver NICs, so you must use ConnectX-3 NICs.
- Freeflow provides a "no-fastpath" mode. However, this mode is prone to deadlocks at specific RDMA packet sizes and with more than 2 clients.
- Provision a Cloudlab experiment with the roce-cluster profile.
  - Change "Node type to use" to `c6525-100g`.
- Change the two hostnames of the `connectx5` group to your Cloudlab hostnames in `ansible/hosts.yaml`. Change `ansible_user` under `vars:` to your Cloudlab username.
- Run `ansible-playbook --ssh-common-args "-o StrictHostKeyChecking=no" --inventory-file="./hosts.yaml" --limit connectx5 --tags sriov,asap2_direct site.yml` while in the `ansible` directory. Reboot both hosts through the Cloudlab UI after installing MLNX OFED. Then rerun the `ansible-playbook` command to completion.
- Run the commands listed in `data/asap2_direct_basic_tests/metadata_host` and `data/asap2_direct_cpu_tests/metadata_host` while in the `test_scripts` directory. Take care to replace the `--host1` and `--host2` flags to match your Cloudlab hostnames, and `--user` to match your Cloudlab username. The first argument (`/opt/homebrew/.../Python`) should be replaced with the path to your Python 3.10 binary.
- Run `ansible-playbook --ssh-common-args "-o StrictHostKeyChecking=no" --inventory-file="./hosts.yaml" --limit connectx5 --tags sriov,asap2_direct -e "configure_sriov_ifs=false" site.yml` while in the `ansible` directory. This will provision the first SRIOV virtual function on both hosts and configure ASAP2 Direct.
- Run the commands listed in `data/asap2_direct_basic_tests/metadata_sriov_ovs` and `data/asap2_direct_cpu_tests/metadata_sriov_ovs` while in the `test_scripts` directory. Take care to replace the `--host1` and `--host2` flags to match your Cloudlab hostnames, and `--user` to match your Cloudlab username. The first argument (`/opt/homebrew/.../Python`) should be replaced with the path to your Python 3.10 binary.
- Run `ansible-playbook --ssh-common-args "-o StrictHostKeyChecking=no" --inventory-file="./hosts.yaml" --limit connectx5 --tags sriov,asap2_direct -e "NUM_OF_VFS=32" -e "configure_sriov_ifs=false" -e "cleanup_old_state=true" site.yml` while in the `ansible` directory. This will provision multiple VFs for the multi dev test.
- Run the commands listed in `data/asap2_direct_multi_dev/metadata` while in the `test_scripts` directory. Take care to replace the `--host1` and `--host2` flags to match your Cloudlab hostnames, and `--user` to match your Cloudlab username. The first argument (`/opt/homebrew/.../Python`) should be replaced with the path to your Python 3.10 binary.
- The data should appear in `data/raw`. You can generate graphs based on the data you just produced by setting `MODE = "asap2_direct"` and rerunning all cells in the `*.ipynb` Jupyter notebooks (run `jupyter-notebook` while in the `data` dir).
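The `ansible-playbook` invocations in these steps differ only in `--limit`, `--tags`, and the `-e` extra variables, so they can be assembled programmatically if you rerun them often. This is a small sketch under that assumption and is not part of the repository; `playbook_command` is a hypothetical helper, and the defaults simply repeat the inventory and playbook names used in the commands above.

```python
def playbook_command(limit, tags=(), extra_vars=None,
                     inventory="./hosts.yaml", playbook="site.yml"):
    """Build the argument list for one ansible-playbook run.

    Hypothetical helper: it only assembles arguments and does not
    execute anything.
    """
    cmd = [
        "ansible-playbook",
        "--ssh-common-args", "-o StrictHostKeyChecking=no",
        f"--inventory-file={inventory}",
        "--limit", limit,
    ]
    if tags:
        cmd += ["--tags", ",".join(tags)]
    # Each extra variable becomes its own `-e key=value` pair.
    for key, value in (extra_vars or {}).items():
        cmd += ["-e", f"{key}={value}"]
    cmd.append(playbook)
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, cwd="ansible", check=True)` from the repository root, which avoids shell quoting issues around the SSH option.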
- When using `switchdev`, there's both an interface for the SRIOV NIC itself and a "representor netdevice" (see these slides). Sometimes the names get messed up on these and you have to reboot the host or mess around with udev rules.