NVMe oF

Krijn Doekemeijer edited this page Aug 23, 2023 · 1 revision

NVMe over Fabrics

Setup NVMe-oF target (Kernel)

Enable necessary modules:

sudo modprobe nvmet
sudo modprobe nvme-tcp # If TCP
sudo modprobe nvmet-tcp # If TCP
sudo modprobe nvmet-rdma # If RDMA
sudo modprobe nvme-fabrics
# For Mellanox
sudo modprobe mlx5_core
sudo modprobe mlx5_ib
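
To check which of these modules are actually loaded, something like the following sketch may help. It relies on the kernel listing modules with underscores instead of hyphens (`nvmet-tcp` shows up as `nvmet_tcp`). `missing_modules` and the `PROC_MODULES` override are hypothetical names introduced here, and modules compiled into the kernel do not appear in `/proc/modules`, so they would be reported as missing.

```shell
# Hypothetical override, only there so the function can be tried on a fake file.
PROC_MODULES="${PROC_MODULES:-/proc/modules}"

# Print every requested module that is not listed as loaded.
# Returns 0 if all are loaded, 1 otherwise.
missing_modules() {
    local mod name rc=0
    for mod in "$@"; do
        # modprobe accepts hyphens, /proc/modules uses underscores
        name=$(printf '%s' "$mod" | tr '-' '_')
        if ! grep -q "^$name " "$PROC_MODULES"; then
            echo "$mod"
            rc=1
        fi
    done
    return "$rc"
}
```

Example: `missing_modules nvmet nvmet-tcp nvme-fabrics` prints nothing when all three are loaded.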

Create subsystem:

target_nqn=... # Use a memorable name
sudo mkdir /sys/kernel/config/nvmet/subsystems/$target_nqn
cd /sys/kernel/config/nvmet/subsystems/$target_nqn
echo 1 | sudo tee -a attr_allow_any_host > /dev/null 
# Do the following for each namespace of the NVMe device (change the 1 and the nvmexny)
sudo mkdir namespaces/1
cd namespaces/1
echo -n /dev/nvmexny | sudo tee -a device_path > /dev/null
echo 1 | sudo tee -a enable > /dev/null
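
For devices with several namespaces the steps above can be wrapped in a loop. This is a minimal sketch, not a tested tool: `nvmet_create_subsystem` and the `NVMET_ROOT` override are hypothetical names, namespace IDs are simply assigned in argument order starting at 1, and the function writes directly, so it has to run in a root shell instead of using `sudo tee`.

```shell
# Hypothetical override so the function can be tried outside configfs.
NVMET_ROOT="${NVMET_ROOT:-/sys/kernel/config/nvmet}"

# nvmet_create_subsystem NQN DEVICE...
# Creates the subsystem and exposes each DEVICE as namespace 1, 2, ...
nvmet_create_subsystem() {
    local nqn="$1"
    shift
    local subsys="$NVMET_ROOT/subsystems/$nqn"
    local nsid=1 dev

    mkdir -p "$subsys" || return 1
    echo 1 > "$subsys/attr_allow_any_host" || return 1

    for dev in "$@"; do
        mkdir -p "$subsys/namespaces/$nsid" || return 1
        # No trailing newline in device_path, as in the manual steps
        printf '%s' "$dev" > "$subsys/namespaces/$nsid/device_path" || return 1
        echo 1 > "$subsys/namespaces/$nsid/enable" || return 1
        nsid=$((nsid + 1))
    done
}
```

Example (as root): `nvmet_create_subsystem "$target_nqn" /dev/nvmexn1 /dev/nvmexn2`.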

Create port (if TCP):

sudo mkdir /sys/kernel/config/nvmet/ports/1
cd /sys/kernel/config/nvmet/ports/1
# Get an IP from the node (e.g. with `ip a`)
echo $IP | sudo tee -a addr_traddr > /dev/null
echo tcp | sudo tee -a addr_trtype > /dev/null
echo 4420 | sudo tee -a addr_trsvcid > /dev/null # 4420 is the IANA-registered NVMe-oF port
echo ipv4 | sudo tee -a addr_adrfam > /dev/null # Change to ipv6 if necessary
sudo ln -s /sys/kernel/config/nvmet/subsystems/$target_nqn/ /sys/kernel/config/nvmet/ports/1/subsystems/$target_nqn

Check NICs capable of RDMA:

ls /sys/class/infiniband/*/device/net # See https://spdk.io/doc/nvmf.html
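
The same lookup can be turned into a list of "RDMA device, network interface" pairs, which makes it easier to match the result against `ip a`. A small sketch; `list_rdma_netdevs` and the `SYSFS_IB` override are hypothetical names used only for illustration.

```shell
# Hypothetical override so the function can be tried on a fake directory tree.
SYSFS_IB="${SYSFS_IB:-/sys/class/infiniband}"

# Print one "<rdma device> <network interface>" line per RDMA-capable interface.
list_rdma_netdevs() {
    local dev netdir net
    for dev in "$SYSFS_IB"/*; do
        netdir="$dev/device/net"
        [ -d "$netdir" ] || continue
        for net in "$netdir"/*; do
            [ -e "$net" ] || continue
            printf '%s %s\n' "$(basename "$dev")" "$(basename "$net")"
        done
    done
    return 0
}
```

An empty result means no RDMA-capable NIC was found (or the driver modules are not loaded).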

Create port (if RDMA):

sudo mkdir /sys/kernel/config/nvmet/ports/1
cd /sys/kernel/config/nvmet/ports/1
# Get an IP from the node that is CAPABLE of RDMA (e.g. with `ip a | grep ep...`)
echo $IP | sudo tee -a addr_traddr > /dev/null
echo rdma | sudo tee -a addr_trtype > /dev/null
echo 4420 | sudo tee -a addr_trsvcid > /dev/null # 4420 is the IANA-registered NVMe-oF port
echo ipv4 | sudo tee -a addr_adrfam > /dev/null # Change to ipv6 if necessary
sudo ln -s /sys/kernel/config/nvmet/subsystems/$target_nqn/ /sys/kernel/config/nvmet/ports/1/subsystems/$target_nqn
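
The TCP and RDMA port setups differ only in `addr_trtype`, so both can be expressed as one function. A sketch under the same assumptions as the manual steps: `nvmet_create_port` and `NVMET_ROOT` are hypothetical names, only `tcp` and `rdma` are accepted because those are the transports covered here, and it must run in a root shell.

```shell
# Hypothetical override so the function can be tried outside configfs.
NVMET_ROOT="${NVMET_ROOT:-/sys/kernel/config/nvmet}"

# nvmet_create_port PORT_ID TRTYPE IP NQN [SVCID] [ADRFAM]
nvmet_create_port() {
    local id="$1" trtype="$2" ip="$3" nqn="$4"
    local svcid="${5:-4420}" adrfam="${6:-ipv4}"
    local port="$NVMET_ROOT/ports/$id"

    case "$trtype" in
        tcp|rdma) ;;
        *) echo "transport not covered here: $trtype" >&2; return 1 ;;
    esac

    # On configfs the subsystems directory appears automatically with the port
    mkdir -p "$port/subsystems" || return 1
    echo "$ip"     > "$port/addr_traddr"  || return 1
    echo "$trtype" > "$port/addr_trtype"  || return 1
    echo "$svcid"  > "$port/addr_trsvcid" || return 1
    echo "$adrfam" > "$port/addr_adrfam"  || return 1
    # Linking the subsystem into the port is what exposes it
    ln -s "$NVMET_ROOT/subsystems/$nqn" "$port/subsystems/$nqn"
}
```

Example (as root): `nvmet_create_port 1 rdma "$IP" "$target_nqn"`.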

Verify that it succeeded:

sudo dmesg | grep "nvmet"

If it fails, verify the parameters under /sys/kernel/config/nvmet/ports/1 and, for RDMA, check that the NIC supports RDMA. You might need to load additional kernel modules.
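
To inspect the port parameters in one go, the attributes can be read back from configfs. A minimal sketch; `nvmet_show_port` and the `NVMET_ROOT` override are hypothetical names, and reading configfs may require root on some systems.

```shell
# Hypothetical override so the function can be tried outside configfs.
NVMET_ROOT="${NVMET_ROOT:-/sys/kernel/config/nvmet}"

# nvmet_show_port PORT_ID
# Prints the address attributes of the port and the subsystems linked into it.
nvmet_show_port() {
    local port="$NVMET_ROOT/ports/$1" attr sub
    [ -d "$port" ] || { echo "no such port: $1" >&2; return 1; }
    for attr in addr_trtype addr_adrfam addr_traddr addr_trsvcid; do
        printf '%s=%s\n' "$attr" "$(cat "$port/$attr" 2>/dev/null)"
    done
    for sub in "$port"/subsystems/*; do
        [ -e "$sub" ] || [ -L "$sub" ] || continue
        printf 'subsystem=%s\n' "$(basename "$sub")"
    done
    return 0
}
```

A port without any `subsystem=` line has no subsystem linked, which usually means the `ln -s` step was skipped or failed.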

Setup NVMe-oF target (SPDK BDev)

SPDK can create an NVMe-oF capable block device (for the RDMA transport, SPDK must be built with RDMA support). This hides device internals: for example, a ZNS device is exposed as a plain block device. Here we note down how to create such a device. Also see https://spdk.io/doc/nvmf.html. First enable SPDK to use the device:

cat /sys/block/$nvmedev/device/address # < Note down as traddr
cd $spdk_dir
sudo PCI_ALLOWED=$traddr ./scripts/setup.sh
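
Before passing the value to `setup.sh` it may be worth checking that it really looks like a PCI address, since a controller that is itself fabrics-attached is expected to report a transport string there instead (an assumption worth verifying on your kernel). A sketch; `is_pci_address` is a hypothetical helper and it only accepts the common 4-digit-domain form `dddd:bb:dd.f`.

```shell
# Succeeds if the argument looks like a PCI address such as 0000:01:00.0.
# Assumption: 4 hex digit domain; systems with longer domains need a wider pattern.
is_pci_address() {
    case "$1" in
        [0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]:[0-9a-fA-F][0-9a-fA-F]:[0-9a-fA-F][0-9a-fA-F].[0-7])
            return 0 ;;
        *)
            return 1 ;;
    esac
}
```

Example: `traddr=$(cat /sys/block/$nvmedev/device/address); is_pci_address "$traddr" || echo "not a local PCIe device"`.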

Then create the target:

./build/bin/nvmf_tgt &
./scripts/rpc.py bdev_nvme_attach_controller \
   -b "Nvmex" `#Pick a proper NVMe name, like the original device name` \
   -t "pcie"  \
   -a $traddr   # traddr of the device
./scripts/rpc.py nvmf_create_transport \
    -t rdma `# Or TCP` \
    -u 8192   # I/O unit size in bytes, see --help (-c sets the in-capsule data size)
./scripts/rpc.py nvmf_create_subsystem nqn.2016-06.io.spdk:cnode1 -a -s SPDK1
# Do the following for each namespace of the device (change the x and y)
./scripts/rpc.py nvmf_subsystem_add_ns nqn.2016-06.io.spdk:cnode1 Nvmexny

Set up the listener:

./scripts/rpc.py nvmf_subsystem_add_listener nqn.2016-06.io.spdk:cnode1 \
  -t tcp `# Or rdma, must match a transport created with nvmf_create_transport` \
  -a $IP `# Get an IP from the node to use` \
  -s 4420 # port
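
The whole SPDK target sequence can be generated for a controller with several namespaces. The sketch below only prints the `rpc.py` calls used above (a dry run), so they can be reviewed before running them; it uses a single transport type for both `nvmf_create_transport` and the listener. `spdk_target_cmds` is a hypothetical helper, and it assumes SPDK names the namespace bdevs `<controller name>n<nsid>`.

```shell
# spdk_target_cmds CTRL_NAME PCI_TRADDR NQN TRTYPE IP NS_COUNT
# Prints (does not run) the rpc.py calls for one controller.
spdk_target_cmds() {
    local ctrl="$1" traddr="$2" nqn="$3" trtype="$4" ip="$5" ns_count="$6"
    local rpc="./scripts/rpc.py" i=1

    echo "$rpc bdev_nvme_attach_controller -b $ctrl -t pcie -a $traddr"
    echo "$rpc nvmf_create_transport -t $trtype -u 8192"
    echo "$rpc nvmf_create_subsystem $nqn -a -s SPDK1"
    while [ "$i" -le "$ns_count" ]; do
        # Assumed bdev naming: <controller name>n<namespace id>
        echo "$rpc nvmf_subsystem_add_ns $nqn ${ctrl}n$i"
        i=$((i + 1))
    done
    echo "$rpc nvmf_subsystem_add_listener $nqn -t $trtype -a $ip -s 4420"
}
```

Example: `spdk_target_cmds Nvmex "$traddr" nqn.2016-06.io.spdk:cnode1 tcp "$IP" 2`, then pipe the output to `sh` from `$spdk_dir` once `nvmf_tgt` is running.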

Setup NVMe-oF initiator/host (Kernel)

Load the modules (the initiator can be a different machine than the target):

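sudo modprobe nvme
sudo modprobe nvmet # Target-side module, only needed if this machine also acts as a target
sudo modprobe nvme-tcp # If TCP
sudo modprobe nvmet-tcp # Target-side module, only needed if this machine also acts as a target
sudo modprobe nvme-rdma # If RDMA
sudo modprobe nvme-fabrics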

Find device:

sudo nvme discover \
  -t tcp `# Change to rdma if rdma` \
  -a $IP `# Use IP of target, check if it can be accessed with ping` \
  -s 4420 # Use port of target

Connect device:

sudo nvme connect \
  -t tcp `# Change to rdma if rdma` \
  -a $IP `# Use IP of target, check if it can be accessed with ping` \
  -s 4420 `# Use port of target` \
  -n $target_nqn # Use NQN of target device (Can also be seen with discover)

Verify:

sudo dmesg | tail -n 100
sudo nvme list | grep $target_nqn
# If ZNS also try
sudo nvme zns list | grep $target_nqn
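
If the NQN does not show up in the `nvme list` output, the controller can also be located through sysfs, where each controller exposes its subsystem NQN in `subsysnqn`. A sketch; `nvme_ctrls_for_nqn` and the `SYSFS_NVME` override are hypothetical names.

```shell
# Hypothetical override so the function can be tried on a fake directory tree.
SYSFS_NVME="${SYSFS_NVME:-/sys/class/nvme}"

# nvme_ctrls_for_nqn NQN
# Prints the controllers (nvme0, nvme1, ...) that belong to the given subsystem NQN.
# Returns 1 if none is found.
nvme_ctrls_for_nqn() {
    local nqn="$1" ctrl found=1
    for ctrl in "$SYSFS_NVME"/nvme*; do
        [ -r "$ctrl/subsysnqn" ] || continue
        if [ "$(cat "$ctrl/subsysnqn")" = "$nqn" ]; then
            basename "$ctrl"
            found=0
        fi
    done
    return "$found"
}
```

Example: `nvme_ctrls_for_nqn "$target_nqn"` prints something like `nvme1` after a successful connect.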

Disconnect:

sudo nvme disconnect -n $target_nqn

SPDK Fio NVMe-oF

Unlike for local NVMe devices, you do not need to run ./scripts/setup.sh from the SPDK directory. Instead, for fio (and other tools), pass a filename in a different format that contains the IP address of the target:

filename="trtype=tcp adrfam=IPv4 traddr=$IP trsvcid=4420 ns=1" # Double quotes so that $IP expands
# Explanation
# trtype=tcp # Or rdma
# adrfam=IPv4
# traddr=$IP # IP of target
# trsvcid=4420 # Port of the target
# ns=1 # Namespace, always last!
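
Building the string with a small function avoids quoting mistakes and keeps `ns` last. A sketch; `spdk_fio_filename` is a hypothetical helper, and the default port of 4420 is an assumption based on the port used for the target on this page.

```shell
# spdk_fio_filename TRTYPE IP [SVCID] [NS] [ADRFAM]
# Prints the filename string for the SPDK fio plugin; ns is always emitted last.
spdk_fio_filename() {
    local trtype="$1" ip="$2" svcid="${3:-4420}" ns="${4:-1}" adrfam="${5:-IPv4}"
    printf 'trtype=%s adrfam=%s traddr=%s trsvcid=%s ns=%s\n' \
        "$trtype" "$adrfam" "$ip" "$svcid" "$ns"
}
```

Example: `filename=$(spdk_fio_filename tcp "$IP")`, then pass it to fio as `--filename="$filename"`.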

Bugs