19 changes: 19 additions & 0 deletions configure.ac
@@ -1384,6 +1384,7 @@
])

# DPDK support
enable_dpdk_bond_pmd="no"
AC_ARG_ENABLE(dpdk,
AS_HELP_STRING([--enable-dpdk], [Enable DPDK support [default=no]]),
[enable_dpdk=$enableval],[enable_dpdk=no])
@@ -1415,6 +1416,23 @@
fi
CFLAGS="${CFLAGS} `pkg-config --cflags libdpdk`"
LIBS="${LIBS} -Wl,-R,`pkg-config --libs-only-L libdpdk | cut -c 3-` -lnuma `pkg-config --libs libdpdk`"

if test ! -z "$(ldconfig -p | grep librte_net_bond)"; then
AC_DEFINE([HAVE_DPDK_BOND],[1],(DPDK Bond PMD support enabled))
enable_dpdk_bond_pmd="yes"
LIBS="${LIBS} -lrte_net_bond" # 20.11+
elif test ! -z "$(ldconfig -p | grep librte_pmd_bond)"; then
AC_DEFINE([HAVE_DPDK_BOND],[1],(DPDK Bond PMD support enabled))
enable_dpdk_bond_pmd="yes"
LIBS="${LIBS} -lrte_pmd_bond"
else
echo
echo " WARNING: DPDK Bond PMD was not found on your system, "
echo " you will be unable to use DPDK Bond PMD."
echo " You can try to \"sudo ldconfig\" and reconfigure again"
echo " or compile and install DPDK with Bond support enabled."
echo
fi
])

# Netmap support
@@ -2629,6 +2647,7 @@ SURICATA_BUILD_CONF="Suricata Configuration:
Profiling rules enabled: ${enable_profiling_rules}

Plugin support (experimental): ${plugin_support}
DPDK Bond PMD: ${enable_dpdk_bond_pmd}

Development settings:
Coccinelle / spatch: ${enable_coccinelle}
97 changes: 97 additions & 0 deletions doc/userguide/capture-hardware/dpdk.rst
@@ -0,0 +1,97 @@
.. _dpdk:

DPDK
====

Introduction
-------------

The Data Plane Development Kit (DPDK) is a set of libraries and drivers that
speed up packet processing in the data plane. It does so primarily by
bypassing the kernel network stack, which can improve performance
significantly. For the basic DPDK setup, please refer to
:doc:`../configuration/suricata-yaml`.
The following sections contain examples of how to set up DPDK and Suricata for
less common use cases.

Bond interface
--------------

The Link Bonding Poll Mode Driver (Bond PMD) is a software
mechanism provided by the Data Plane Development Kit (DPDK) for aggregating
multiple physical network interfaces into a single logical interface.
Bonding can be used, for example, to:

* deliver bidirectional flows of tapped interfaces to the same worker,
* establish redundancy by monitoring multiple links,
* improve network performance by load-balancing traffic across multiple links.
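
The first use case relies on the NICs hashing both directions of a flow to the
same queue index. The sketch below is illustrative only and is not taken from
Suricata or DPDK: it implements a plain Toeplitz hash and assumes an RSS key
built from the repeated 16-bit pattern 0x6D5A, which is one commonly used
symmetric key. The names `rss_key_fill_symmetric`, `tuple_build` and
`toeplitz_hash`, the key length, and the addresses used with them are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RSS_KEY_LEN 40 /* assumed key length for this sketch */

/* Fill the key with the repeated 0x6D5A pattern (assumed symmetric key). */
static void rss_key_fill_symmetric(uint8_t key[RSS_KEY_LEN])
{
    for (size_t i = 0; i < RSS_KEY_LEN; i++)
        key[i] = (i % 2 == 0) ? 0x6D : 0x5A;
}

/* Serialize an IPv4 4-tuple in network byte order:
 * source IP, destination IP, source port, destination port. */
static void tuple_build(uint8_t out[12], uint32_t src_ip, uint32_t dst_ip,
        uint16_t src_port, uint16_t dst_port)
{
    out[0] = (uint8_t)(src_ip >> 24);
    out[1] = (uint8_t)(src_ip >> 16);
    out[2] = (uint8_t)(src_ip >> 8);
    out[3] = (uint8_t)src_ip;
    out[4] = (uint8_t)(dst_ip >> 24);
    out[5] = (uint8_t)(dst_ip >> 16);
    out[6] = (uint8_t)(dst_ip >> 8);
    out[7] = (uint8_t)dst_ip;
    out[8] = (uint8_t)(src_port >> 8);
    out[9] = (uint8_t)src_port;
    out[10] = (uint8_t)(dst_port >> 8);
    out[11] = (uint8_t)dst_port;
}

/* Plain Toeplitz hash: for every set input bit, XOR in the 32-bit key window
 * that starts at the same bit offset. Key bits past key_len count as zero. */
static uint32_t toeplitz_hash(const uint8_t *key, size_t key_len,
        const uint8_t *data, size_t data_len)
{
    uint32_t hash = 0;
    uint32_t window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
                      ((uint32_t)key[2] << 8) | (uint32_t)key[3];

    for (size_t i = 0; i < data_len; i++) {
        for (int bit = 7; bit >= 0; bit--) {
            if (data[i] & (1u << bit))
                hash ^= window;
            /* slide the window left by one bit and pull in the next key bit */
            uint32_t next = (i + 4 < key_len) ? (((uint32_t)key[i + 4] >> bit) & 1u) : 0u;
            window = (window << 1) | next;
        }
    }
    return hash;
}
```

Because the assumed key repeats every 16 bits, swapping the source and
destination fields leaves the hash unchanged, so a tuple built for
10.0.0.1:12345 to 192.168.1.2:80 hashes to the same value as its reverse.
Real NICs map the hash to a queue through a redirection table rather than a
plain modulo, so the queue selection itself is not modelled here.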

Bond PMD is essentially a virtual driver that manages multiple
physical network interfaces. It can operate in multiple modes as described
in the `DPDK docs
<https://doc.dpdk.org/guides/prog_guide/link_bonding_poll_mode_drv_lib.html>`_.
The individual bonding modes can accommodate different user needs.
DPDK Bond PMD requires that the aggregated interfaces are of
the same device type, e.g. both physical ports run on the mlx5 PMD.
Bond PMD supports multiple queues and therefore can work in the workers runmode.
It should have no effect on the traffic distribution of the individual ports;
flows should be distributed by the physical ports according to the RSS
configuration, the same way as if the ports were configured independently.
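
The same-device-type requirement can be pictured as a comparison of the driver
names of all bond members. The following is a minimal, self-contained sketch
and not the implementation from this change: the function name
`members_share_driver` is hypothetical, and the driver names are passed in as
plain strings instead of being queried from DPDK.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 0 when every member reports the same driver name,
 * -1 when the list is empty or at least one driver differs. */
static int members_share_driver(const char *const drivers[], size_t cnt)
{
    if (cnt == 0)
        return -1;

    for (size_t i = 1; i < cnt; i++) {
        if (strcmp(drivers[0], drivers[i]) != 0)
            return -1;
    }
    return 0;
}
```

For instance, two members that both report `net_i40e` would pass the check,
while a mix of `net_i40e` and `net_ixgbe` would be rejected.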

As an example of Bond PMD, we can set up Suricata to monitor 2 interfaces
that receive TAP traffic from optical interfaces. This means that Suricata
receives one direction of the communication on one interface and the other
direction on the other interface.

::

  ...
  dpdk:
    eal-params:
      proc-type: primary
      vdev: 'net_bonding0,mode=0,slave=0000:04:00.0,slave=0000:04:00.1'

    # DPDK capture support
    # RX queues (and TX queues in IPS mode) are assigned to cores in 1:1 ratio
    interfaces:
      - interface: net_bonding0 # name of the bonded virtual device (not a PCIe address)
        # Threading: possible values are either "auto" or number of threads
        # - auto takes all cores
        # in IPS mode it is required to specify the number of cores and the
        # numbers on both interfaces must match
        threads: 4
  ...

In the DPDK part of suricata.yaml we have added a new parameter to the
eal-params section for virtual devices - `vdev`.
The DPDK Environment Abstraction Layer (EAL) can create virtual devices
during its initialization.
In this case, EAL creates a new device of type `net_bonding`. The suffix after
`net_bonding` completes the name of the interface (in this case zero, giving
`net_bonding0`).
Extra arguments, such as the bonding mode (`mode=0`), are passed after the
device name. Mode 0 is the round-robin mode described in the DPDK
documentation of Bond PMD.
Members (slaves) of the `net_bonding0` interface are appended after
the bonding mode parameter.
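
To make the layout of the `vdev` string concrete, the sketch below splits it
into the device name, the bonding mode and the member list. It is only an
illustration of the format shown above and is not how DPDK parses device
arguments; the `bond_vdev` structure, the `bond_vdev_parse` function and the
size limits are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BOND_MAX_MEMBERS 8 /* arbitrary limit for this sketch */
#define BOND_STR_LEN 32

struct bond_vdev {
    char name[BOND_STR_LEN];
    int mode; /* -1 when no mode argument is present */
    char members[BOND_MAX_MEMBERS][BOND_STR_LEN];
    int member_cnt;
};

/* Parses "net_bondingN,key=value,..." into out.
 * Returns 0 on success, -1 on malformed input. */
static int bond_vdev_parse(const char *arg, struct bond_vdev *out)
{
    static const char prefix[] = "net_bonding";
    char buf[256];

    if (strlen(arg) >= sizeof(buf))
        return -1;
    strcpy(buf, arg);

    memset(out, 0, sizeof(*out));
    out->mode = -1;

    /* the first comma-separated field is the device name */
    char *cursor = buf;
    char *comma = strchr(cursor, ',');
    if (comma != NULL)
        *comma = '\0';
    if (strncmp(cursor, prefix, strlen(prefix)) != 0)
        return -1;
    snprintf(out->name, sizeof(out->name), "%s", cursor);

    /* the remaining fields are key=value arguments */
    cursor = (comma != NULL) ? comma + 1 : NULL;
    while (cursor != NULL && *cursor != '\0') {
        comma = strchr(cursor, ',');
        if (comma != NULL)
            *comma = '\0';

        char *eq = strchr(cursor, '=');
        if (eq == NULL)
            return -1;
        *eq = '\0';
        const char *val = eq + 1;

        if (strcmp(cursor, "mode") == 0) {
            out->mode = atoi(val);
        } else if (strcmp(cursor, "slave") == 0) {
            if (out->member_cnt >= BOND_MAX_MEMBERS)
                return -1;
            snprintf(out->members[out->member_cnt], BOND_STR_LEN, "%s", val);
            out->member_cnt++;
        }
        /* any other key=value pair is ignored in this sketch */

        cursor = (comma != NULL) ? comma + 1 : NULL;
    }
    return 0;
}
```

Applied to the string from the example configuration, the sketch would yield
the name `net_bonding0`, mode 0 and two members.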

Once the device is specified in the EAL parameters, it can be used in the
Suricata `interfaces` list. Note that the list doesn't contain the PCIe addresses
of the physical ports but the `net_bonding0` interface instead.
The threading section is also adjusted to match the items in the interfaces
list by enabling set-cpu-affinity and listing the CPUs that should be used in
the management and worker CPU sets.

::

  ...
  threading:
    set-cpu-affinity: yes
    cpu-affinity:
      - management-cpu-set:
          cpu: [ 0 ] # include only these CPUs in affinity settings
      - receive-cpu-set:
          cpu: [ 0 ] # include only these CPUs in affinity settings
      - worker-cpu-set:
          cpu: [ 2,4,6,8 ]
  ...
1 change: 1 addition & 0 deletions doc/userguide/capture-hardware/index.rst
@@ -9,3 +9,4 @@ Using Capture Hardware
ebpf-xdp
netmap
af-xdp
dpdk
2 changes: 2 additions & 0 deletions src/Makefile.am
@@ -532,6 +532,7 @@ noinst_HEADERS = \
util-dpdk-i40e.h \
util-dpdk-ice.h \
util-dpdk-ixgbe.h \
util-dpdk-bonding.h \
util-ebpf.h \
util-enum.h \
util-error.h \
@@ -1127,6 +1128,7 @@ libsuricata_c_a_SOURCES = \
util-dpdk-i40e.c \
util-dpdk-ice.c \
util-dpdk-ixgbe.c \
util-dpdk-bonding.c \
util-ebpf.c \
util-enum.c \
util-error.c \
133 changes: 116 additions & 17 deletions src/runmode-dpdk.c
@@ -44,6 +44,7 @@
#include "util-dpdk-i40e.h"
#include "util-dpdk-ice.h"
#include "util-dpdk-ixgbe.h"
#include "util-dpdk-bonding.h"
#include "util-time.h"
#include "util-conf.h"
#include "suricata.h"
@@ -765,7 +766,7 @@ static void DeviceSetPMDSpecificRSS(struct rte_eth_rss_conf *rss_conf, const cha
{
// RSS is configured in a specific way for a driver i40e and DPDK version <= 19.xx
if (strcmp(driver_name, "net_i40e") == 0)
i40eDeviceSetRSSHashFunction(&rss_conf->rss_hf);
i40eDeviceSetRSSConf(rss_conf);
if (strcmp(driver_name, "net_ice") == 0)
iceDeviceSetRSSHashFunction(&rss_conf->rss_hf);
if (strcmp(driver_name, "net_ixgbe") == 0)
@@ -921,6 +922,52 @@ static void DumpRSSFlags(const uint64_t requested, const uint64_t actual)
SCLogConfig("RTE_ETH_RSS_L4_DST_ONLY %sset", (actual & RTE_ETH_RSS_L4_DST_ONLY) ? "" : "NOT ");
}

static void DumpRXOffloadCapabilities(const uint64_t rx_offld_capa)
{
SCLogConfig("RTE_ETH_RX_OFFLOAD_VLAN_STRIP - %savailable",
rx_offld_capa & RTE_ETH_RX_OFFLOAD_VLAN_STRIP ? "" : "NOT ");
SCLogConfig("RTE_ETH_RX_OFFLOAD_IPV4_CKSUM - %savailable",
rx_offld_capa & RTE_ETH_RX_OFFLOAD_IPV4_CKSUM ? "" : "NOT ");
SCLogConfig("RTE_ETH_RX_OFFLOAD_UDP_CKSUM - %savailable",
rx_offld_capa & RTE_ETH_RX_OFFLOAD_UDP_CKSUM ? "" : "NOT ");
SCLogConfig("RTE_ETH_RX_OFFLOAD_TCP_CKSUM - %savailable",
rx_offld_capa & RTE_ETH_RX_OFFLOAD_TCP_CKSUM ? "" : "NOT ");
SCLogConfig("RTE_ETH_RX_OFFLOAD_TCP_LRO - %savailable",
rx_offld_capa & RTE_ETH_RX_OFFLOAD_TCP_LRO ? "" : "NOT ");
SCLogConfig("RTE_ETH_RX_OFFLOAD_QINQ_STRIP - %savailable",
rx_offld_capa & RTE_ETH_RX_OFFLOAD_QINQ_STRIP ? "" : "NOT ");
SCLogConfig("RTE_ETH_RX_OFFLOAD_OUTER_IPV4_CKSUM - %savailable",
rx_offld_capa & RTE_ETH_RX_OFFLOAD_OUTER_IPV4_CKSUM ? "" : "NOT ");
SCLogConfig("RTE_ETH_RX_OFFLOAD_MACSEC_STRIP - %savailable",
rx_offld_capa & RTE_ETH_RX_OFFLOAD_MACSEC_STRIP ? "" : "NOT ");
#if RTE_VERSION < RTE_VERSION_NUM(22, 11, 0, 0)
SCLogConfig("RTE_ETH_RX_OFFLOAD_HEADER_SPLIT - %savailable",
rx_offld_capa & RTE_ETH_RX_OFFLOAD_HEADER_SPLIT ? "" : "NOT ");
#endif
SCLogConfig("RTE_ETH_RX_OFFLOAD_VLAN_FILTER - %savailable",
rx_offld_capa & RTE_ETH_RX_OFFLOAD_VLAN_FILTER ? "" : "NOT ");
SCLogConfig("RTE_ETH_RX_OFFLOAD_VLAN_EXTEND - %savailable",
rx_offld_capa & RTE_ETH_RX_OFFLOAD_VLAN_EXTEND ? "" : "NOT ");
SCLogConfig("RTE_ETH_RX_OFFLOAD_SCATTER - %savailable",
rx_offld_capa & RTE_ETH_RX_OFFLOAD_SCATTER ? "" : "NOT ");
SCLogConfig("RTE_ETH_RX_OFFLOAD_TIMESTAMP - %savailable",
rx_offld_capa & RTE_ETH_RX_OFFLOAD_TIMESTAMP ? "" : "NOT ");
SCLogConfig("RTE_ETH_RX_OFFLOAD_SECURITY - %savailable",
rx_offld_capa & RTE_ETH_RX_OFFLOAD_SECURITY ? "" : "NOT ");
SCLogConfig("RTE_ETH_RX_OFFLOAD_KEEP_CRC - %savailable",
rx_offld_capa & RTE_ETH_RX_OFFLOAD_KEEP_CRC ? "" : "NOT ");
SCLogConfig("RTE_ETH_RX_OFFLOAD_SCTP_CKSUM - %savailable",
rx_offld_capa & RTE_ETH_RX_OFFLOAD_SCTP_CKSUM ? "" : "NOT ");
SCLogConfig("RTE_ETH_RX_OFFLOAD_OUTER_UDP_CKSUM - %savailable",
rx_offld_capa & RTE_ETH_RX_OFFLOAD_OUTER_UDP_CKSUM ? "" : "NOT ");
SCLogConfig("RTE_ETH_RX_OFFLOAD_RSS_HASH - %savailable",
rx_offld_capa & RTE_ETH_RX_OFFLOAD_RSS_HASH ? "" : "NOT ");
#if RTE_VERSION >= RTE_VERSION_NUM(20, 11, 0, 0)
SCLogConfig("RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT - %savailable",
rx_offld_capa & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT ? "" : "NOT ");
#endif
}

static int DeviceValidateMTU(const DPDKIfaceConfig *iconf, const struct rte_eth_dev_info *dev_info)
{
if (iconf->mtu > dev_info->max_mtu || iconf->mtu < dev_info->min_mtu) {
@@ -975,6 +1022,7 @@ static int32_t DeviceSetSocketID(uint16_t port_id, int32_t *socket_id)
static void DeviceInitPortConf(const DPDKIfaceConfig *iconf,
const struct rte_eth_dev_info *dev_info, struct rte_eth_conf *port_conf)
{
DumpRXOffloadCapabilities(dev_info->rx_offload_capa);
*port_conf = (struct rte_eth_conf){
.rxmode = {
.mq_mode = RTE_ETH_MQ_RX_NONE,
@@ -996,7 +1044,12 @@ static void DeviceInitPortConf(const DPDKIfaceConfig *iconf,
.rss_hf = iconf->rss_hf,
};

DeviceSetPMDSpecificRSS(&port_conf->rx_adv_conf.rss_conf, dev_info->driver_name);
const char *dev_driver = dev_info->driver_name;
if (strcmp(dev_info->driver_name, "net_bonding") == 0) {
dev_driver = BondingDeviceDriverGet(iconf->port_id);
}

DeviceSetPMDSpecificRSS(&port_conf->rx_adv_conf.rss_conf, dev_driver);

uint64_t rss_hf_tmp =
port_conf->rx_adv_conf.rss_conf.rss_hf & dev_info->flow_type_rss_offloads;
@@ -1197,17 +1250,51 @@ static int DeviceConfigureIPS(DPDKIfaceConfig *iconf)
SCReturnInt(0);
}

/**
* Function verifies changes in e.g. device info after configuration has
* happened. Some devices (e.g. DPDK Bond PMD with Intel NICs i40e/ixgbe) change
* their device info only after the device configuration.
* @param iconf
* @param dev_info
* @return 0 on success, -EAGAIN when reconfiguration is needed, <0 on failure
*/
static int32_t DeviceVerifyPostConfigure(
const DPDKIfaceConfig *iconf, const struct rte_eth_dev_info *dev_info)
{
struct rte_eth_dev_info post_conf_dev_info = { 0 };
int32_t ret = rte_eth_dev_info_get(iconf->port_id, &post_conf_dev_info);
if (ret < 0) {
SCLogError("%s: getting device info failed (err: %s)", iconf->iface, rte_strerror(-ret));
SCReturnInt(ret);
}

if (dev_info->flow_type_rss_offloads != post_conf_dev_info.flow_type_rss_offloads ||
dev_info->rx_offload_capa != post_conf_dev_info.rx_offload_capa ||
dev_info->tx_offload_capa != post_conf_dev_info.tx_offload_capa ||
dev_info->max_rx_queues != post_conf_dev_info.max_rx_queues ||
dev_info->max_tx_queues != post_conf_dev_info.max_tx_queues ||
dev_info->max_mtu != post_conf_dev_info.max_mtu) {
SCLogWarning("Device information severely changed after configuration, reconfiguring");
return -EAGAIN;
}

if (strcmp(dev_info->driver_name, "net_bonding") == 0) {
ret = BondingAllDevicesSameDriver(iconf->port_id);
if (ret < 0) {
SCLogError("%s: bond port uses port with different DPDK drivers", iconf->iface);
SCReturnInt(ret);
}
}

return 0;
}

static int DeviceConfigure(DPDKIfaceConfig *iconf)
{
SCEnter();
// configure device
int retval;
struct rte_eth_dev_info dev_info;
struct rte_eth_conf port_conf;

retval = rte_eth_dev_get_port_by_name(iconf->iface, &(iconf->port_id));
int32_t retval = rte_eth_dev_get_port_by_name(iconf->iface, &(iconf->port_id));
if (retval < 0) {
SCLogError("%s: getting port id failed (err=%d). Is device enabled?", iconf->iface, retval);
SCLogError("%s: getting port id failed (err: %s)", iconf->iface, rte_strerror(-retval));
SCReturnInt(retval);
}

@@ -1218,13 +1305,14 @@ static int DeviceConfigure(DPDKIfaceConfig *iconf)

retval = DeviceSetSocketID(iconf->port_id, &iconf->socket_id);
if (retval < 0) {
SCLogError("%s: invalid socket id (err=%d)", iconf->iface, retval);
SCLogError("%s: invalid socket id (err: %s)", iconf->iface, rte_strerror(-retval));
SCReturnInt(retval);
}

struct rte_eth_dev_info dev_info = { 0 };
retval = rte_eth_dev_info_get(iconf->port_id, &dev_info);
if (retval != 0) {
SCLogError("%s: getting device info failed (err=%d)", iconf->iface, retval);
if (retval < 0) {
SCLogError("%s: getting device info failed (err: %s)", iconf->iface, rte_strerror(-retval));
SCReturnInt(retval);
}

@@ -1241,9 +1329,10 @@ static int DeviceConfigure(DPDKIfaceConfig *iconf)
}

retval = DeviceValidateMTU(iconf, &dev_info);
if (retval != 0)
if (retval < 0)
return retval;

struct rte_eth_conf port_conf = { 0 };
DeviceInitPortConf(iconf, &dev_info, &port_conf);
if (port_conf.rxmode.offloads & RTE_ETH_RX_OFFLOAD_CHECKSUM) {
// Suricata does not need recalc checksums now
@@ -1252,12 +1341,16 @@ static int DeviceConfigure(DPDKIfaceConfig *iconf)

retval = rte_eth_dev_configure(
iconf->port_id, iconf->nb_rx_queues, iconf->nb_tx_queues, &port_conf);
if (retval != 0) {
SCLogError("%s: failed to configure the device (port %u, err %d)", iconf->iface,
iconf->port_id, retval);
if (retval < 0) {
SCLogError("%s: failed to configure the device (port %u, err %s)", iconf->iface,
iconf->port_id, rte_strerror(-retval));
SCReturnInt(retval);
}

retval = DeviceVerifyPostConfigure(iconf, &dev_info);
if (retval < 0)
return retval;

retval = rte_eth_dev_adjust_nb_rx_tx_desc(
iconf->port_id, &iconf->nb_rx_desc, &iconf->nb_tx_desc);
if (retval != 0) {
@@ -1348,7 +1441,13 @@ static void *ParseDpdkConfigAndConfigureDevice(const char *iface)
FatalError("DPDK configuration could not be parsed");
}

if (DeviceConfigure(iconf) != 0) {
retval = DeviceConfigure(iconf);
if (retval == -EAGAIN) {
// for e.g. bonding PMD it needs to be reconfigured
retval = DeviceConfigure(iconf);
}

if (retval < 0) { // handles both configure attempts
iconf->DerefFunc(iconf);
retval = rte_eal_cleanup();
if (retval != 0)