Multiple streams with different VLAN priority causes high CPU utilization #1124

Open
rubensfig opened this issue Apr 16, 2024 · 6 comments

@rubensfig

Hello all,

I am facing a strange issue in the TRex stateless code, version v3.02. I am using a Mellanox ConnectX-5 (Cx-5) and have created two VFs on top of PF 0.

I am trying to create two parallel streams with different VLAN priorities, but the generated load is not what I expect, and CPU utilization is very high.

I have attached the TUI output when sending only one stream (trex_good.png) and when sending both streams (trex_bad.png). Additionally, I have added the TUI utilization output (trex_util.png) for the "bad" scenario.

I have reproduced this issue both with and without the --software option.

[Screenshots: trex_good.png, trex_bad.png, trex_util.png]

The script used is below, and I am calling it with python3 automation/trex_control_plane/interactive/trex/examples/stl/single.py.

import stl_path
from trex.stl.api import *

import time
import pprint
from ipaddress import ip_address, ip_network

import argparse
import configparser
import os
import json


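def get_packet(tos, mac_dst, ip_src, size):
    """Build a VLAN-tagged UDP packet padded to `size` bytes.

    Note: `tos` is used as the 802.1Q priority (PCP) of the VLAN tag,
    not as the IP ToS field.
    """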
    # pkt = Ether(src="02:00:00:00:00:01",dst="00:00:00:01:00:01") / IP(src="10.0.0.2", tos=tos) / UDP(sport=4444, dport=4444)

    pkt = (
        Ether(src="00:01:00:00:00:02", dst=mac_dst)
        # Ether(dst="11:11:11:11:11:11")
        # / Dot1AD(vlan=0)
        / Dot1Q(vlan=0, prio=tos)
        / IP(src=ip_src)
        / UDP(sport=4444, dport=4444)
    )
    pad = max(0, size - len(pkt)) * "x"

    return pkt / pad

def main():
    """Send two continuous streams that differ only in VLAN priority."""
    tx_port = 0
    rx_port = 1

    c = STLClient()

    # connect to server
    c.connect()

    # prepare our ports
    c.reset(ports=[tx_port, rx_port])

    streams = []
    s = STLStream(
        packet=STLPktBuilder(
            pkt=get_packet(4,"00:11:22:33:44:55", "10.1.0.2",512),
            # vm = vm,
        ),
        isg=0 * 1000000,
        mode=STLTXCont(pps=1.2*10**6),
        # flow_stats = STLFlowLatencyStats(pg_id = 0)
        flow_stats = STLFlowStats(pg_id=0),
    )

    streams.append(s)

    s2 = STLStream(
        packet=STLPktBuilder(
            pkt=get_packet(2,"00:11:22:33:44:55", "10.1.0.2",512),
            # vm = vm,
        ),
        isg=0 * 1000000,
        mode=STLTXCont(pps=1.2*10**6),
        # flow_stats = STLFlowLatencyStats(pg_id = 0)
        flow_stats = STLFlowStats(pg_id=1),
    )

    streams.append(s2)

    c.add_streams(streams, ports=[tx_port])

    c.clear_stats()

    c.start(ports=[tx_port], duration=60, mult="25gbpsl1")

    c.wait_on_traffic(ports=[tx_port, rx_port])

    stats = c.get_stats()
    print(stats)

if __name__ == "__main__":
    main()
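
For reference, the two streams above differ on the wire only in the 3-bit priority (PCP) field of the 802.1Q tag. The following is a minimal sketch in plain Python (no TRex or Scapy needed) of how the 16-bit Tag Control Information (TCI) is laid out. The helper name `vlan_tci` is hypothetical and used only for illustration.

```python
def vlan_tci(prio: int, vlan: int = 0, dei: int = 0) -> int:
    """Pack PCP (3 bits), DEI (1 bit) and VLAN ID (12 bits) into the 802.1Q TCI."""
    if not 0 <= prio <= 7:
        raise ValueError("prio (PCP) must be in 0..7")
    if dei not in (0, 1):
        raise ValueError("dei must be 0 or 1")
    if not 0 <= vlan <= 0xFFF:
        raise ValueError("vlan must be in 0..4095")
    return (prio << 13) | (dei << 12) | vlan


# The two priorities used by the streams in the script above, with vlan=0.
for prio in (4, 2):
    print(f"prio={prio} -> TCI=0x{vlan_tci(prio):04x}")
# prio=4 -> TCI=0x8000
# prio=2 -> TCI=0x4000
```

With `vlan=0` the tag carries only a priority, so the PCP bits are the only difference between the two streams.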

The following is my configuration:

- port_limit: 2
  version: 2
  port_bandwidth_gb: 100
  interfaces: ["3b:00.2", "3b:00.3"]
  port_info:
    - dest_mac: 00:00:00:00:00:01
      src_mac: 00:01:00:00:00:01
    - dest_mac: 00:00:00:00:00:02
      src_mac: 00:01:00:00:00:02
  c: 14
  platform:
    master_thread_id: 8
    latency_thread_id: 27
    dual_if:
      - socket: 0
        threads: [9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26]

Thank you!

@rubensfig
Author

I have also tested this in v3.04, and the bug remains.

I would appreciate it if anyone could help with this issue :)

@hhaim
Contributor

hhaim commented Apr 16, 2024

@rubensfig this is a DPDK mlx5 driver issue. I would report it to the maintainers on the DPDK forum.

@rubensfig
Author

@hhaim Thank you for the pointer. I have posted it on the DPDK users mailing list.

Should I keep this ticket open, or close it and reopen it once the issue is resolved upstream in DPDK?

@hhaim
Contributor

hhaim commented Apr 16, 2024

@rubensfig I would keep it open and update it if there is any new information from the maintainers. The mlx5 driver is a complex one with many dependencies.

@rubensfig
Author

rubensfig commented Apr 19, 2024

Hello @hhaim, everyone!

I have obtained some support from the DPDK mailing list. Here is the relevant comment with the solution: https://mails.dpdk.org/archives/users/2024-April/007635.html

Essentially, we need to make sure the NIC-level QoS parameters are set. I am pasting the relevant commands below, from the DPDK thread.

sudo mlnx_qos -i <iface> --trust=dscp
for dscp in {0..63}; do sudo mlnx_qos -i <iface> --dscp2prio set,$dscp,0; sleep 0.001;done
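
The same sequence can also be scripted from Python, which may be convenient when it is run from the harness that drives TRex. This is only a sketch: the helper names `build_mlnx_qos_commands` and `apply_commands` and the interface name `enp59s0f0` are placeholders and do not come from the DPDK thread. The `mlnx_qos` arguments are copied from the commands above. By default nothing is executed; the commands are only printed.

```python
import shlex
import subprocess
import time


def build_mlnx_qos_commands(iface, prio=0):
    """Return the mlnx_qos invocations from the shell loop above as argv lists."""
    cmds = [["sudo", "mlnx_qos", "-i", iface, "--trust=dscp"]]
    for dscp in range(64):  # DSCP is a 6-bit field: 0..63
        cmds.append(
            ["sudo", "mlnx_qos", "-i", iface, "--dscp2prio", f"set,{dscp},{prio}"]
        )
    return cmds


def apply_commands(cmds, dry_run=True):
    """Print the commands, or run them one by one when dry_run is False."""
    for cmd in cmds:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)
            time.sleep(0.001)  # same short pause as in the shell loop


if __name__ == "__main__":
    # "enp59s0f0" is a placeholder; replace it with the actual interface name.
    apply_commands(build_mlnx_qos_commands("enp59s0f0"))
```

Calling `apply_commands(..., dry_run=False)` would actually change the NIC configuration, so it is worth reviewing the printed commands first.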

I can write a documentation note about this in the Mellanox annex, under the limitations/issues section. What do you think? https://trex-tgn.cisco.com/trex/doc/trex_appendix_mellanox.html

@hhaim
Contributor

hhaim commented Apr 21, 2024

@rubensfig thanks for looking into it. It would be great to add these commands to the annex. Please also mention the TRex/OFED/DPDK versions. So mlx5 has become even more complex now.
