Fix test_m2o_fluctuating_lossless by ediwibowo-msft · Pull Request #24993 · sonic-net/sonic-mgmt

ediwibowo-msft · 2026-05-30T01:42:22Z

Description of PR

Summary:
The m2o_fluctuating_lossless test hard-coded its expected background-flow loss, so it started failing on DUTs whose egress DWRR scheduler weights differ from the ones that constant was tuned for. For example, when the lossless and lossy queues both use weight 15, the analytical loss is ~10%, versus ~11.27% on a DUT with lossless = 15 and lossy = 14. Instead of assuming fixed queue/scheduler weights, the test now derives the expected per-BG-flow loss at runtime from the DUT's QUEUE and SCHEDULER tables.

Fixes #24992

Type of change

Back port request

Approach

What is the motivation for this PR?

test_m2o_fluctuating_lossless asserts that each Background Flow sees a specific loss percentage at the egress congestion point. The previous expectation was a hard-coded constant tuned for one DUT's DWRR weights, so the test broke on DUTs with different scheduler configurations. The expected loss is fully determined by:

the test/background flows' offered rates and target TCs, and
the egress port's per-queue DWRR weights.

We can compute it analytically rather than hard-coding it.
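As a worked check, weighted max-min (water-filling) allocation reproduces both figures quoted in the summary. The traffic profile here is an assumption chosen because it matches those figures, not something read from the test code: a 100% egress link, one lossless queue offering 10%, a second lossless queue offering more than its share (above ~19% in the mixed case and above 18% in the uniform case), and four lossy BG queues offering 20% each. The 10% queue is below its weighted share, so it is satisfied first, and the remaining 90% is split by weight among the other queues:

```latex
\begin{aligned}
\textbf{mixed (15/14):}\quad & W = 2(15) + 4(14) = 86,\qquad 100\cdot\tfrac{15}{86} \approx 17.44 > 10 \;\Rightarrow\; x_{\text{LL,10}} = 10\\
& x_{\text{BG}} = (100 - 10)\,\frac{14}{86 - 15} = \frac{1260}{71} \approx 17.75 < 20\\
& \text{loss}_{\text{BG}} = \frac{20 - 1260/71}{20} = \frac{8}{71} \approx 11.27\%\\[4pt]
\textbf{uniform (15):}\quad & W = 6(15) = 90,\qquad 100\cdot\tfrac{15}{90} \approx 16.67 > 10 \;\Rightarrow\; x_{\text{LL,10}} = 10\\
& x_{\text{BG}} = (100 - 10)\,\frac{15}{90 - 15} = 18 < 20,\qquad \text{loss}_{\text{BG}} = \frac{20 - 18}{20} = 10\%
\end{aligned}
```

Under this profile the mixed case equals 100·(14/86) + (100·(15/86) − 10)·(14/71), i.e. the BG queue's weighted share plus its slice of the bandwidth the 10% queue leaves unused.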

How did you do it?

tests/snappi_tests/pfc/files/m2o_fluctuating_lossless_helper.py
- New helper get_expected_bg_loss_percent(...) that:
  - Reads TC_TO_QUEUE_MAP from config_facts (with host/source/namespace honoring asic_value).
  - Reads per-queue DWRR weights via get_queue_scheduler_weight_dict for the given asic_value, port, and qos_map_profile.
  - Builds a queue_demand / queue_weight map from the test + background flows.
  - Runs an iterative DWRR allocator (queues whose demand is under their weight share are satisfied first; the rest split the remainder by weight).
  - Returns the average per-Background-Flow loss percent.
- run_m2o_fluctuating_lossless_test now calls this helper (passing asic_value and the egress port), and the result drives the assertion in verify_m2o_fluctuating_lossless_result with a tolerance of BG_LOSS_TOLERANCE_PERCENT = 1, replacing the hard-coded value.
- pytest_assert guards on bg_flow_rate_percent (it must be non-empty with all entries equal; this is a current limitation, called out in a comment) and on the queue-weight lookup, so the test fails fast with a clear message rather than a KeyError/ZeroDivisionError.
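The iterative allocator described above can be sketched as a weighted max-min (water-filling) loop. This is a minimal illustration, not the PR's get_expected_bg_loss_percent: the names dwrr_allocate, expected_bg_loss_percent and bg_loss_ok and the queue labels are hypothetical, it assumes one BG flow per lossy queue, and it treats BG_LOSS_TOLERANCE_PERCENT as an absolute tolerance in percentage points.

```python
from typing import Dict, Hashable, Iterable

# Value quoted in the PR; treated here (an assumption) as absolute percentage points.
BG_LOSS_TOLERANCE_PERCENT = 1.0


def dwrr_allocate(demand: Dict[Hashable, float],
                  weight: Dict[Hashable, float],
                  capacity: float = 100.0) -> Dict[Hashable, float]:
    """Split `capacity` (percent of line rate) across queues by weighted max-min fairness."""
    alloc = {q: 0.0 for q in demand}
    active = {q for q, d in demand.items() if d > 0}
    remaining = capacity
    while active and remaining > 1e-12:
        total_w = sum(weight[q] for q in active)
        if total_w <= 0:
            raise ValueError("active queues have zero total weight")
        # Queues whose demand fits inside their weighted share of what is
        # left are fully satisfied in this round.
        satisfied = {q for q in active
                     if demand[q] <= remaining * weight[q] / total_w}
        if not satisfied:
            # Every remaining queue is bandwidth-limited: split the rest by weight.
            for q in active:
                alloc[q] = remaining * weight[q] / total_w
            break
        for q in satisfied:
            alloc[q] = demand[q]
            remaining -= demand[q]
        active -= satisfied
    return alloc


def expected_bg_loss_percent(demand, weight, bg_queues: Iterable[Hashable],
                             capacity: float = 100.0) -> float:
    """Average loss percent over the background (lossy) queues."""
    alloc = dwrr_allocate(demand, weight, capacity)
    losses = [100.0 * (demand[q] - alloc[q]) / demand[q] for q in bg_queues]
    if not losses:
        raise ValueError("no background queues given")
    return sum(losses) / len(losses)


def bg_loss_ok(measured: float, expected: float,
               tol: float = BG_LOSS_TOLERANCE_PERCENT) -> bool:
    """True if the measured BG loss is within `tol` percentage points of the expectation."""
    return abs(measured - expected) <= tol


if __name__ == "__main__":
    # Assumed traffic profile (hypothetical queue names) on a 100% egress link.
    bg = ["BG0", "BG1", "BG2", "BG3"]
    demand = {"LL_A": 10, "LL_B": 40, **{q: 20 for q in bg}}
    mixed = {"LL_A": 15, "LL_B": 15, **{q: 14 for q in bg}}
    uniform = {q: 15 for q in demand}
    print(round(expected_bg_loss_percent(demand, mixed, bg), 4))    # 11.2676
    print(round(expected_bg_loss_percent(demand, uniform, bg), 4))  # 10.0
```

Satisfying every under-share queue in a round and then re-dividing the leftover is what makes the result independent of the order in which queues are visited; a single pass that only divides by static weights would over-penalize the BG queues whenever a lossless queue runs below its share.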
tests/common/snappi_tests/common_helpers.py
- New get_queue_scheduler_weight_dict(host_ans, asic_value=None, port=None, qos_map_profile=None) that joins QUEUE + SCHEDULER from config_facts and annotates each queue with one DSCP via DSCP_TO_TC_MAP + TC_TO_QUEUE_MAP.
- If a DUT has no QUEUE/SCHEDULER config, returns a default 8-queue equal-weight DWRR map (weight 15) so callers always get a usable structure.
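A minimal sketch of the QUEUE/SCHEDULER join and the equal-weight fallback, assuming `cfg` is the ansible_facts dict that config_facts returns. The function name queue_weights_from_config_facts, the "AZURE" default profile, picking the lowest DSCP per queue, skipping scheduler entries that carry no weight (for example a strict-priority entry), and accepting an older "[SCHEDULER|name]" reference form are all assumptions of this sketch, not details of the PR's get_queue_scheduler_weight_dict.

```python
from typing import Any, Dict

DEFAULT_NUM_QUEUES = 8
DEFAULT_DWRR_WEIGHT = 15


def _strip_ref(ref: str, table: str) -> str:
    """Accept both 'scheduler.1' and the (assumed) older '[SCHEDULER|scheduler.1]' form."""
    ref = ref.strip("[]")
    prefix = table + "|"
    return ref[len(prefix):] if ref.startswith(prefix) else ref


def queue_weights_from_config_facts(cfg: Dict[str, Any], port: str,
                                    qos_map_profile: str = "AZURE") -> Dict[int, Dict[str, Any]]:
    queue_cfg = cfg.get("QUEUE", {}).get(port, {})
    sched_cfg = cfg.get("SCHEDULER", {})
    if not queue_cfg or not sched_cfg:
        # Nothing configured: fall back to equal-weight DWRR on 8 queues.
        return {q: {"scheduler": None, "type": "DWRR",
                    "weight": DEFAULT_DWRR_WEIGHT, "dscp": None}
                for q in range(DEFAULT_NUM_QUEUES)}

    # Map each queue to one DSCP via DSCP -> TC -> queue (lowest DSCP wins).
    dscp_to_tc = cfg.get("DSCP_TO_TC_MAP", {}).get(qos_map_profile, {})
    tc_to_queue = cfg.get("TC_TO_QUEUE_MAP", {}).get(qos_map_profile, {})
    queue_to_dscp: Dict[int, int] = {}
    for dscp, tc in dscp_to_tc.items():
        q = tc_to_queue.get(tc)
        if q is not None:
            queue_to_dscp[int(q)] = min(int(dscp), queue_to_dscp.get(int(q), 64))

    result: Dict[int, Dict[str, Any]] = {}
    for q, value in queue_cfg.items():
        name = value.get("scheduler")
        if name is None:
            continue
        name = _strip_ref(name, "SCHEDULER")
        sched = sched_cfg.get(name)
        if sched is None or "weight" not in sched:
            # Not part of the DWRR split (e.g. an entry without a weight).
            continue
        result[int(q)] = {"scheduler": name, "type": sched.get("type"),
                          "weight": int(sched["weight"]),
                          "dscp": queue_to_dscp.get(int(q))}
    return result
```

Checking `"weight" in sched` before `int(sched["weight"])` keeps a weightless scheduler entry from surfacing as a KeyError deep inside the test.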
Unit tests (no DUT / no Snappi required)
- tests/common/unit_tests/snappi_tests/unit_test_common_helpers.py: covers get_queue_scheduler_weight_dict against a CONFIG_FACTS fixture that mirrors real ansible -m config_facts output from a DUT, plus a defaults-when-unconfigured case.
- tests/snappi_tests/unit_tests/pfc/unit_test_m2o_fluctuating_lossless_helper.py: a parametrized analytical-DWRR check of get_expected_bg_loss_percent:
  - mixed_weights_15_14 → expected ~11.27% (matches the IxNetwork-measured 11.259% on a DUT with lossless weight 15 and lossy weight 14).
  - uniform_weights_15 → expected 10.0% (fair share).
- Each unit-test directory has a README documenting --noconftest invocation and the extraction mechanism.

How did you verify/test it?

Unit tests pass:

python3 -m pytest --noconftest   "tests/snappi_tests/unit_tests/pfc/unit_test_m2o_fluctuating_lossless_helper.py"   -v
====================================== test session starts =======================================
platform linux -- Python 3.12.3, pytest-7.4.4, pluggy-1.4.0 -- /usr/bin/python3
cachedir: .pytest_cache
rootdir: /home/ediwibowo/workspace/sonic-mgmt-int/tests
configfile: pytest.ini
collected 2 items                                                                                

tests/snappi_tests/unit_tests/pfc/unit_test_m2o_fluctuating_lossless_helper.py::test_expected_bg_loss_matches_analytical_dwrr_split[mixed_weights_15_14-0-11.2676] PASSED [ 50%]
tests/snappi_tests/unit_tests/pfc/unit_test_m2o_fluctuating_lossless_helper.py::test_expected_bg_loss_matches_analytical_dwrr_split[uniform_weights_15-1-10.0] PASSED [100%]

Both unit-test files pass (4 tests total), including the analytical-DWRR cases, which reproduce the IxNetwork-measured 11.259% on a DUT with lossless weight 15 and lossy weight 14 (within the 1% tolerance) and the 10.0% expected on a uniform-weight platform.

Integration/Snappi tests pass on a DUT with lossless weight 15 and lossy weight 14:

python3 -m  pytest snappi_tests/pfc/test_m2o_fluctuating_lossless.py --inventory ../ansible/ixia,../ansible/veos --host-pattern <DUT>  --dpu-pattern None --testbed <testbed> --testbed_file ../ansible/testbed.yaml --log-cli-level warning --log-file-level debug --kube_master unset --showlocals --assert plain --show-capture no -rav --allow_recover --ignore=ptftests --ignore=acstests --ignore=saitests --ignore=scripts --ignore=k8s --ignore=sai_qualify --log-file logs/snappi_tests/pfc/focused_retry_check.log --junitxml=logs/snappi_tests/pfc/focused_retry_check.xml -s --trim_inv --skip_sanity --disable_loganalyzer --maxfail=1 
...
snappi_tests/pfc/test_m2o_fluctuating_lossless.py::test_m2o_fluctuating_lossless[tgen_port_info0]  PASSED

Any platform specific information?

This change removes the platform-specific hard-coded loss expectation. The test should now self-adjust to any DUT whose QUEUE/SCHEDULER tables are readable via config_facts.

Supported testbed topology if it's a new test case?

N/A: existing test, unchanged topology requirements.

Documentation

Traffic flows on a DUT with lossless weight 15 and lossy weight 15:

mssonicbld · 2026-05-30T01:42:29Z

/azp run

azure-pipelines · 2026-05-30T01:42:43Z

Azure Pipelines successfully started running 1 pipeline(s).

sdszhang · 2026-05-30T02:07:13Z

@amitpawar12 @selldinesh @kamalsahu0001 for review.

mssonicbld · 2026-05-30T11:14:12Z

This PR has backport request for branch(es): 202511.
Added label(s) for branch(es) 202511.

Powered by SONiC BuildBot

amitpawar12 · 2026-06-03T13:44:51Z

Hi @ediwibowo-msft,

I think the scheduler configuration sets the DWRR lossy and lossless weights to 14 and 15 respectively. This is the output from a VOQ chassis:

{
  "scheduler.0": {
    "type": "DWRR",
    "weight": "14"
  },
  "scheduler.1": {
    "type": "DWRR",
    "weight": "15"
  }
}

Please let me know.

Thanks

kamalsahu0001 · 2026-06-03T18:25:12Z

Hi @amitpawar12 @ediwibowo-msft, for T0/T1 we set the same weight on both schedulers before running the test so that the weights are uniform. With these changes, we observe 12.5% loss on lossy flows, as explained by @ediwibowo-msft.

mssonicbld · 2026-06-04T09:51:20Z

/azp run

azure-pipelines · 2026-06-04T09:51:37Z

Azure Pipelines successfully started running 1 pipeline(s).

sdszhang · 2026-06-04T23:02:23Z

+#     remaining bandwidth by weight. Each BG queue receives
+#       100*(14/86) + (100*(15/86) - 10)*(14/71) = 90*14/71 ~= 17.75%, dropping ~2.25% of 20%.
+#   - Per-flow loss = (20 - 17.75) / 20 ~= 11.27%.
+EXPECTED_BG_LOSS_PERCENT = 11.27


This works if the DWRR weights are [15, 14]. As discussed in the community meeting, we can add a function that calculates the expected loss dynamically from the configured weights, so it also works when lossless and lossy use the same weight.

Added a new function, get_queue_scheduler_weight_dict, in tests/common/snappi_tests/common_helpers.py to retrieve the weights dynamically from the DUT.

mssonicbld · 2026-06-05T01:43:54Z

/azp run

azure-pipelines · 2026-06-05T01:44:08Z

Azure Pipelines successfully started running 1 pipeline(s).

mssonicbld · 2026-06-05T02:31:37Z

/azp run

azure-pipelines · 2026-06-05T02:31:52Z

Azure Pipelines successfully started running 1 pipeline(s).

Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

mssonicbld · 2026-06-09T01:53:57Z

/azp run

azure-pipelines · 2026-06-09T01:54:12Z

Azure Pipelines successfully started running 1 pipeline(s).

mssonicbld · 2026-06-09T01:58:15Z

/azp run

azure-pipelines · 2026-06-09T01:58:30Z

Azure Pipelines successfully started running 1 pipeline(s).

mssonicbld · 2026-06-09T02:14:52Z

/azp run

azure-pipelines · 2026-06-09T02:15:06Z

Azure Pipelines successfully started running 1 pipeline(s).

mssonicbld · 2026-06-09T02:21:05Z

/azp run

azure-pipelines · 2026-06-09T02:21:19Z

Azure Pipelines successfully started running 1 pipeline(s).

mssonicbld · 2026-06-09T03:31:14Z

/azp run

azure-pipelines · 2026-06-09T03:31:28Z

Azure Pipelines successfully started running 1 pipeline(s).

Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

+    for q, value in queue_cfg_all[port].items():
+        scheduler = value.get("scheduler")
+        if scheduler is None or scheduler not in scheduler_cfg:
+            continue
+        sched = scheduler_cfg[scheduler]
+        result[int(q)] = {
+            "scheduler": scheduler,
+            "type": sched.get("type"),
+            "weight": int(sched["weight"]),
+            "dscp": queue_to_dscp.get(int(q)),
+        }


mssonicbld · 2026-06-09T03:45:49Z

/azp run

azure-pipelines · 2026-06-09T03:46:04Z

Azure Pipelines successfully started running 1 pipeline(s).

Signed-off-by: Edi Wibowo <ediwibowo@microsoft.com>

mssonicbld · 2026-06-09T03:58:57Z

/azp run

azure-pipelines · 2026-06-09T03:59:11Z

Azure Pipelines successfully started running 1 pipeline(s).

github-actions Bot requested a review from auspham May 30, 2026 01:42

github-actions Bot requested review from YatishSVC and developfast May 30, 2026 01:42

sdszhang reviewed May 30, 2026

Comment thread tests/snappi_tests/pfc/files/m2o_fluctuating_lossless_helper.py Outdated

Comment thread tests/snappi_tests/pfc/files/m2o_fluctuating_lossless_helper.py Outdated

sdszhang reviewed May 30, 2026

Comment thread tests/snappi_tests/pfc/files/m2o_fluctuating_lossless_helper.py Outdated

mssonicbld added the Request for 202511 branch label May 30, 2026

sdszhang added this to SONiC Snappi Jun 3, 2026

sdszhang moved this to In Progress in SONiC Snappi Jun 3, 2026

ediwibowo-msft force-pushed the fix/m2o_fluctuating_lossless_helper branch from 396d799 to 3f4ceb7 June 4, 2026 09:51

ediwibowo-msft requested a review from sdszhang June 4, 2026 11:45

ediwibowo-msft self-assigned this Jun 4, 2026

sdszhang reviewed Jun 4, 2026

ediwibowo-msft force-pushed the fix/m2o_fluctuating_lossless_helper branch from 3f4ceb7 to 6de078a June 5, 2026 01:43

github-actions Bot requested a review from rraghav-cisco June 5, 2026 01:44

ediwibowo-msft force-pushed the fix/m2o_fluctuating_lossless_helper branch from 6de078a to 2aa901b June 5, 2026 02:31

github-actions Bot requested review from wangxin and yutongzhang-microsoft June 5, 2026 02:31

ediwibowo-msft force-pushed the fix/m2o_fluctuating_lossless_helper branch from 2aa901b to c5678c0 June 5, 2026 03:19

Copilot started reviewing on behalf of ediwibowo-msft June 9, 2026 01:11

Copilot AI reviewed Jun 9, 2026

ediwibowo-msft force-pushed the fix/m2o_fluctuating_lossless_helper branch from 250b204 to 7418989 June 9, 2026 01:58

ediwibowo-msft force-pushed the fix/m2o_fluctuating_lossless_helper branch from 7418989 to c7c8d45 June 9, 2026 02:14

ediwibowo-msft force-pushed the fix/m2o_fluctuating_lossless_helper branch from c7c8d45 to d1d708c June 9, 2026 02:20

sdszhang previously approved these changes Jun 9, 2026

ediwibowo-msft dismissed sdszhang’s stale review via 4ac0283 June 9, 2026 03:31

ediwibowo-msft force-pushed the fix/m2o_fluctuating_lossless_helper branch from d1d708c to 4ac0283 June 9, 2026 03:31

ediwibowo-msft requested review from Copilot and sdszhang June 9, 2026 03:31

Copilot started reviewing on behalf of ediwibowo-msft June 9, 2026 03:32

Copilot AI reviewed Jun 9, 2026

ediwibowo-msft force-pushed the fix/m2o_fluctuating_lossless_helper branch from 4ac0283 to 562b760 June 9, 2026 03:45

snappi/pfc: derive expected background loss from DWRR scheduler weights

8975390

Signed-off-by: Edi Wibowo <ediwibowo@microsoft.com>

ediwibowo-msft force-pushed the fix/m2o_fluctuating_lossless_helper branch from 562b760 to 8975390 June 9, 2026 03:58

sdszhang approved these changes Jun 9, 2026
