Fix test_m2o_fluctuating_lossless #24993

Open
ediwibowo-msft wants to merge 1 commit into
sonic-net:master from
ediwibowo-msft:fix/m2o_fluctuating_lossless_helper

Conversation

@ediwibowo-msft

@ediwibowo-msft ediwibowo-msft commented May 30, 2026

Contributor

Description of PR

Summary:
The m2o_fluctuating_lossless test had a hard-coded background-loss expectation that fails on DUTs whose egress DWRR scheduler weights differ from the ones it was tuned for. For example, where lossless and lossy queues both run weight 15 the analytical loss is ~10%, versus ~11.27% on a DUT where lossless=15 / lossy=14. Instead of assuming fixed queue/scheduler weights, this PR derives the expected per-BG-flow loss at runtime from the DUT's QUEUE / SCHEDULER tables.

Fixes #24992

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • New Test case
    • Skipped for non-supported platforms
  • Test case improvement

Back port request

  • 202205
  • 202305
  • 202311
  • 202405
  • 202411
  • 202505
  • 202511

Approach

What is the motivation for this PR?

test_m2o_fluctuating_lossless asserts that each Background Flow sees a specific loss percentage at the egress congestion point. The previous expectation was a hard-coded constant tuned for one DUT's DWRR weights, so the test became fragile / broken on DUTs with different scheduler configurations. The expected loss is fully determined by:

  • the test/background flows' offered rates and target TCs, and
  • the egress port's per-queue DWRR weights.

We can compute it analytically rather than hard-coding it.

How did you do it?

  1. tests/snappi_tests/pfc/files/m2o_fluctuating_lossless_helper.py

    • New helper get_expected_bg_loss_percent(...) that:
      • Reads TC_TO_QUEUE_MAP from config_facts (with host/source/namespace honoring asic_value).
      • Reads per-queue DWRR weights via get_queue_scheduler_weight_dict, depending on asic_value, port, and qos_map_profile.
      • Builds a queue_demand / queue_weight map from the test + background flows.
      • Runs an iterative DWRR allocator (queues whose demand is under their weight share are satisfied first; the rest split the remainder by weight).
      • Returns the average per-Background-Flow loss percent.
    • run_m2o_fluctuating_lossless_test now calls this helper (passing asic_value and the egress port) and the result drives the assertion in verify_m2o_fluctuating_lossless_result with a BG_LOSS_TOLERANCE_PERCENT = 1 tolerance — replacing the hard-coded value.
    • pytest_assert guards on bg_flow_rate_percent (non-empty + all entries equal — current limitation, called out in a comment) and on the queue-weight lookup so the test fails fast with a clear message rather than KeyError/ZeroDivisionError.
  2. tests/common/snappi_tests/common_helpers.py

    • New get_queue_scheduler_weight_dict(host_ans, asic_value=None, port=None, qos_map_profile=None) that joins QUEUE + SCHEDULER from config_facts and annotates each queue with one DSCP via DSCP_TO_TC_MAP + TC_TO_QUEUE_MAP.
    • If a DUT has no QUEUE/SCHEDULER config, returns a default 8-queue equal-weight DWRR map (weight 15) so callers always get a usable structure.
  3. Unit tests (no DUT / no Snappi required)

    • tests/common/unit_tests/snappi_tests/unit_test_common_helpers.py — covers get_queue_scheduler_weight_dict against a CONFIG_FACTS fixture mirroring real ansible -m config_facts output from a DUT, plus a defaults-when-unconfigured case.

    • tests/snappi_tests/unit_tests/pfc/unit_test_m2o_fluctuating_lossless_helper.py — parametrized analytical-DWRR check of get_expected_bg_loss_percent:

      • mixed_weights_15_14 → expected ~11.27% (matches the IxNetwork-measured 11.259% on a DUT with lossless weight 15 and lossy weight 14).
      • uniform_weights_15 → expected 10.0% (fair share).
    • Each unit-test directory has a README documenting --noconftest invocation and the extraction mechanism.
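
For readers who want to check the numbers, the iterative DWRR allocation described in item 1 can be sketched roughly as follows. This is a minimal illustration, not the helper added in this PR: dwrr_allocate, average_bg_loss_percent, the queue numbering, and the demand values are all hypothetical. The demands are an assumption chosen for illustration: one lossless queue offers 10% of line rate, the other offers 30%, and four background queues offer 20% each.

```python
def dwrr_allocate(demand, weight, capacity=100.0):
    """Split capacity among queues by weight, capping each queue at its demand.

    demand and weight map queue index -> percent of line rate / DWRR weight.
    Weights are assumed to be positive.
    """
    alloc = {}
    active = set(demand)
    remaining = capacity
    while active:
        total_weight = sum(weight[q] for q in active)
        # Queues whose demand fits inside their weighted share are satisfied first.
        satisfied = {
            q for q in active
            if demand[q] <= remaining * weight[q] / total_weight
        }
        if not satisfied:
            # Everyone left is congested: split what remains by weight.
            for q in active:
                alloc[q] = remaining * weight[q] / total_weight
            break
        for q in satisfied:
            alloc[q] = demand[q]
        remaining -= sum(demand[q] for q in satisfied)
        active -= satisfied
    return alloc


def average_bg_loss_percent(demand, weight, bg_queues):
    """Average loss (in percent of offered load) across the background queues."""
    alloc = dwrr_allocate(demand, weight)
    losses = [100.0 * (demand[q] - alloc[q]) / demand[q] for q in bg_queues]
    return sum(losses) / len(losses)


# Hypothetical layout: queues 3 and 4 are lossless, queues 0, 1, 2, 5 are background.
BG_QUEUES = [0, 1, 2, 5]
DEMAND = {3: 10.0, 4: 30.0, 0: 20.0, 1: 20.0, 2: 20.0, 5: 20.0}
MIXED_WEIGHTS = {3: 15, 4: 15, 0: 14, 1: 14, 2: 14, 5: 14}
UNIFORM_WEIGHTS = {q: 15 for q in DEMAND}

print(round(average_bg_loss_percent(DEMAND, MIXED_WEIGHTS, BG_QUEUES), 4))
print(round(average_bg_loss_percent(DEMAND, UNIFORM_WEIGHTS, BG_QUEUES), 4))
```

Under these assumed demands the sketch gives about 11.27% for the 15/14 weights and 10.0% for uniform weights, which are the values the parametrized unit tests expect.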

How did you verify/test it?

  • Unit tests pass:
python3 -m pytest --noconftest   "tests/snappi_tests/unit_tests/pfc/unit_test_m2o_fluctuating_lossless_helper.py"   -v
====================================== test session starts =======================================
platform linux -- Python 3.12.3, pytest-7.4.4, pluggy-1.4.0 -- /usr/bin/python3
cachedir: .pytest_cache
rootdir: /home/ediwibowo/workspace/sonic-mgmt-int/tests
configfile: pytest.ini
collected 2 items                                                                                

tests/snappi_tests/unit_tests/pfc/unit_test_m2o_fluctuating_lossless_helper.py::test_expected_bg_loss_matches_analytical_dwrr_split[mixed_weights_15_14-0-11.2676] PASSED [ 50%]
tests/snappi_tests/unit_tests/pfc/unit_test_m2o_fluctuating_lossless_helper.py::test_expected_bg_loss_matches_analytical_dwrr_split[uniform_weights_15-1-10.0] PASSED [100%]

Both unit-test files pass (4 tests total), including the analytical-DWRR cases, which reproduce the IxNetwork-measured 11.259% on a DUT with lossless weight 15 and lossy weight 14 (within the 1% tolerance) and give 10.0% on a uniform-weight platform.
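
As a quick illustration of the tolerance check, the snippet below compares the measured loss against both analytical values. It is only a sketch: within_tolerance is a hypothetical name, and it assumes the tolerance is an absolute difference in percentage points, which this description does not spell out.

```python
BG_LOSS_TOLERANCE_PERCENT = 1  # value quoted in this PR description


def within_tolerance(measured, expected, tolerance=BG_LOSS_TOLERANCE_PERCENT):
    """Assumption: tolerance is an absolute difference in percentage points."""
    return abs(measured - expected) <= tolerance


# Measured 11.259% against the 15/14 analytical value.
print(within_tolerance(11.259, 11.2676))  # True
# The same measurement against the uniform-weight value misses by 1.259 points.
print(within_tolerance(11.259, 10.0))  # False
```

The second call shows why a single hard-coded expectation cannot cover both weight configurations.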

  • Integration/Snappi tests pass on a DUT with lossless weight 15 and lossy weight 14:
python3 -m  pytest snappi_tests/pfc/test_m2o_fluctuating_lossless.py --inventory ../ansible/ixia,../ansible/veos --host-pattern <DUT>  --dpu-pattern None --testbed <testbed> --testbed_file ../ansible/testbed.yaml --log-cli-level warning --log-file-level debug --kube_master unset --showlocals --assert plain --show-capture no -rav --allow_recover --ignore=ptftests --ignore=acstests --ignore=saitests --ignore=scripts --ignore=k8s --ignore=sai_qualify --log-file logs/snappi_tests/pfc/focused_retry_check.log --junitxml=logs/snappi_tests/pfc/focused_retry_check.xml -s --trim_inv --skip_sanity --disable_loganalyzer --maxfail=1 
...
snappi_tests/pfc/test_m2o_fluctuating_lossless.py::test_m2o_fluctuating_lossless[tgen_port_info0]  PASSED

Any platform specific information?

This change removes the platform-specific hard-coded loss expectation. The test should now self-adjust to any DUT whose QUEUE/SCHEDULER tables are readable via config_facts.
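
A rough sketch of how the QUEUE and SCHEDULER tables can be joined, including the equal-weight fallback mentioned earlier. The function name queue_weights and the sample dictionary are hypothetical; real config_facts output may be shaped differently on some images, and the actual helper in this PR also handles asic_value, qos_map_profile, and DSCP annotation, which are left out here.

```python
DEFAULT_QUEUE_COUNT = 8  # fallback size stated in the PR description
DEFAULT_WEIGHT = 15      # fallback weight stated in the PR description


def queue_weights(config_facts, port):
    """Return {queue_index: {"scheduler", "type", "weight"}} for one port."""
    queue_cfg = config_facts.get("QUEUE", {}).get(port, {})
    scheduler_cfg = config_facts.get("SCHEDULER", {})
    result = {}
    for q, value in queue_cfg.items():
        name = value.get("scheduler")
        if name is None or name not in scheduler_cfg:
            continue  # queue has no usable scheduler binding
        sched = scheduler_cfg[name]
        result[int(q)] = {
            "scheduler": name,
            "type": sched.get("type"),
            "weight": int(sched["weight"]),
        }
    if not result:
        # Assumption: fall back whenever nothing could be joined for this port.
        result = {
            q: {"scheduler": None, "type": "DWRR", "weight": DEFAULT_WEIGHT}
            for q in range(DEFAULT_QUEUE_COUNT)
        }
    return result


# Hand-written sample, loosely modelled on the scheduler output quoted in this thread.
sample_facts = {
    "QUEUE": {
        "Ethernet0": {
            "0": {"scheduler": "scheduler.0"},
            "3": {"scheduler": "scheduler.1"},
            "4": {"scheduler": "scheduler.1"},
        }
    },
    "SCHEDULER": {
        "scheduler.0": {"type": "DWRR", "weight": "14"},
        "scheduler.1": {"type": "DWRR", "weight": "15"},
    },
}

weights = queue_weights(sample_facts, "Ethernet0")
print({q: entry["weight"] for q, entry in sorted(weights.items())})
```

Calling queue_weights({}, "Ethernet0") on an empty dictionary returns the eight-queue default, so a caller always receives a usable structure.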

Supported testbed topology if it's a new test case?

N/A — existing test, unchanged topology requirements.

Documentation

Traffic flows on a DUT with lossless weight 15 and lossy weight 15:
image

@mssonicbld

Collaborator

/azp run

@github-actions github-actions Bot requested a review from auspham May 30, 2026 01:42
@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@github-actions github-actions Bot requested review from YatishSVC and developfast May 30, 2026 01:42
Comment thread tests/snappi_tests/pfc/files/m2o_fluctuating_lossless_helper.py Outdated
Comment thread tests/snappi_tests/pfc/files/m2o_fluctuating_lossless_helper.py Outdated
Comment thread tests/snappi_tests/pfc/files/m2o_fluctuating_lossless_helper.py Outdated
@sdszhang

Contributor

@amitpawar12 @selldinesh @kamalsahu0001 for review.

@mssonicbld mssonicbld added the Request for 202511 branch Request to backport a change to 202511 branch label May 30, 2026
@mssonicbld

Collaborator

This PR has backport request for branch(es): 202511.
Added label(s) for branch(es) 202511.

---Powered by SONiC BuildBot

@sdszhang sdszhang moved this to In Progress in SONiC Snappi Jun 3, 2026
@amitpawar12

Contributor

Hi @ediwibowo-msft,

I think the scheduler configuration sets the DWRR lossy and lossless weights to 14 and 15, respectively. This is the output from a VOQ chassis.

{
  "scheduler.0": {
    "type": "DWRR",
    "weight": "14"
  },
  "scheduler.1": {
    "type": "DWRR",
    "weight": "15"
  }
}

Please let me know.

Thanks

@kamalsahu0001

Contributor

Hi @amitpawar12 @ediwibowo-msft, for T0/T1 we set the same weight on both schedulers before running the test so that the weights are uniform. When we run with these changes, we observe 12.5% loss on the lossy flows, as explained by @ediwibowo-msft.

@ediwibowo-msft ediwibowo-msft force-pushed the fix/m2o_fluctuating_lossless_helper branch from 396d799 to 3f4ceb7 Compare June 4, 2026 09:51
@mssonicbld

Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@ediwibowo-msft ediwibowo-msft requested a review from sdszhang June 4, 2026 11:45
@ediwibowo-msft ediwibowo-msft self-assigned this Jun 4, 2026
# remaining bandwidth by weight. Each BG queue receives
# 100*(14/86) + (100*(15/86) - 10)*(14/71) ~= 17.75%, dropping ~2.25% of 20%.
# - Per-flow loss = (20 - 17.75) / 20 ~= 11.27%.
EXPECTED_BG_LOSS_PERCENT = 11.27

@sdszhang sdszhang Jun 4, 2026

Contributor


This works only if the DWRR weights are [15, 14]. As discussed in the community meeting, we can add a function that calculates it dynamically from the configured weights, so that it also works when the lossless and lossy weights are the same.

Contributor Author


Added a new method, get_queue_scheduler_weight_dict, in tests/common/snappi_tests/common_helpers.py to retrieve the weights from the DUT dynamically.

@ediwibowo-msft ediwibowo-msft force-pushed the fix/m2o_fluctuating_lossless_helper branch from 3f4ceb7 to 6de078a Compare June 5, 2026 01:43
@mssonicbld

Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@github-actions github-actions Bot requested a review from rraghav-cisco June 5, 2026 01:44
@ediwibowo-msft ediwibowo-msft force-pushed the fix/m2o_fluctuating_lossless_helper branch from 6de078a to 2aa901b Compare June 5, 2026 02:31
@mssonicbld

Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@ediwibowo-msft ediwibowo-msft force-pushed the fix/m2o_fluctuating_lossless_helper branch from 2aa901b to c5678c0 Compare June 5, 2026 03:19

Copilot AI left a comment

Contributor


Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

Comment thread tests/snappi_tests/pfc/files/m2o_fluctuating_lossless_helper.py
Comment thread tests/snappi_tests/pfc/files/m2o_fluctuating_lossless_helper.py
Comment thread tests/snappi_tests/pfc/files/m2o_fluctuating_lossless_helper.py
Comment thread tests/common/snappi_tests/common_helpers.py Outdated
@mssonicbld

Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@ediwibowo-msft ediwibowo-msft force-pushed the fix/m2o_fluctuating_lossless_helper branch from 250b204 to 7418989 Compare June 9, 2026 01:58
@mssonicbld

Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@ediwibowo-msft ediwibowo-msft force-pushed the fix/m2o_fluctuating_lossless_helper branch from 7418989 to c7c8d45 Compare June 9, 2026 02:14
@mssonicbld

Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@ediwibowo-msft ediwibowo-msft force-pushed the fix/m2o_fluctuating_lossless_helper branch from c7c8d45 to d1d708c Compare June 9, 2026 02:20
@mssonicbld

Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

sdszhang previously approved these changes Jun 9, 2026
@mssonicbld

Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Copilot AI left a comment

Contributor


Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Comment on lines +1553 to +1563
for q, value in queue_cfg_all[port].items():
    scheduler = value.get("scheduler")
    if scheduler is None or scheduler not in scheduler_cfg:
        continue
    sched = scheduler_cfg[scheduler]
    result[int(q)] = {
        "scheduler": scheduler,
        "type": sched.get("type"),
        "weight": int(sched["weight"]),
        "dscp": queue_to_dscp.get(int(q)),
    }
@ediwibowo-msft ediwibowo-msft force-pushed the fix/m2o_fluctuating_lossless_helper branch from 4ac0283 to 562b760 Compare June 9, 2026 03:45
@mssonicbld

Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Signed-off-by: Edi Wibowo <ediwibowo@microsoft.com>
@ediwibowo-msft ediwibowo-msft force-pushed the fix/m2o_fluctuating_lossless_helper branch from 562b760 to 8975390 Compare June 9, 2026 03:58
@mssonicbld

Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).


Labels

Request for 202511 branch Request to backport a change to 202511 branch

Projects

Status: In Progress

Development

Successfully merging this pull request may close these issues.

Bug: Background flow loss assertion in m2o_fluctuating_lossless test uses incorrect expected value

6 participants