Fix test_m2o_fluctuating_lossless #24993
Conversation
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
@amitpawar12 @selldinesh @kamalsahu0001 for review. |
|
This PR has backport request for branch(es): 202511. ---Powered by SONiC BuildBot
|
|
Hi @ediwibowo-msft, I think the scheduler configuration sets the DWRR lossy and lossless weights to 14 and 15, respectively. This is the output from a VOQ chassis. Please let me know. Thanks |
|
Hi @amitpawar12 @ediwibowo-msft, for T0/T1, before running the test, we set the same weight on both schedulers so that they are uniform. When we run with these changes, we observe 12.5% loss on lossy flows, as explained by @ediwibowo-msft |
396d799 to
3f4ceb7
Compare
| # remaining bandwidth by weight. Each BG queue receives | ||
| # 100*(14/86) + (100*(15/86) - 10)*(14/71) ~= 17.75%, dropping ~2.25% of 20%. | ||
| # - Per-flow loss = (20 - 17.75) / 20 ~= 11.27%. | ||
| EXPECTED_BG_LOSS_PERCENT = 11.27 |
This works if the DWRR weights are [15, 14]. As discussed in the community meeting, we can add a function that calculates the expected loss dynamically from the configured weights, so that it also works when the lossless and lossy weights are the same.
Added new method get_queue_scheduler_weight_dict in tests/common/snappi_tests/common_helpers.py to dynamically retrieve the weights from DUT.
3f4ceb7 to
6de078a
Compare
6de078a to
2aa901b
Compare
2aa901b to
c5678c0
Compare
250b204 to
7418989
Compare
7418989 to
c7c8d45
Compare
c7c8d45 to
d1d708c
Compare
d1d708c to
4ac0283
Compare
| for q, value in queue_cfg_all[port].items(): | ||
| scheduler = value.get("scheduler") | ||
| if scheduler is None or scheduler not in scheduler_cfg: | ||
| continue | ||
| sched = scheduler_cfg[scheduler] | ||
| result[int(q)] = { | ||
| "scheduler": scheduler, | ||
| "type": sched.get("type"), | ||
| "weight": int(sched["weight"]), | ||
| "dscp": queue_to_dscp.get(int(q)), | ||
| } |
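The join performed in the hunk above can be sketched as a standalone function. This is a minimal illustration, not the PR's implementation: `queue_weights`, the sample scheduler names, and the choice to skip schedulers that have no `weight` field are assumptions made here. The 8-queue, weight-15 fallback mirrors the default described in the PR description.

```python
# Hypothetical defaults, taken from the PR description's fallback behaviour.
DEFAULT_WEIGHT = 15
DEFAULT_NUM_QUEUES = 8


def queue_weights(queue_cfg, scheduler_cfg):
    """Join per-port QUEUE entries with SCHEDULER entries.

    queue_cfg maps queue id (str) -> {"scheduler": <name>, ...}
    scheduler_cfg maps scheduler name -> {"type": ..., "weight": ...}
    """
    result = {}
    for q, value in queue_cfg.items():
        name = value.get("scheduler")
        sched = scheduler_cfg.get(name)
        # Assumption: skip queues whose scheduler is unknown or has no weight
        # instead of raising KeyError.
        if sched is None or "weight" not in sched:
            continue
        result[int(q)] = {
            "scheduler": name,
            "type": sched.get("type"),
            "weight": int(sched["weight"]),
        }
    if not result:
        # Nothing configured: fall back to an equal-weight DWRR map.
        result = {
            q: {"scheduler": None, "type": "DWRR", "weight": DEFAULT_WEIGHT}
            for q in range(DEFAULT_NUM_QUEUES)
        }
    return result


# Illustrative input only; real config_facts output may differ.
sample_queue = {"0": {"scheduler": "scheduler.0"}, "3": {"scheduler": "scheduler.1"}}
sample_sched = {
    "scheduler.0": {"type": "DWRR", "weight": "14"},
    "scheduler.1": {"type": "DWRR", "weight": "15"},
}
weights = queue_weights(sample_queue, sample_sched)
```

Returning a populated structure in the unconfigured case lets callers divide by the total weight without first checking for an empty map.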
4ac0283 to
562b760
Compare
Signed-off-by: Edi Wibowo <ediwibowo@microsoft.com>
562b760 to
8975390
Compare
Description of PR
Summary:
The m2o_fluctuating_lossless test had a hard-coded background-loss expectation that started failing on DUTs whose egress DWRR scheduler weights differ (e.g. where lossless and lossy queues both run weight 15 and the analytical loss is ~10%, vs. ~11.27% on another DUT where lossless=15 / lossy=14). Instead of assuming the same queue/scheduler weights, derive the expected per-BG-flow loss at runtime from the DUT's QUEUE / SCHEDULER tables.
Fixes #24992
Type of change
Back port request
Approach
What is the motivation for this PR?
test_m2o_fluctuating_lossless asserts that each background flow sees a specific loss percentage at the egress congestion point. The previous expectation was a hard-coded constant tuned for one DUT's DWRR weights, so the test became fragile or broken on DUTs with different scheduler configurations. The expected loss is fully determined by:
We can compute it analytically rather than hard-coding it.
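One way to compute the loss analytically is a water-filling DWRR model: queues that demand less than their weighted share are served in full, and the leftover bandwidth is re-split by weight among the rest. The sketch below is not the PR's `get_expected_bg_loss_percent`. The function names, queue numbers, and flow rates (two lossless flows at 20% and 10%, four lossy background flows at 20% each) are assumptions, chosen because they reproduce the ~11.27% and 10.0% figures quoted in this PR.

```python
def dwrr_shares(demand, weight, capacity=100.0):
    """Allocate `capacity` (percent of line rate) across queues by DWRR weight.

    demand and weight are dicts keyed by queue id.
    """
    alloc = {}
    active = set(demand)
    remaining = capacity
    while active:
        total_w = sum(weight[q] for q in active)
        # Queues whose demand fits inside their weighted share of what is left.
        satisfied = {
            q for q in active if demand[q] <= remaining * weight[q] / total_w
        }
        if not satisfied:
            # Every remaining queue is congested: split the rest by weight.
            for q in active:
                alloc[q] = remaining * weight[q] / total_w
            break
        for q in satisfied:
            alloc[q] = demand[q]
            remaining -= demand[q]
        active -= satisfied
    return alloc


def loss_percent(demand, weight, queue):
    """Share of the queue's offered load that is not served, in percent."""
    served = dwrr_shares(demand, weight)[queue]
    return (demand[queue] - served) / demand[queue] * 100.0


# Assumed traffic: queues 3 and 4 lossless, queues 0, 1, 2, 5 lossy background.
demand = {3: 20, 4: 10, 0: 20, 1: 20, 2: 20, 5: 20}
mixed = {3: 15, 4: 15, 0: 14, 1: 14, 2: 14, 5: 14}
uniform = {q: 15 for q in demand}

print(round(loss_percent(demand, mixed, 0), 2))    # 11.27
print(round(loss_percent(demand, uniform, 0), 2))  # 10.0
```

The model covers egress bandwidth sharing only. It does not model PFC pause behaviour on the lossless queues.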
How did you do it?
tests/snappi_tests/pfc/files/m2o_fluctuating_lossless_helper.py
- get_expected_bg_loss_percent(...) that:
  - TC_TO_QUEUE_MAP from config_facts (with host / source / namespace honoring asic_value).
  - get_queue_scheduler_weight_dict, depending on asic_value, port, and qos_map_profile.
  - queue_demand / queue_weight map from the test + background flows.
- run_m2o_fluctuating_lossless_test now calls this helper (passing asic_value and the egress port), and the result drives the assertion in verify_m2o_fluctuating_lossless_result with a BG_LOSS_TOLERANCE_PERCENT = 1 tolerance, replacing the hard-coded value.
- pytest_assert guards on bg_flow_rate_percent (non-empty + all entries equal; current limitation, called out in a comment) and on the queue-weight lookup, so the test fails fast with a clear message rather than KeyError / ZeroDivisionError.
tests/common/snappi_tests/common_helpers.py
- get_queue_scheduler_weight_dict(host_ans, asic_value=None, port=None, qos_map_profile=None) that joins QUEUE + SCHEDULER from config_facts and annotates each queue with one DSCP via DSCP_TO_TC_MAP + TC_TO_QUEUE_MAP.
- Without QUEUE / SCHEDULER config, returns a default 8-queue equal-weight DWRR map (weight 15) so callers always get a usable structure.
Unit tests (no DUT / no Snappi required)
- tests/common/unit_tests/snappi_tests/unit_test_common_helpers.py: covers get_queue_scheduler_weight_dict against a CONFIG_FACTS fixture mirroring real ansible -m config_facts output from a DUT, plus a defaults-when-unconfigured case.
- tests/snappi_tests/unit_tests/pfc/unit_test_m2o_fluctuating_lossless_helper.py: parametrized analytical-DWRR check of get_expected_bg_loss_percent:
  - mixed_weights_15_14 → expected ~11.27% (matches the IxNetwork-measured 11.259% on a DUT with lossless weight 15 and lossy weight 14).
  - uniform_weights_15 → expected 10.0% (fair share).
Each unit-test directory has a README documenting the --noconftest invocation and the extraction mechanism.
How did you verify/test it?
Both files pass (4 tests total), including the analytical-DWRR cases that reproduce the IxNetwork-measured 11.259% on DUT with lossless 15 and lossy 14 within the 1% tolerance and 10.0% on a uniform-weight platform.
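The tolerance comparison mentioned above can be illustrated with a few lines. This is a sketch under one stated assumption: that `BG_LOSS_TOLERANCE_PERCENT = 1` means an absolute difference of one percentage point. The helper name `loss_within_tolerance` is hypothetical and does not come from the PR.

```python
BG_LOSS_TOLERANCE_PERCENT = 1  # value quoted in the PR description


def loss_within_tolerance(measured, expected,
                          tolerance=BG_LOSS_TOLERANCE_PERCENT):
    """True when measured and expected loss differ by at most `tolerance`
    percentage points (assumed interpretation of the tolerance)."""
    return abs(measured - expected) <= tolerance


# Measured 11.259% against the analytical 11.27% for weights 15/14.
mixed_ok = loss_within_tolerance(11.259, 11.27)
# A uniform-weight DUT at 10.0% checked against the 15/14 expectation.
uniform_vs_fixed = loss_within_tolerance(10.0, 11.27)
```

Under this interpretation, `mixed_ok` is true and `uniform_vs_fixed` is false, which is consistent with a single fixed expectation not fitting both weight configurations.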
Any platform specific information?
This change removes the platform-specific hard-coded loss expectation. The test should now self-adjust to any DUT whose QUEUE / SCHEDULER tables are readable via config_facts.
Supported testbed topology if it's a new test case?
N/A — existing test, unchanged topology requirements.
Documentation
Traffic flows of a DUT with lossless 15 weight and lossy 15 weight:
