[202511][Backport #23939] Remove BUFFER_SIZE override in test_dhcp_counter_stress#24592
Merged
vmittal-msft merged 1 commit intoMay 14, 2026
Conversation
…t_dhcp_counter_stress Partial back-port of sonic-net#23939 — only the buffer-size removal part. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Xichen96 <lukelin0907@gmail.com>
0011eab to
a70f7c0
Compare
1 task
Collaborator
|
/azp run |
1 similar comment
Collaborator
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
1 similar comment
|
Azure Pipelines successfully started running 1 pipeline(s). |
Xichen96
added a commit
that referenced
this pull request
May 14, 2026
…cket_on_dut (#24594) **Why** PR #22876 cut the default `tcpdump_buffer_size` in `capture_and_check_packet_on_dut` from 102400 KiB (100 MiB) to 4096 KiB (4 MiB) to address the 1 GiB explicit override in `test_dhcp_counter_stress` causing a ~2 GiB memory spike. The 4 MiB default, however, is too small for normal stress-test workloads with bursty packet rates on lower-perf platforms (720dt, 7215). **Note: PR #22876 was not actually validated.** The `How did you verify/test it?` section of #22876 only states: > - Code passes flake8 with max-line-length=120 > - Fix matches the exact unit interpretation from tcpdump documentation No test execution. The 4 MiB value was a unit-correction choice based on tcpdump docs, not an empirically-tested minimum. As shown by the data below, 4 MiB drops 7.2% of relayed DHCP packets on real hardware — i.e. the new default was never validated against the stress-test workload that motivated the change. Empirical measurements on `testbed-bjw2-can-720dt-6` (Arista-720DT-G48S4, SONiC.20251110.26) running `test_dhcp_counter_stress[discover]` (25 pps × 48 servers × 120 s): | Buffer | tcpdump drop rate vs dhcpmon counter | test result | |---:|---:|---| | 1 MiB | 13.0% | FAIL | | **4 MiB (current default, set by #22876)** | **7.2%** | **FAIL** | | 16 MiB | 1.47% | FAIL | | 64 MiB | <0.01% | PASS | **How** Bump default from `4096` (4 MiB) to `131072` (128 MiB). Gives comfortable headroom for stress tests on slow platforms while remaining 8× smaller than the historical 1 GiB explicit override that caused memory spikes. **Back port request** - [x] 202511 Refs: PR #22876 (commit 58a6f0b), PR #20580, PR #24592 (companion 202511-only cleanup). Signed-off-by: Xichen96 <lukelin0907@gmail.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1 task
Contributor
Author
|
@vmittal-msft Hi Vineet, could you help approve and merge? thanks |
mssonicbld
added a commit
that referenced
this pull request
May 14, 2026
…capture_and_check_packet_on_dut (#24621) **Why** PR #22876 cut the default `tcpdump_buffer_size` in `capture_and_check_packet_on_dut` from 102400 KiB (100 MiB) to 4096 KiB (4 MiB) to address the 1 GiB explicit override in `test_dhcp_counter_stress` causing a ~2 GiB memory spike. The 4 MiB default, however, is too small for normal stress-test workloads with bursty packet rates on lower-perf platforms (720dt, 7215). **Note: PR #22876 was not actually validated.** The `How did you verify/test it?` section of #22876 only states: > - Code passes flake8 with max-line-length=120 > - Fix matches the exact unit interpretation from tcpdump documentation No test execution. The 4 MiB value was a unit-correction choice based on tcpdump docs, not an empirically-tested minimum. As shown by the data below, 4 MiB drops 7.2% of relayed DHCP packets on real hardware — i.e. the new default was never validated against the stress-test workload that motivated the change. Empirical measurements on `testbed-bjw2-can-720dt-6` (Arista-720DT-G48S4, SONiC.20251110.26) running `test_dhcp_counter_stress[discover]` (25 pps × 48 servers × 120 s): | Buffer | tcpdump drop rate vs dhcpmon counter | test result | |---:|---:|---| | 1 MiB | 13.0% | FAIL | | **4 MiB (current default, set by #22876)** | **7.2%** | **FAIL** | | 16 MiB | 1.47% | FAIL | | 64 MiB | <0.01% | PASS | **How** Bump default from `4096` (4 MiB) to `131072` (128 MiB). Gives comfortable headroom for stress tests on slow platforms while remaining 8× smaller than the historical 1 GiB explicit override that caused memory spikes. **Back port request** - [x] 202511 Refs: PR #22876 (commit 58a6f0b), PR #20580, PR #24592 (companion 202511-only cleanup). Signed-off-by: Xichen96 <lukelin0907@gmail.com> Signed-off-by: mssonicbld <sonicbld@microsoft.com> Co-authored-by: Xichen96 <lukelin0907@gmail.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
vmittal-msft
approved these changes
May 14, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Partial back-port of #23939 — only the buffer-size removal part.
The other changes in #23939 (adding
relay_agent/downlink_vlan_iface_nameptf_runner params, dropping.jsonfromcount_filepath) depend on PR #19198 which was reverted from 202511 by #23714, so they don't apply here.Why
After PR #22876 was back-ported to 202511, the framework default for
tcpdump_buffer_size(incapture_and_check_packet_on_dut) dropped from 100 MiB to 4 MiB. The explicitBUFFER_SIZE = 1024(1 MiB) override in this test is now smaller than the framework default. Removing the override lets the test use the (larger) framework default and matches master.Note
The 4 MiB default is still too small to fully pass on 720dt-class hardware under this test's stress load. A follow-up PR will raise the default in
capture_and_check_packet_on_dut. This PR is pure cleanup to bring 202511 in line with master.Tested
Applied locally on
internal-202511@da7363c,testbed-bjw2-can-720dt-6(Arista-720DT-G48S4, SONiC.20251110.26):Refs: #23939, #22876, #20580.
Co-authored-by: Copilot 223556219+Copilot@users.noreply.github.com