Skip to content

[202511][Backport #23939] Remove BUFFER_SIZE override in test_dhcp_counter_stress#24592

Merged
vmittal-msft merged 1 commit into
sonic-net:202511from
Xichen96:dev/xichenlin/202511-remove-buffer-override
May 14, 2026
Merged

[202511][Backport #23939] Remove BUFFER_SIZE override in test_dhcp_counter_stress#24592
vmittal-msft merged 1 commit into
sonic-net:202511from
Xichen96:dev/xichenlin/202511-remove-buffer-override

Conversation

@Xichen96

Copy link
Copy Markdown
Contributor

Partial back-port of #23939 — only the buffer-size removal part.

The other changes in #23939 (adding relay_agent/downlink_vlan_iface_name ptf_runner params, dropping .json from count_file path) depend on PR #19198 which was reverted from 202511 by #23714, so they don't apply here.

Why

After PR #22876 was back-ported to 202511, the framework default for tcpdump_buffer_size (in capture_and_check_packet_on_dut) dropped from 100 MiB to 4 MiB. The explicit BUFFER_SIZE = 1024 (1 MiB) override in this test is now smaller than the framework default. Removing the override lets the test use the (larger) framework default and matches master.

Note

The 4 MiB default is still too small to fully pass on 720dt-class hardware under this test's stress load. A follow-up PR will raise the default in capture_and_check_packet_on_dut. This PR is pure cleanup to bring 202511 in line with master.

Tested

Applied locally on internal-202511 @ da7363c, testbed-bjw2-can-720dt-6 (Arista-720DT-G48S4, SONiC.20251110.26):

  • Before (1 MiB): tcpdump drops ~13% of relayed packets, FAIL
  • After (4 MiB framework default): drops reduced to ~7%, still over the 0.01% margin (follow-up PR will fix)
  • For reference, 64 MiB passes cleanly.

Refs: #23939, #22876, #20580.

Co-authored-by: Copilot 223556219+Copilot@users.noreply.github.com

…t_dhcp_counter_stress

Partial back-port of sonic-net#23939 — only the buffer-size removal part.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Xichen96 <lukelin0907@gmail.com>
@mssonicbld

Copy link
Copy Markdown
Collaborator

/azp run

1 similar comment
@mssonicbld

Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines

Copy link
Copy Markdown
Azure Pipelines successfully started running 1 pipeline(s).

1 similar comment
@azure-pipelines

Copy link
Copy Markdown
Azure Pipelines successfully started running 1 pipeline(s).

Xichen96 added a commit that referenced this pull request May 14, 2026
…cket_on_dut (#24594)

**Why**

PR #22876 cut the default `tcpdump_buffer_size` in
`capture_and_check_packet_on_dut` from 102400 KiB (100 MiB) to 4096 KiB
(4 MiB) to address the 1 GiB explicit override in
`test_dhcp_counter_stress` causing a ~2 GiB memory spike. The 4 MiB
default, however, is too small for normal stress-test workloads with
bursty packet rates on lower-perf platforms (720dt, 7215).

**Note: PR #22876 was not actually validated.**

The `How did you verify/test it?` section of #22876 only states:

> - Code passes flake8 with max-line-length=120
> - Fix matches the exact unit interpretation from tcpdump documentation

No test execution. The 4 MiB value was a unit-correction choice based on
tcpdump docs, not an empirically-tested minimum. As shown by the data
below, 4 MiB drops 7.2% of relayed DHCP packets on real hardware — i.e.
the new default was never validated against the stress-test workload
that motivated the change.

Empirical measurements on `testbed-bjw2-can-720dt-6`
(Arista-720DT-G48S4, SONiC.20251110.26) running
`test_dhcp_counter_stress[discover]` (25 pps × 48 servers × 120 s):

| Buffer | tcpdump drop rate vs dhcpmon counter | test result |
|---:|---:|---|
| 1 MiB | 13.0% | FAIL |
| **4 MiB (current default, set by #22876)** | **7.2%** | **FAIL** |
| 16 MiB | 1.47% | FAIL |
| 64 MiB | <0.01% | PASS |

**How**

Bump default from `4096` (4 MiB) to `131072` (128 MiB). Gives
comfortable headroom for stress tests on slow platforms while remaining
8× smaller than the historical 1 GiB explicit override that caused
memory spikes.

**Back port request**
- [x] 202511

Refs: PR #22876 (commit 58a6f0b), PR #20580, PR #24592 (companion
202511-only cleanup).

Signed-off-by: Xichen96 <lukelin0907@gmail.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@Xichen96

Copy link
Copy Markdown
Contributor Author

@vmittal-msft Hi Vineet, could you help approve and merge? thanks

mssonicbld added a commit that referenced this pull request May 14, 2026
…capture_and_check_packet_on_dut (#24621)

**Why**

PR #22876 cut the default `tcpdump_buffer_size` in
`capture_and_check_packet_on_dut` from 102400 KiB (100 MiB) to 4096 KiB
(4 MiB) to address the 1 GiB explicit override in
`test_dhcp_counter_stress` causing a ~2 GiB memory spike. The 4 MiB
default, however, is too small for normal stress-test workloads with
bursty packet rates on lower-perf platforms (720dt, 7215).

**Note: PR #22876 was not actually validated.**

The `How did you verify/test it?` section of #22876 only states:

> - Code passes flake8 with max-line-length=120
> - Fix matches the exact unit interpretation from tcpdump documentation

No test execution. The 4 MiB value was a unit-correction choice based on
tcpdump docs, not an empirically-tested minimum. As shown by the data
below, 4 MiB drops 7.2% of relayed DHCP packets on real hardware — i.e.
the new default was never validated against the stress-test workload
that motivated the change.

Empirical measurements on `testbed-bjw2-can-720dt-6`
(Arista-720DT-G48S4, SONiC.20251110.26) running
`test_dhcp_counter_stress[discover]` (25 pps × 48 servers × 120 s):

| Buffer | tcpdump drop rate vs dhcpmon counter | test result |
|---:|---:|---|
| 1 MiB | 13.0% | FAIL |
| **4 MiB (current default, set by #22876)** | **7.2%** | **FAIL** |
| 16 MiB | 1.47% | FAIL |
| 64 MiB | <0.01% | PASS |

**How**

Bump default from `4096` (4 MiB) to `131072` (128 MiB). Gives
comfortable headroom for stress tests on slow platforms while remaining
8× smaller than the historical 1 GiB explicit override that caused
memory spikes.

**Back port request**
- [x] 202511

Refs: PR #22876 (commit 58a6f0b), PR #20580, PR #24592 (companion
202511-only cleanup).

Signed-off-by: Xichen96 <lukelin0907@gmail.com>
Signed-off-by: mssonicbld <sonicbld@microsoft.com>
Co-authored-by: Xichen96 <lukelin0907@gmail.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@vmittal-msft vmittal-msft merged commit 2919c4c into sonic-net:202511 May 14, 2026
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants