netdev CI testing #6666

kuba-moo · 2024-03-27T20:02:33Z

Reusable PR for hooking netdev CI to BPF testing.

I took over this patch from Vladimir Oltean. The only change on my side is the adaptation of the commit message. I hope I credited his work correctly in the tags. The new timestamping API was introduced in commit 66f7223 ("net: add NDOs for configuring hardware timestamping") in kernel v6.6. It is time to convert the tsnep driver to the new API, so that timestamping configuration can be removed from the ndo_eth_ioctl() path completely. The driver does not need the interface to be down in order for timestamping to be changed. Thus, the netif_running() restriction in tsnep_netdev_ioctl() is not migrated to the new API. There is no interaction with hardware registers for either operation, just a concurrency with the data path, which is fine. After removing the PHY timestamping logic from tsnep_netdev_ioctl(), the rest is almost equivalent to phy_do_ioctl_running(), except for the return code on the !netif_running() condition: -EINVAL vs -ENODEV. Let's make the conversion to phy_do_ioctl_running() anyway, on the premise that a return code standardized tree-wide is less complex. Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: Gerhard Engleder <[email protected]> Tested-by: Gerhard Engleder <[email protected]> Signed-off-by: NipaLocal <nipa@local>
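The return-code difference described above can be modelled in plain userspace C. This is a hypothetical sketch: `struct fake_netdev`, `fake_phy_do_ioctl()`, `old_tsnep_ioctl()` and `model_phy_do_ioctl_running()` are stand-ins rather than kernel code, and the PHY call is stubbed to succeed. Only the early return on a stopped interface (-EINVAL in the old driver path, -ENODEV in phy_do_ioctl_running()) is taken from the text.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct net_device; only the running state matters. */
struct fake_netdev {
	bool running;
};

/* Stub for the PHY ioctl handler; assumed to succeed in this sketch. */
static int fake_phy_do_ioctl(struct fake_netdev *dev)
{
	(void)dev;
	return 0;
}

/* Old driver-private path: a stopped interface is rejected with -EINVAL. */
static int old_tsnep_ioctl(struct fake_netdev *dev)
{
	if (!dev->running)
		return -EINVAL;
	return fake_phy_do_ioctl(dev);
}

/* Model of the generic helper: a stopped interface is rejected with -ENODEV. */
static int model_phy_do_ioctl_running(struct fake_netdev *dev)
{
	if (!dev->running)
		return -ENODEV;
	return fake_phy_do_ioctl(dev);
}
```

With the interface up, both paths reach the same PHY handler; they differ only in the error reported while the interface is down.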

Update the legacy cleanup_module() in this file to an __exit function, as per current kernel code practice, and restore the #ifdef MODULE condition to allow successful compilation as a built-in driver. Although the file's init_module was already newer, its exit path still used the old-style cleanup_module(). To provide a proper exit function, replace cleanup_module() with an __exit corkscrew_exit_module(), for consistency with the rest of the kernel code. Signed-off-by: Shi Hao <[email protected]> Signed-off-by: NipaLocal <nipa@local>

This issue has two root causes: 1. When probing the usbnet device, usbnet_link_change(dev, 0, 0) queues the kevent work on the global workqueue. However, the kevent has not yet been scheduled when the usbnet device is unregistered, so executing free_netdev() results in the "free active object (kevent)" error reported here. 2. In addition, when usbnet_disconnect() calls unregister_netdev(), ndo_stop() is executed to cancel the kevent only if the usbnet device is up. Because the device is not up, ndo_stop() is not executed. The fix is to cancel the kevent before executing free_netdev(), which also deletes the delay timer. Fixes: a69e617 ("usbnet: Fix linkwatch use-after-free on disconnect") Reported-by: Sam Sun <[email protected]> Closes: https://syzkaller.appspot.com/bug?extid=8bfd7bcc98f7300afb84 Signed-off-by: Lizhi Xu <[email protected]> Signed-off-by: NipaLocal <nipa@local>

Add a simple tool that demonstrates adding a flower filter with two VLAN push actions. The example can be invoked, and the result inspected, as follows:
    # ./tools/samples/tc-filter-add p2
    # tc -j -p filter show dev p2 ingress pref 2211
    [ {
            "protocol": "802.1Q",
            "kind": "flower",
            "chain": 0
        },{
            "protocol": "802.1Q",
            "kind": "flower",
            "chain": 0,
            "options": {
                "handle": 1,
                "keys": {
                    "num_of_vlans": 3,
                    "vlan_id": 255,
                    "vlan_prio": 5
                },
                "not_in_hw": true,
                "actions": [ {
                        "order": 1,
                        "kind": "vlan",
                        "vlan_action": "push",
                        "id": 255,
                        "control_action": {
                            "type": "pass"
                        },
                        "index": 5,
                        "ref": 1,
                        "bind": 1
                    },{
                        "order": 2,
                        "kind": "vlan",
                        "vlan_action": "push",
                        "id": 555,
                        "control_action": {
                            "type": "pass"
                        },
                        "index": 6,
                        "ref": 1,
                        "bind": 1
                    } ]
            }
        } ]
This shows the filter with two VLAN push actions, verifying that tc action attributes are handled correctly. Signed-off-by: Zahari Doychev <[email protected]> Signed-off-by: NipaLocal <nipa@local>

The memory belonging to tx_buf and rx_buf in ynl_sock is not initialized after allocation. This commit ensures the entire allocated memory is set to zero. When ASan is enabled, uninitialized bytes may contain poison values. This can cause failures, e.g. in ynl_attr_put_str(), where poisoned bytes appear after the NUL terminator. As a result, tc filter addition may fail. Signed-off-by: Zahari Doychev <[email protected]> Signed-off-by: NipaLocal <nipa@local>
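A minimal userspace sketch of the idea, assuming zero-filling via calloc() (malloc() followed by memset() would be equivalent). `struct demo_sock`, `DEMO_BUF_SIZE`, `demo_sock_alloc()` and `demo_put_str()` are hypothetical and do not reproduce the ynl code. Bytes after a string's NUL terminator matter presumably because netlink attribute payloads are padded to 4-byte alignment, so those bytes can end up in the message.

```c
#include <stdlib.h>
#include <string.h>

#define DEMO_BUF_SIZE 64 /* hypothetical buffer size */

/* Hypothetical stand-in for the tx/rx buffers held by struct ynl_sock. */
struct demo_sock {
	unsigned char *tx_buf;
	unsigned char *rx_buf;
};

/* Allocate both buffers zero-filled, so no stale or poisoned bytes remain. */
static int demo_sock_alloc(struct demo_sock *ys)
{
	ys->tx_buf = calloc(1, DEMO_BUF_SIZE);
	ys->rx_buf = calloc(1, DEMO_BUF_SIZE);
	if (!ys->tx_buf || !ys->rx_buf) {
		free(ys->tx_buf);
		free(ys->rx_buf);
		return -1;
	}
	return 0;
}

static void demo_sock_free(struct demo_sock *ys)
{
	free(ys->tx_buf);
	free(ys->rx_buf);
}

/* Copy a string including its NUL; bytes after it stay zero thanks to calloc(). */
static int demo_put_str(struct demo_sock *ys, const char *s)
{
	size_t len = strlen(s) + 1; /* include the NUL terminator */

	if (len > DEMO_BUF_SIZE)
		return -1;
	memcpy(ys->tx_buf, s, len);
	return 0;
}
```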

When freeing indexed arrays, the corresponding free function should be called for each entry of the indexed array. For example, for 'struct tc_act_attrs', 'tc_act_attrs_free(...)' needs to be called for each entry. Previously, memory leaks were reported when enabling AddressSanitizer:
    =================================================================
    ==874==ERROR: LeakSanitizer: detected memory leaks
    Direct leak of 24 byte(s) in 1 object(s) allocated from:
        #0 0x7f221fd20cb5 in malloc ./debug/gcc/gcc/libsanitizer/asan/asan_malloc_linux.cpp:67
        #1 0x55c98db048af in tc_act_attrs_set_options_vlan_parms ../generated/tc-user.h:2813
        #2 0x55c98db048af in main ./linux/tools/net/ynl/samples/tc-filter-add.c:71
    Direct leak of 24 byte(s) in 1 object(s) allocated from:
        #0 0x7f221fd20cb5 in malloc ./debug/gcc/gcc/libsanitizer/asan/asan_malloc_linux.cpp:67
        #1 0x55c98db04a93 in tc_act_attrs_set_options_vlan_parms ../generated/tc-user.h:2813
        #2 0x55c98db04a93 in main ./linux/tools/net/ynl/samples/tc-filter-add.c:74
    Direct leak of 10 byte(s) in 2 object(s) allocated from:
        #0 0x7f221fd20cb5 in malloc ./debug/gcc/gcc/libsanitizer/asan/asan_malloc_linux.cpp:67
        #1 0x55c98db0527d in tc_act_attrs_set_kind ../generated/tc-user.h:1622
    SUMMARY: AddressSanitizer: 58 byte(s) leaked in 4 allocation(s).
The following diff illustrates the changes introduced compared to the previous version of the code:
     void tc_flower_attrs_free(struct tc_flower_attrs *obj)
     {
    +        unsigned int i;
    +
             free(obj->indev);
    +        for (i = 0; i < obj->_count.act; i++)
    +                tc_act_attrs_free(&obj->act[i]);
             free(obj->act);
             free(obj->key_eth_dst);
             free(obj->key_eth_dst_mask);
Signed-off-by: Zahari Doychev <[email protected]> Signed-off-by: NipaLocal <nipa@local>
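The freeing pattern from the diff above, reduced to a self-contained sketch: release what each entry owns before releasing the array. The structures and functions here (`struct demo_act_attrs`, `struct demo_flower_attrs`, `demo_act_attrs_free()`, `demo_flower_attrs_free()`) are hypothetical simplifications of the generated ynl types, and the `demo_frees` counter exists only to make the behaviour observable.

```c
#include <stdlib.h>

/* Hypothetical simplified action entry that owns one heap allocation. */
struct demo_act_attrs {
	char *kind;
};

/* Hypothetical parent object holding an indexed array of actions. */
struct demo_flower_attrs {
	unsigned int n_act;         /* stands in for obj->_count.act */
	struct demo_act_attrs *act; /* the indexed array itself */
};

static unsigned int demo_frees; /* counts per-entry frees, for illustration only */

static void demo_act_attrs_free(struct demo_act_attrs *obj)
{
	free(obj->kind);
	demo_frees++;
}

/* Free each entry's members first, then the array; freeing only the array leaks. */
static void demo_flower_attrs_free(struct demo_flower_attrs *obj)
{
	unsigned int i;

	for (i = 0; i < obj->n_act; i++)
		demo_act_attrs_free(&obj->act[i]);
	free(obj->act);
}
```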

The Linux tc actions expect the action order to start from index one. To accommodate this, add a start-index property to the ynl spec for indexed arrays. This property allows the starting index to be specified, ensuring compatibility with consumers that require a non-zero-based index. For example, with "start_index = 1" we get the following diff:
             ynl_attr_put_str(nlh, TCA_FLOWER_INDEV, obj->indev);
             array = ynl_attr_nest_start(nlh, TCA_FLOWER_ACT);
             for (i = 0; i < obj->_count.act; i++)
    -                tc_act_attrs_put(nlh, i, &obj->act[i]);
    +                tc_act_attrs_put(nlh, i + 1, &obj->act[i]);
             ynl_attr_nest_end(nlh, array);
Signed-off-by: Zahari Doychev <[email protected]> Signed-off-by: NipaLocal <nipa@local>
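A sketch of the start-index idea, under the assumption that each array entry is emitted as a nested attribute whose type is its index (which is what the `i` / `i + 1` argument in the diff suggests). `struct demo_msg`, `demo_put_entry()` and `demo_put_indexed_array()` are hypothetical; a real generator would call the ynl attribute helpers instead of recording types.

```c
#include <stddef.h>

#define DEMO_MAX_ATTRS 8 /* hypothetical capacity */

/* Hypothetical recorder standing in for a netlink message builder. */
struct demo_msg {
	unsigned int types[DEMO_MAX_ATTRS]; /* nested attribute type used per entry */
	size_t n;
};

/* Record one entry; its nested attribute type doubles as the array index. */
static void demo_put_entry(struct demo_msg *msg, unsigned int type)
{
	if (msg->n < DEMO_MAX_ATTRS)
		msg->types[msg->n++] = type;
}

/* Emit count entries numbered from start_index (0 by default, 1 for tc actions). */
static void demo_put_indexed_array(struct demo_msg *msg, size_t count,
				   unsigned int start_index)
{
	size_t i;

	for (i = 0; i < count; i++)
		demo_put_entry(msg, (unsigned int)i + start_index);
}
```

Keeping the offset as a spec property, rather than hard-coding `i + 1`, leaves zero-based arrays unchanged for every other family.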

Simplify the code by using phy_find_first(). Signed-off-by: Heiner Kallweit <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: NipaLocal <nipa@local>

When setting the extended skb data for sadb_x_ipsecrequest, the requested extended data size exceeds the allocated skb data length, triggering the reported bug. Because the family field only supports AF_INET and AF_INET6, other values cause pfkey_sockaddr_fill() to fail, which in turn causes set_ipsecrequest() to fail. Therefore, as a workaround, use a family value of 0 to avoid the excessively large extended data length. syzbot reported:
    kernel BUG at net/core/skbuff.c:212!
    Call Trace:
     skb_over_panic net/core/skbuff.c:217 [inline]
     skb_put+0x159/0x210 net/core/skbuff.c:2583
     skb_put_zero include/linux/skbuff.h:2788 [inline]
     set_ipsecrequest+0x73/0x680 net/key/af_key.c:3532
Fixes: 08de61b ("[PFKEYV2]: Extension for dynamic update of endpoint address(es)") Reported-by: [email protected] Closes: https://syzkaller.appspot.com/bug?extid=be97dd4da14ae88b6ba4 Signed-off-by: Edward Adam Davis <[email protected]> Signed-off-by: NipaLocal <nipa@local>

Replace `dev_kfree_skb()` with `dev_kfree_skb_any()` in `start_xmit()`, since `start_xmit()` can be called both from netpoll (hard IRQ) and from other contexts. Also, `np->link_status` can be changed at any time by the interrupt handler.
    <idle>-0 [011] dNh4. 4541.754603: start_xmit <-netpoll_start_xmit
    <idle>-0 [011] dNh4. 4541.754622: <stack trace>
     => [FTRACE TRAMPOLINE]
     => start_xmit
     => netpoll_start_xmit
     => netpoll_send_skb
     => write_msg
     => console_flush_all
     => console_unlock
     => vprintk_emit
     => _printk
     => rio_interrupt
     => __handle_irq_event_percpu
     => handle_irq_event
     => handle_fasteoi_irq
     => __common_interrupt
     => common_interrupt
     => asm_common_interrupt
     => mwait_idle
     => default_idle_call
     => do_idle
     => cpu_startup_entry
     => start_secondary
     => common_startup_64
This issue can occur when the link state changes from off to on (e.g., plugging or unplugging the LAN cable) while a packet is being transmitted. If the skb has a destructor, a warning message may be printed in this situation:
    -> consume_skb (dev_kfree_skb())
      -> __kfree_skb()
        -> skb_release_all()
          -> skb_release_head_state(skb)
             if (skb->destructor) {
                     DEBUG_NET_WARN_ON_ONCE(in_hardirq());
                     skb->destructor(skb);
             }
Found by inspection. Signed-off-by: Yeounsu Moon <[email protected]> Fixes: 1da177e ("Linux-2.6.12-rc2") Tested-on: D-Link DGE-550T Rev-A3 Reviewed-by: Simon Horman <[email protected]> Signed-off-by: NipaLocal <nipa@local>
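A userspace model of the decision `dev_kfree_skb_any()` makes, as we understand the helper in recent kernels: when running in hard IRQ context or with IRQs disabled, the free is deferred (as `dev_kfree_skb_irq()` does) instead of being performed immediately, so the skb destructor does not run in hard IRQ. `struct demo_ctx`, `enum demo_free_path` and `demo_kfree_skb_any()` are hypothetical; the real helper takes an skb and checks `in_hardirq()` and `irqs_disabled()` itself.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the execution context. */
struct demo_ctx {
	bool in_hardirq;
	bool irqs_disabled;
};

enum demo_free_path {
	DEMO_FREE_NOW,      /* free immediately, like dev_kfree_skb() */
	DEMO_FREE_DEFERRED, /* queue the skb for a later free, like dev_kfree_skb_irq() */
};

/* Choose the free path: defer in hard IRQ or with IRQs off, else free now. */
static enum demo_free_path demo_kfree_skb_any(const struct demo_ctx *ctx)
{
	if (ctx->in_hardirq || ctx->irqs_disabled)
		return DEMO_FREE_DEFERRED;
	return DEMO_FREE_NOW;
}
```

In the netpoll trace above, start_xmit() runs underneath rio_interrupt(), so it would take the deferred path.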

The function devlink_port_region_get_by_name() incorrectly uses region->ops->name to compare the region name. This has no functional impact, because ops and port_ops are members of the same union in struct devlink_region, but by the code logic a port region should be referenced through port_ops here. Update it to use region->port_ops->name to properly reference the name of the devlink port region. Signed-off-by: Alok Tiwari <[email protected]> Signed-off-by: NipaLocal <nipa@local>
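A reduced illustration of why the change has no functional impact: both pointers occupy the same union storage, so the fix is about reading the member that matches the region type. The structures here (`struct demo_region_ops`, `struct demo_port_region_ops`, `struct demo_region`) are hypothetical, trimmed-down versions and are not the devlink definitions.

```c
/* Hypothetical, trimmed-down ops structures; both carry a name. */
struct demo_region_ops {
	const char *name;
};

struct demo_port_region_ops {
	const char *name;
};

/* As described for struct devlink_region, one union holds either pointer. */
struct demo_region {
	union {
		const struct demo_region_ops *ops;
		const struct demo_port_region_ops *port_ops;
	};
};

/* A port region is read through port_ops, the member it was set through. */
static const char *demo_port_region_name(const struct demo_region *region)
{
	return region->port_ops->name;
}
```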

The limit of the small queue check is calculated from the pacing rate, and the pacing rate is calculated from the cwnd. If the cwnd is small, the small queue check may fail. When the small queue check fails, the TCP layer sends fewer packets, so tcp_is_cwnd_limited() always returns false and the cwnd has no chance to be updated. Because the cwnd is never updated, it stays small, so the pacing rate stays small, the limit of the small queue check stays small, and the small queue check keeps failing. This is a kind of deadlock: when a TCP flow gets into this situation, its throughput is very low, clearly less than the throughput it should have. We set is_cwnd_limited to true when the small queue check fails, so that the cwnd has a chance to be updated and the deadlock is broken. The ss output below shows this issue:
    skmem:(r0,rb131072,
    t7712, <------------------------------ wmem_alloc = 7712
    tb243712,f2128,w219056,o0,bl0,d0) ts sack cubic wscale:7,10 rto:224 rtt:23.364/0.019 ato:40 mss:1448 pmtu:8500 rcvmss:536 advmss:8448
    cwnd:28 <------------------------------ cwnd=28
    bytes_sent:2166208 bytes_acked:2148832 bytes_received:37 segs_out:1497 segs_in:751 data_segs_out:1496 data_segs_in:1 send 13882554bps lastsnd:7 lastrcv:2992 lastack:7
    pacing_rate 27764216bps <--------------------- pacing_rate=27764216bps
    delivery_rate 5786688bps delivered:1485 busy:2991ms unacked:12 rcv_space:57088 rcv_ssthresh:57088 notsent:188240 minrtt:23.319 snd_wnd:57088
    limit = (27764216 / 8) / 1024 = 3389 < 7712
So the small queue check fails. When this happens, the throughput is clearly lower than in the normal situation. By setting tcp_is_cwnd_limited to true when the small queue check fails, we can avoid this issue: the cwnd can grow to a reasonable size (about 4000 in my test environment), after which the small queue check no longer fails. Signed-off-by: Peng Yu <[email protected]> Signed-off-by: NipaLocal <nipa@local>
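The limit arithmetic above, expressed as a small C sketch. It uses only the formula given in the text (pacing rate converted from bits/s to bytes/s, then divided by 1024, written as a shift by 10); the in-kernel small queue check involves further terms that this sketch omits. `demo_tsq_limit()` and `demo_tsq_throttles()` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Limit per the text: bits/s -> bytes/s, then divide by 1024 (shift by 10). */
static uint64_t demo_tsq_limit(uint64_t pacing_rate_bps)
{
	return (pacing_rate_bps / 8) >> 10;
}

/* The check "fails" (holds back transmission) when queued bytes exceed the limit. */
static bool demo_tsq_throttles(uint64_t pacing_rate_bps, uint64_t wmem_alloc)
{
	return wmem_alloc > demo_tsq_limit(pacing_rate_bps);
}
```

With the values from the ss output, demo_tsq_limit(27764216) evaluates to 3389, which is below the 7712 bytes of wmem_alloc, so the check fails as described.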

Add supported_extts_flags and supported_perout_flags configuration to make the driver compliant with the latest API. Initialize channel information to 0 to avoid confusing users, because the HW doesn't actually care about channels. Signed-off-by: Vadim Fedorenko <[email protected]> Signed-off-by: NipaLocal <nipa@local>

./drivers/net/ethernet/cadence/macb_main.c: linux/inetdevice.h is included more than once. Reported-by: Abaci Robot <[email protected]> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=26474 Signed-off-by: Jiapeng Chong <[email protected]> Signed-off-by: NipaLocal <nipa@local>

Alex will soon send phylink patches that make the link come up on QEMU again, but for now let's hack the link up. This gives us a chance to add another QEMU NIC test to the "HW" runners in the CI. Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>

Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>

Let's see if this increases the stability of timing-related results. Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>

Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>

Signed-off-by: NipaLocal <nipa@local>

Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>

These are unlikely to matter for CI testing and they slow things down. Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>

tc_actions.sh keeps hanging the forwarding tests. sdf@: tdc & tdc-dbg started failing intermittently around Sep 25th Signed-off-by: NipaLocal <nipa@local>

Signed-off-by: NipaLocal <nipa@local>

We exclusively use headless VMs today, so don't waste time compiling sound and GPU drivers. Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>

kmemleak auto scan could be a source of latency for the tests. We run a full scan manually after the tests, so we don't need the autoscan thread to be enabled. Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>

… HEAD

kuba-moo force-pushed the to-test branch from 6bd5e75 to bdd05e2 on March 27, 2024 21:49

kernel-patches-daemon-bpf bot force-pushed the bpf-next_base branch 3 times, most recently from 4f22ee0 to 8a9a8e0 on March 28, 2024 04:46

kuba-moo force-pushed the to-test branch 11 times, most recently from 64c403f to 8da1f58 on March 29, 2024 00:01

kernel-patches-daemon-bpf bot force-pushed the bpf-next_base branch 3 times, most recently from 78ebb17 to 9325308 on March 29, 2024 02:14

kuba-moo force-pushed the to-test branch 6 times, most recently from c8c7b2f to a71aae6 on March 29, 2024 18:01

kernel-patches-daemon-bpf bot force-pushed the bpf-next_base branch from 9325308 to 7940ae1 on March 29, 2024 18:12

kuba-moo force-pushed the to-test branch 2 times, most recently from d8feb00 to b16a6b9 on March 30, 2024 00:01

kernel-patches-daemon-bpf bot force-pushed the bpf-next_base branch from 7940ae1 to 8f1ff3c on March 30, 2024 00:21

kuba-moo force-pushed the to-test branch 2 times, most recently from 4164329 to c5cecb3 on March 30, 2024 06:00

vladimiroltean and others added 29 commits October 19, 2025 20:00

net: stmmac: mdio: use phy_find_first to simplify stmmac_mdio_register

4d3a6f8

Simplify the code by using phy_find_first(). Signed-off-by: Heiner Kallweit <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: NipaLocal <nipa@local>

nipa: disable random kunit tests

a9a2c8a

Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>

nipa: disable 6.17's merge window kunit tests

bf2eb6f

Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>

nipa: config: x86: use periodic HZ tick

03915ee

Let's see if this increases the stability of timing-related results. Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>

nipa: profile (time) test output

37d25a4

Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>

nipa: timestamp - try waking

d0f3260

Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>

nipa: dbg: tests: bonding: print info on failure

8537261

Signed-off-by: NipaLocal <nipa@local>

nipa: selftests: net: enable profiling

eac4da1

Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>

nipa: tc_action dbg

29213f1

Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>

nipa: config: disable CPU_MITIGATIONS

d96b1d5

These are unlikely to matter for CI testing and they slow things down. Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>

nipa: forwarding: set timeout to 3 hours

d5b1a16

tc_actions.sh keeps hanging the forwarding tests. sdf@: tdc & tdc-dbg started failing intermittently around Sep 25th Signed-off-by: NipaLocal <nipa@local>

nipa: drv: net: add timeout

943a9a9

Signed-off-by: NipaLocal <nipa@local>

nipa: config: x86: disable GPUs and sound

1f4a748

We exclusively use headless VMs today, so don't waste time compiling sound and GPU drivers. Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>

nipa: config: disable kmemleak auto scan

08eef4e

kmemleak auto scan could be a source of latency for the tests. We run a full scan manually after the tests, so we don't need the autoscan thread to be enabled. Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>

Merge remote-tracking branch 'origin/net-next-2025-10-20--03-00' into HEAD

f8eb129

kuba-moo force-pushed the to-test branch from ccd58b4 to f8eb129 on October 20, 2025 03:01
