-
Notifications
You must be signed in to change notification settings - Fork 126
netdev CI testing #6666
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
kuba-moo
wants to merge
986
commits into
kernel-patches:bpf-next_base
Choose a base branch
from
linux-netdev:to-test
base: bpf-next_base
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
netdev CI testing #6666
+38,062
−27,401
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
4f22ee0
to
8a9a8e0
Compare
64c403f
to
8da1f58
Compare
78ebb17
to
9325308
Compare
c8c7b2f
to
a71aae6
Compare
9325308
to
7940ae1
Compare
d8feb00
to
b16a6b9
Compare
7940ae1
to
8f1ff3c
Compare
4164329
to
c5cecb3
Compare
Some QSFP modules have more eeprom to be read by ethtool than the initial high and low page 0 that is currently available in the DSC's ionic sprom[] buffer. Since the current sprom[] is baked into the middle of an existing API struct, to make the high end of page 1 and page 2 available a block is carved from a reserved space of the existing port_info struct and the ionic_get_module_eeprom() service is taught how to get there. Newer firmware writes the additional QSFP page info here, yet this remains backward compatible because older firmware sets this space to all 0 and older ionic drivers do not use the reserved space. Reviewed-by: Brett Creeley <[email protected]> Signed-off-by: Shannon Nelson <[email protected]> Signed-off-by: NipaLocal <nipa@local>
Add support for the newer get_module_eeprom_by_page interface. Only the upper half of the 256 byte page is available for reading, and the firmware puts the two sections into the extended sprom buffer, so a union is used over the extended sprom buffer to make clear which page is to be accessed. With get_module_eeprom_by_page implemented there is no need for the older get_module_info or git_module_eeprom interfaces, so remove them. Reviewed-by: Brett Creeley <[email protected]> Signed-off-by: Shannon Nelson <[email protected]> Signed-off-by: NipaLocal <nipa@local>
Make the CMIS module type's page 17 channel data available for ethtool to request. As done previously, carve space for this data from the port_info reserved space. In the future, if additional pages are needed, a new firmware AdminQ command will be added for accessing random pages. Reviewed-by: Brett Creeley <[email protected]> Signed-off-by: Shannon Nelson <[email protected]> Signed-off-by: NipaLocal <nipa@local>
The pds_core's adminq is protected by the adminq_lock, which prevents more than 1 command to be posted onto it at any one time. This makes it so the client drivers cannot simultaneously post adminq commands. However, the completions happen in a different context, which means multiple adminq commands can be posted sequentially and all waiting on completion. On the FW side, the backing adminq request queue is only 16 entries long and the retry mechanism and/or overflow/stuck prevention is lacking. This can cause the adminq to get stuck, so commands are no longer processed and completions are no longer sent by the FW. As an initial fix, prevent more than 16 outstanding adminq commands so there's no way to cause the adminq from getting stuck. This works because the backing adminq request queue will never have more than 16 pending adminq commands, so it will never overflow. This is done by reducing the adminq depth to 16. Fixes: 45d76f4 ("pds_core: set up device and adminq") Reviewed-by: Michal Swiatkowski <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: Brett Creeley <[email protected]> Signed-off-by: Shannon Nelson <[email protected]> Signed-off-by: NipaLocal <nipa@local>
If the FW doesn't support the PDS_CORE_CMD_FW_CONTROL command the driver might at the least print garbage and at the worst crash when the user runs the "devlink dev info" devlink command. This happens because the stack variable fw_list is not 0 initialized which results in fw_list.num_fw_slots being a garbage value from the stack. Then the driver tries to access fw_list.fw_names[i] with i >= ARRAY_SIZE and runs off the end of the array. Fix this by initializing the fw_list and by not failing completely if the devcmd fails because other useful information is printed via devlink dev info even if the devcmd fails. Fixes: 45d76f4 ("pds_core: set up device and adminq") Signed-off-by: Brett Creeley <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: Shannon Nelson <[email protected]> Signed-off-by: NipaLocal <nipa@local>
When the pds_core driver was first created there were some race conditions around using the adminq, especially for client drivers. To reduce the possibility of a race condition there's a check against pf->state in pds_client_adminq_cmd(). This is problematic for a couple of reasons: 1. The PDSC_S_INITING_DRIVER bit is set during probe, but not cleared until after everything in probe is complete, which includes creating the auxiliary devices. For pds_fwctl this means it can't make any adminq commands until after pds_core's probe is complete even though the adminq is fully up by the time pds_fwctl's auxiliary device is created. 2. The race conditions around using the adminq have been fixed and this path is already protected against client drivers calling pds_client_adminq_cmd() if the adminq isn't ready, i.e. see pdsc_adminq_post() -> pdsc_adminq_inc_if_up(). Fix this by removing the pf->state check in pds_client_adminq_cmd() because invalid accesses to pds_core's adminq is already handled by pdsc_adminq_post()->pdsc_adminq_inc_if_up(). Fixes: 1065903 ("pds_core: add the aux client API") Reviewed-by: Simon Horman <[email protected]> Signed-off-by: Brett Creeley <[email protected]> Signed-off-by: Shannon Nelson <[email protected]> Signed-off-by: NipaLocal <nipa@local>
Make the wait_context a full part of the q_info struct rather than a stack variable that goes away after pdsc_adminq_post() is done so that the context is still available after the wait loop has given up. There was a case where a slow development firmware caused the adminq request to time out, but then later the FW finally finished the request and sent the interrupt. The handler tried to complete_all() the completion context that had been created on the stack in pdsc_adminq_post() but no longer existed. This caused bad pointer usage, kernel crashes, and much wailing and gnashing of teeth. Fixes: 01ba61b ("pds_core: Add adminq processing and commands") Reviewed-by: Simon Horman <[email protected]> Signed-off-by: Shannon Nelson <[email protected]> Signed-off-by: NipaLocal <nipa@local>
In the current method, the MDC divider was reset to the default setting of 2.5MHz after the NETSYS SER. Therefore, we need to reapply the MDC divider configuration function in mtk_hw_init() after reset. Fixes: c0a4400 ("net: ethernet: mtk_eth_soc: set MDIO bus clock frequency") Signed-off-by: Bo-Cun Chen <[email protected]> Signed-off-by: Daniel Golle <[email protected]> Signed-off-by: NipaLocal <nipa@local>
… for 100Mbps Without this patch, the maximum weight of the queue limit will be incorrect when linked at 100Mbps due to an apparent typo. Fixes: f63959c ("net: ethernet: mtk_eth_soc: implement multi-queue support for per-port queues") Signed-off-by: Bo-Cun Chen <[email protected]> Signed-off-by: Daniel Golle <[email protected]> Signed-off-by: NipaLocal <nipa@local>
The QDMA packet scheduler suffers from a performance issue. Fix this by picking up changes from MediaTek's SDK which change to use Token Bucket instead of Leaky Bucket and fix the SPEED_1000 configuration. Fixes: 160d3a9 ("net: ethernet: mtk_eth_soc: introduce MTK_NETSYS_V2 support") Signed-off-by: Bo-Cun Chen <[email protected]> Signed-off-by: Daniel Golle <[email protected]> Signed-off-by: NipaLocal <nipa@local>
Change hardware configuration for the NETSYSv3. - Enable PSE dummy page mechanism for the GDM1/2/3 - Enable PSE drop mechanism when the WDMA Rx ring full - Enable PSE no-drop mechanism for packets from the WDMA Tx - Correct PSE free drop threshold - Correct PSE CDMA high threshold Fixes: 1953f13 ("net: ethernet: mtk_eth_soc: add NETSYS_V3 version support") Signed-off-by: Bo-Cun Chen <[email protected]> Signed-off-by: Daniel Golle <[email protected]> Signed-off-by: NipaLocal <nipa@local>
… u64 The capabilities bitfield was converted to a 64-bit value, but a cap_bit in struct mtk_eth_muxc which is used to store a full bitfield (rather than the bit number, as the name would suggest) still holds only a 32-bit value. Change the type of cap_bit to u64 in order to avoid truncating the bitfield which results in path selection to not work with capabilities above the 32-bit limit. Fixes: 51a4df6 ("net: ethernet: mtk_eth_soc: convert caps in mtk_soc_data struct to u64") Signed-off-by: Bo-Cun Chen <[email protected]> Signed-off-by: Daniel Golle <[email protected]> Signed-off-by: NipaLocal <nipa@local>
As described in Gerrard's report [1], there are use cases where a netem child qdisc will make the parent qdisc's enqueue callback reentrant. In the case of drr, there won't be a UAF, but the code will add the same classifier to the list twice, which will cause memory corruption. In addition to checking for qlen being zero, this patch checks whether the class was already added to the active_list (cl_is_initialised) before adding to the list to cover for the reentrant case. [1] https://lore.kernel.org/netdev/CAHcdcOm+03OD2j6R0=YHKqmy=VgJ8xEOKuP6c7mSgnp-TEJJbw@mail.gmail.com/ Fixes: 37d9cf1 ("sched: Fix detection of empty queues in child qdiscs") Acked-by: Jamal Hadi Salim <[email protected]> Signed-off-by: Victor Nogueira <[email protected]> Signed-off-by: NipaLocal <nipa@local>
… qdisc As described in Gerrard's report [1], we have a UAF case when an hfsc class has a netem child qdisc. The crux of the issue is that hfsc is assuming that checking for cl->qdisc->q.qlen == 0 guarantees that it hasn't inserted the class in the vttree or eltree (which is not true for the netem duplicate case). This patch checks the n_active class variable to make sure that the code won't insert the class in the vttree or eltree twice, catering for the reentrant case. [1] https://lore.kernel.org/netdev/CAHcdcOm+03OD2j6R0=YHKqmy=VgJ8xEOKuP6c7mSgnp-TEJJbw@mail.gmail.com/ Fixes: 37d9cf1 ("sched: Fix detection of empty queues in child qdiscs") Reported-by: Gerrard Tai <[email protected]> Acked-by: Jamal Hadi Salim <[email protected]> Signed-off-by: Victor Nogueira <[email protected]> Signed-off-by: NipaLocal <nipa@local>
As described in Gerrard's report [1], there are use cases where a netem child qdisc will make the parent qdisc's enqueue callback reentrant. In the case of ets, there won't be a UAF, but the code will add the same classifier to the list twice, which will cause memory corruption. In addition to checking for qlen being zero, this patch checks whether the class was already added to the active_list (cl_is_initialised) before doing the addition to cater for the reentrant case. [1] https://lore.kernel.org/netdev/CAHcdcOm+03OD2j6R0=YHKqmy=VgJ8xEOKuP6c7mSgnp-TEJJbw@mail.gmail.com/ Fixes: 37d9cf1 ("sched: Fix detection of empty queues in child qdiscs") Acked-by: Jamal Hadi Salim <[email protected]> Signed-off-by: Victor Nogueira <[email protected]> Signed-off-by: NipaLocal <nipa@local>
As described in Gerrard's report [1], there are use cases where a netem child qdisc will make the parent qdisc's enqueue callback reentrant. In the case of qfq, there won't be a UAF, but the code will add the same classifier to the list twice, which will cause memory corruption. This patch checks whether the class was already added to the agg->active list (cl_is_initialised) before doing the addition to cater for the reentrant case. [1] https://lore.kernel.org/netdev/CAHcdcOm+03OD2j6R0=YHKqmy=VgJ8xEOKuP6c7mSgnp-TEJJbw@mail.gmail.com/ Fixes: 37d9cf1 ("sched: Fix detection of empty queues in child qdiscs") Acked-by: Jamal Hadi Salim <[email protected]> Signed-off-by: Victor Nogueira <[email protected]> Signed-off-by: NipaLocal <nipa@local>
…behaviour Add 4 TDC tests that exercise the reentrant enqueue behaviour in drr, ets, qfq, and hfsc: - Test DRR's enqueue reentrant behaviour with netem (which caused a double list add) - Test ETS's enqueue reentrant behaviour with netem (which caused a double list add) - Test QFQ's enqueue reentrant behaviour with netem (which caused a double list add) - Test HFSC's enqueue reentrant behaviour with netem (which caused a UAF) Acked-by: Jamal Hadi Salim <[email protected]> Signed-off-by: Victor Nogueira <[email protected]> Signed-off-by: NipaLocal <nipa@local>
With lan88xx based devices the lan78xx driver can get stuck in an interrupt loop while bringing the device up, flooding the kernel log with messages like the following: lan78xx 2-3:1.0 enp1s0u3: kevent 4 may have been dropped Removing interrupt support from the lan88xx PHY driver forces the driver to use polling instead, which avoids the problem. The issue has been observed with Raspberry Pi devices at least since 4.14 (see [1], bug report for their downstream kernel), as well as with Nvidia devices [2] in 2020, where disabling polling was the vendor-suggested workaround (together with the claim that phylib changes in 4.9 made the interrupt handling in lan78xx incompatible). Iperf reports well over 900Mbits/sec per direction with client in --dualtest mode, so there does not seem to be a significant impact on throughput (lan88xx device connected via switch to the peer). [1] raspberrypi/linux#2447 [2] https://forums.developer.nvidia.com/t/jetson-xavier-and-lan7800-problem/142134/11 Link: https://lore.kernel.org/[email protected] Fixes: 792aec4 ("add microchip LAN88xx phy driver") Signed-off-by: Fiona Klute <[email protected]> Cc: [email protected] Cc: [email protected] Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: NipaLocal <nipa@local>
The netdevice usage count increases during transmit queue timeouts because netdev_hold is called in ndo_tx_timeout, scheduling a task to reinitialize the card. Although netdev_put is called at the end of the scheduled work, rtnl_unlock checks the reference count during cleanup. This could cause issues if transmit timeout is called on multiple queues. Therefore, netdev_hold and netdev_put have been removed. Fixes: cb7dd71 ("octeon_ep_vf: Add driver framework and device initialization") Signed-off-by: Sathesh B Edara <[email protected]> Signed-off-by: NipaLocal <nipa@local>
Socfpga's DWMAC glue comes in a variety of flavours with multiple options when it comes to physical interfaces, making it not so easy to test. Having access to a Cyclone5 with RGMII as well as Lynx PCS variants, add myself as a maintainer to help with reviews and testing. Signed-off-by: Maxime Chevallier <[email protected]> Signed-off-by: NipaLocal <nipa@local>
tc_actions.sh keeps hanging the forwarding tests. sdf@: tdc & tdc-dbg started intermittenly failing around Sep 25th Signed-off-by: NipaLocal <nipa@local>
Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>
Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>
Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>
Signed-off-by: NipaLocal <nipa@local>
Signed-off-by: NipaLocal <nipa@local>
Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>
Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Reusable PR for hooking netdev CI to BPF testing.