Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cansequence tool shows wrong telegram order #166

Open
mawekman opened this issue May 10, 2023 · 19 comments
Open

cansequence tool shows wrong telegram order #166

mawekman opened this issue May 10, 2023 · 19 comments

Comments

@mawekman
Copy link

Hi all,
First of all, this is a great project. We bought two candlelight usb-can sticks from linux automation as we might plan to use this interface on our products.
In our tests we now have a phenomenon that we cannot explain. We use the cansequence tool to create load on the canbus. With a baudrate of 1MBit and a load between 60% and 95%, we notice again and again that telegrams are received in the wrong order. Another non-candlelight-based cancontroller receives the telegrams in the correct order at the same moment, so I assume that the CAN telegrams are in the correct order on the bus. It looks like this:

grafik
Telegrams are not lost but in the wrong order.

We are using a raspberry cm4 based system (cm4 io board).
grafik

Has anyone ever seen something similar or have an idea what's going wrong ?

Thanks in advance !

@marckleinebudde
Copy link
Contributor

Hey @mawekman,

can you try my latest firmware: https://github.com/marckleinebudde/candleLight_fw/tree/multichannel

@marckleinebudde
Copy link
Contributor

Can you connect the candlelight without an USB-hub?

@mawekman
Copy link
Author

Hey @marckleinebudde,

if have compiled and flashed your firmware but the behaviour ist the same.
grafik

Connecting the candlelight to an port not routed over an USB-hub is not possible on pi4. The pi4 boards use an pcie USB controller to provide USB Interfaces. We have tested a VIA VL805 Chip pcie card (this is the one on the standard pi4 board) on the cm4-io-board and a pcie card with a Renesas upd720201 chip. The behaviour of both cards is the same.
The internal dwc2 USB Controller on pi4 also depends on a usb-hub and cannot be used in our design.

@marckleinebudde
Copy link
Contributor

I think the RX side on the µC hardware is configured for FIFO, so there shouldn't be any re-orderings on the hardware side.

Is your kernel new enough to support HW timestamps? What does candump any,0:0,#FFFFFFFF -cexdtA -H output? If HW timestamps are supported please record them via candump any,0:0,#FFFFFFFF -cexdtA -H | tee /tmp/log. Let's see if the kernel stack re-orders them.

@mawekman
Copy link
Author

Hi @marckleinebudde ,

I'm working on compiling a new kernel with timestamping support in the gs_usb driver.

I just noticed that I translated the wrong candlelight firmware branch. So I will repeat uploading a new firmware. But when I compile the correct multichannel branch I get following error:

grafik

@marckleinebudde
Copy link
Contributor

It should compile now - I've force pushed the multichannel branch.

@mawekman
Copy link
Author

I'm sorry, I've updated the branch but it's still the same issue.

@marckleinebudde
Copy link
Contributor

Please show me the output of:

git status
git rev-parse @
rm -rf build
mkdir build
cd build
cmake .. -DCMAKE_TOOLCHAIN_FILE=../cmake/gcc-arm-none-eabi-8-2019-q3-update.cmake
make

@mawekman
Copy link
Author

`root@debian11:/mnt/hgfs/share/marckleinebudde/repos/candleLight_fw# git status
On branch multichannel
Your branch is up to date with 'origin/multichannel'.

nothing to commit, working tree clean
root@debian11:/mnt/hgfs/share/marckleinebudde/repos/candleLight_fw# git rev-parse @
78def66
root@debian11:/mnt/hgfs/share/marckleinebudde/repos/candleLight_fw# rm -rf build
root@debian11:/mnt/hgfs/share/marckleinebudde/repos/candleLight_fw# mkdir build
root@debian11:/mnt/hgfs/share/marckleinebudde/repos/candleLight_fw# cd build/
root@debian11:/mnt/hgfs/share/marckleinebudde/repos/candleLight_fw/build# cmake .. -DCMAKE_TOOLCHAIN_FILE=../cmake/gcc-arm-none-eabi-8-2019-q3-update.cmake
-- The C compiler identification is GNU 8.3.1
-- The ASM compiler identification is GNU
-- Found assembler: /usr/bin/arm-none-eabi-gcc
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - failed
-- Detecting C compile features
-- Detecting C compile features - done
-- The CXX compiler identification is GNU 8.3.1
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - failed
-- Detecting CXX compile features
-- Detecting CXX compile features - done


You may now:
-compile all targets ('make')
-compile a single target (e.g. 'make cantact_fw'
-flash a device (e.g. 'make flash-cantact_fw'
-- Configuring done
-- Generating done
-- Build files have been written to: /mnt/hgfs/share/marckleinebudde/repos/candleLight_fw/build
root@debian11:/mnt/hgfs/share/marckleinebudde/repos/candleLight_fw/build# make
Scanning dependencies of target STM32_HAL_STM32F042x6
[ 0%] Building C object libs/STM32_HAL/CMakeFiles/STM32_HAL_STM32F042x6.dir/src/stm32f0xx/stm32f0xx_hal.c.obj
[ 1%] Building C object libs/STM32_HAL/CMakeFiles/STM32_HAL_STM32F042x6.dir/src/stm32f0xx/stm32f0xx_hal_can.c.obj
[ 1%] Building C object libs/STM32_HAL/CMakeFiles/STM32_HAL_STM32F042x6.dir/src/stm32f0xx/stm32f0xx_hal_cortex.c.obj
[ 1%] Building C object libs/STM32_HAL/CMakeFiles/STM32_HAL_STM32F042x6.dir/src/stm32f0xx/stm32f0xx_hal_flash_ex.c.obj
[ 2%] Building C object libs/STM32_HAL/CMakeFiles/STM32_HAL_STM32F042x6.dir/src/stm32f0xx/stm32f0xx_hal_flash.c.obj
[ 2%] Building C object libs/STM32_HAL/CMakeFiles/STM32_HAL_STM32F042x6.dir/src/stm32f0xx/stm32f0xx_hal_gpio.c.obj
[ 2%] Building C object libs/STM32_HAL/CMakeFiles/STM32_HAL_STM32F042x6.dir/src/stm32f0xx/stm32f0xx_hal_pcd_ex.c.obj
[ 3%] Building C object libs/STM32_HAL/CMakeFiles/STM32_HAL_STM32F042x6.dir/src/stm32f0xx/stm32f0xx_hal_pcd.c.obj
[ 3%] Building C object libs/STM32_HAL/CMakeFiles/STM32_HAL_STM32F042x6.dir/src/stm32f0xx/stm32f0xx_hal_rcc.c.obj
[ 3%] Building C object libs/STM32_HAL/CMakeFiles/STM32_HAL_STM32F042x6.dir/src/stm32f0xx/stm32f0xx_hal_rcc_ex.c.obj
[ 4%] Building C object libs/STM32_HAL/CMakeFiles/STM32_HAL_STM32F042x6.dir/src/stm32f0xx/stm32f0xx_hal_tim_ex.c.obj
[ 4%] Building C object libs/STM32_HAL/CMakeFiles/STM32_HAL_STM32F042x6.dir/src/stm32f0xx/stm32f0xx_hal_tim.c.obj
[ 4%] Building C object libs/STM32_HAL/CMakeFiles/STM32_HAL_STM32F042x6.dir/src/stm32f0xx/stm32f0xx_ll_usb.c.obj
[ 5%] Building C object libs/STM32_HAL/CMakeFiles/STM32_HAL_STM32F042x6.dir/src/cmsis/system_stm32f0xx.c.obj
[ 5%] Linking C static library libSTM32_HAL_STM32F042x6.a
[ 5%] Built target STM32_HAL_STM32F042x6
Scanning dependencies of target STM32_USB_Device_Library_STM32F042x6
[ 5%] Building C object libs/STM32_USB_Device_Library/CMakeFiles/STM32_USB_Device_Library_STM32F042x6.dir/Core/Src/usbd_ctlreq.c.obj
[ 6%] Building C object libs/STM32_USB_Device_Library/CMakeFiles/STM32_USB_Device_Library_STM32F042x6.dir/Core/Src/usbd_ioreq.c.obj
[ 6%] Building C object libs/STM32_USB_Device_Library/CMakeFiles/STM32_USB_Device_Library_STM32F042x6.dir/Core/Src/usbd_core.c.obj
[ 6%] Linking C static library libSTM32_USB_Device_Library_STM32F042x6.a
[ 6%] Built target STM32_USB_Device_Library_STM32F042x6
Scanning dependencies of target version_h
-- Found Git: /usr/bin/git (found version "2.30.2")
[ 6%] Built target version_h
Scanning dependencies of target usb2can_fw
[ 7%] Building C object CMakeFiles/usb2can_fw.dir/src/usbd_desc.c.obj
[ 7%] Building C object CMakeFiles/usb2can_fw.dir/src/usbd_gs_can.c.obj
[ 7%] Building C object CMakeFiles/usb2can_fw.dir/src/usbd_conf.c.obj
[ 8%] Building C object CMakeFiles/usb2can_fw.dir/src/can_common.c.obj
[ 8%] Building C object CMakeFiles/usb2can_fw.dir/src/dfu.c.obj
[ 8%] Building C object CMakeFiles/usb2can_fw.dir/src/gpio.c.obj
[ 9%] Building C object CMakeFiles/usb2can_fw.dir/src/led.c.obj
[ 9%] Building C object CMakeFiles/usb2can_fw.dir/src/timer.c.obj
[ 9%] Building C object CMakeFiles/usb2can_fw.dir/src/util.c.obj
[ 10%] Building C object CMakeFiles/usb2can_fw.dir/src/startup.c.obj
[ 10%] Building C object CMakeFiles/usb2can_fw.dir/src/main.c.obj
[ 10%] Building C object CMakeFiles/usb2can_fw.dir/src/interrupts.c.obj
[ 11%] Building C object CMakeFiles/usb2can_fw.dir/src/boards/legacy.c.obj
/mnt/hgfs/share/marckleinebudde/repos/candleLight_fw/src/boards/legacy.c:188:2: error: missing initializer for field 'leds' of 'struct BoardChannelConfig' [-Werror=missing-field-initializers]
.channels[0].leds = {
^
In file included from /mnt/hgfs/share/marckleinebudde/repos/candleLight_fw/src/boards/legacy.c:29:
/mnt/hgfs/share/marckleinebudde/repos/candleLight_fw/include/board.h:44:19: note: 'leds' declared here
struct LEDConfig leds[LED_MAX];
^~~~
cc1: all warnings being treated as errors
make[2]: *** [CMakeFiles/usb2can_fw.dir/build.make:238: CMakeFiles/usb2can_fw.dir/src/boards/legacy.c.obj] Error 1
make[1]: *** [CMakeFiles/Makefile2:191: CMakeFiles/usb2can_fw.dir/all] Error 2
make: *** [Makefile:103: all] Error 2
root@debian11:/mnt/hgfs/share/marckleinebudde/repos/candleLight_fw/build# `

@marckleinebudde
Copy link
Contributor

marckleinebudde commented May 11, 2023

Ok, you're indeed building the correct branch. Maybe a different compiler version. Let's try:

make VERBOSE=1 usb2can_fw

Then we see the used compiler, here it's /usr/lib/ccache/arm-none-eabi-gcc. Let's see it's version:

/usr/lib/ccache/arm-none-eabi-gcc --version
arm-none-eabi-gcc (15:12.2.rel1-1) 12.2.1 20221205
Copyright (C) 2022 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

It's from current Debian testing.

@mawekman
Copy link
Author

It seems my compiler ist quit old.

root@debian11:/mnt/hgfs/share/marckleinebudde/repos/candleLight_fw/build# /usr/bin/arm-none-eabi-gcc --version arm-none-eabi-gcc (15:8-2019-q3-1+b1) 8.3.1 20190703 (release) [gcc-8-branch revision 273027] Copyright (C) 2018 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

So I will first udate the gcc and try it again.

@marckleinebudde
Copy link
Contributor

marckleinebudde commented May 11, 2023

Is that current Debian stable aka Bullseye aka Debian 11?

I can reproduce the issue with the old compiler.

Update: I have a fix that works with the old compiler.

@marckleinebudde
Copy link
Contributor

I've force pushed my branch. Now it works even with the older compiler.

@mawekman
Copy link
Author

Hi @marckleinebudde,

thanks a lot for your help.
I've made some additional tests and it seems to be a problem depending of the RT-kernel. Heavy load on the cpu's shows the problem more often and faster. Same tests with a non RT kernel do not show the problems at the moment.
Do you know if anyone has ever done tests with an RT kernel ?

@marckleinebudde
Copy link
Contributor

I don't think anyone has tested USB-CAN devices under RT configurations. As I speculated earlier #166 (comment), I think the networking stack might reorder the CAN frames. I think this problem is best discussed on the [email protected] mailing list.

@marckleinebudde
Copy link
Contributor

For the reference:
A patch series has been posted on the [email protected] mailing list: https://lore.kernel.org/all/[email protected]

@fenugrec
Copy link
Collaborator

fenugrec commented Jul 6, 2023

For the reference: A patch series has been posted on the [email protected]

oh wow good work tracking that down. How are other (non-CAN) usb network adapters handling this ?

@marckleinebudde
Copy link
Contributor

I should add the solution to my mail, here's the problem description:

Traditionally USB drivers used to pass the received CAN frames/skbs to
the network stack with netif_rx(). In netif_rx() the skbs are queued
to the local CPU. If IRQs are handled in round robin, OoO packets may
occur.

The solution to the problem is to use NAPI and then netif_receive_skb() to push the skb into the networking stack. netif_receive_skb() properly serializes the incoming skbs.

All networking drivers not using NAPI have this problem. Someone tested a non-NAPI USB Ethernet adapter and saw OoO TCP packages in wireshark. TCP tries to fix this, but IIRC this resulted in retransmissions and then duplicate TCP packets. At least this was not beneficial for TCP performance. Don't know what this does to a realtime UDP audio stream 😢.

Some newer USB-Ethernet drivers like the lan78xx use NAPI.

@marckleinebudde
Copy link
Contributor

intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this issue Jul 28, 2023
The driver used to pass received CAN frames/skbs to the network stack
with netif_rx(). In netif_rx() the skbs are queued to the local CPU.
If IRQs are handled in round robin, OoO packets may occur.

To avoid out-of-order reception convert the driver from netif_rx() to
NAPI.

For USB devices with timestamping support use the rx-offload helper
can_rx_offload_queue_timestamp() for the RX, and
can_rx_offload_get_echo_skb_queue_timestamp() for the TX path. Devices
without timestamping support use can_rx_offload_queue_tail() for RX,
and can_rx_offload_get_echo_skb_queue_tail() for the TX path.

Link: https://lore.kernel.org/all/[email protected]
Link: candle-usb/candleLight_fw#166
Link: https://lore.kernel.org/all/[email protected]
Signed-off-by: Marc Kleine-Budde <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants