Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Canable device hangs after desktop application crash (= stop without disonnecting) #165

Open
anicolle opened this issue May 2, 2023 · 25 comments
Labels
on hold needs work from submittter

Comments

@anicolle
Copy link

anicolle commented May 2, 2023

Hi all,
First, thank you for this great repository.
We are facing a weird problem with our Canable-based tools using candleLight_fw.
We have built a small desktop application using python canopen that connect to our custom product over can with a Canable.
Everything works fine except if our application is not terminated properly (killing it with linux "kill" command or stopping the vscode python debugger) thus bypassing the canopen disconnect step.
In this case the Canable seems to still be running as it is still visible on USB but no CAN messages seems to reach our device.
Canopen library debug shows : "Transfer aborted by client with code 0x05040000"
Rebooting our Canable (by disconnecting / reconnecting it from USB) fix the problem.
Any idea what could be the cause of this ? Any workaround we could use to avoid needing to disconnect / reconnect manually ?
Thanks,

@anicolle
Copy link
Author

anicolle commented May 2, 2023

Forgot to say that this happens with firmware flashed from canable website updater (commit 68df7d5 for canable v1, commit ba6b1dd for canable V2). But also with locally build firmware with latest repo commit (commit ac71cb4).

@marckleinebudde
Copy link
Contributor

marckleinebudde commented May 2, 2023

  • What's your operating system?
  • Can you share an python example to reproduce the problem?
  • Can you create and share a USB dump with wireshark?

@marckleinebudde
Copy link
Contributor

marckleinebudde commented May 2, 2023

Everything works fine except if our application is not terminated properly (killing it with linux "kill" command or stopping the vscode python debugger) thus bypassing the canopen disconnect step.
In this case the Canable seems to still be running as it is still visible on USB but no CAN messages seems to reach our device.

You mean you first start the app, kill the app, and restart the app, then it's not receiving data?

@anicolle
Copy link
Author

anicolle commented May 2, 2023

  • What's your operating system?

Linux (Ubuntu)

  • Can you share an python example to reproduce the problem?

I created a simple example (cf below) that connect, read an sdo and stop without disconnecting.
Only the first execution works, after that the canopen library raise a "No SDO response received" exception at each execution.
Disconnecting / reconnecting allow another execution to suceed.

import canopen
import usb.core

eds_path = "youredsfilepath.eds"
node_id = 1
sdo_index = 0x1018
sdo_subindex = 1
PRODUCT_ID_CANDLE_LIGHT: int = 0x606F
VENDOR_ID_CANDLE_LIGHT: int = 0x1D50

if __name__ == "__main__":
    network = canopen.Network()
    node = canopen.RemoteNode(node_id, eds_path)

    dev = usb.core.find(
        idVendor=VENDOR_ID_CANDLE_LIGHT, idProduct=PRODUCT_ID_CANDLE_LIGHT
    )

    network.connect(
        bustype="gs_usb",
        channel=dev.product,
        bus=dev.bus,
        address=dev.address,
        bitrate=1000000,
    )

    network.add_node(node)

    print(node.sdo[sdo_index][sdo_subindex].raw)
  • Can you create and share a USB dump with wireshark?

I can't but if needed, some of my colleagues have a setup that could do that

@anicolle
Copy link
Author

anicolle commented May 2, 2023

Everything works fine except if our application is not terminated properly (killing it with linux "kill" command or stopping the vscode python debugger) thus bypassing the canopen disconnect step.
In this case the Canable seems to still be running as it is still visible on USB but no CAN messages seems to reach our device.

You mean you first start the app, kill the app, and restart the app, then it's not receiving data?

Exactly !

@marckleinebudde
Copy link
Contributor

  • Can you share an python example to reproduce the problem?

I created a simple example (cf below) that connect, read an sdo and stop without disconnecting. Only the first execution works, after that the canopen library raise a "No SDO response received" exception at each execution. Disconnecting / reconnecting allow another execution to suceed.

Thanks. I don't have a canopen device here, can you create the matching other side, too?

@anicolle
Copy link
Author

anicolle commented May 2, 2023

Thanks for your help.
Sadly the python canopen package doesn't support slave mode. I am testing a fork that add this feature but it is a work in progress and I am not sure I will suceed to make it work.

@marckleinebudde
Copy link
Contributor

If you are on Linux, using the socketcan interface not the USB one might be an option....

@fenugrec
Copy link
Collaborator

fenugrec commented May 2, 2023

with socketan you can create a 'vcan0' interface, and use https://github.com/CANopenNode/CANopenLinux to create a node that listens on that interface. I only do this about once every 2 years, don't remember the finer details

@marckleinebudde
Copy link
Contributor

My working theory is there's a problem with re-init of python-can when using the gs-usb interface. So a vcan0 wouldn't help here.

@fenugrec
Copy link
Collaborator

fenugrec commented May 2, 2023

So a vcan0 wouldn't help here.

No indeed, that was just a suggestion to create a test setup without any actual canopen device

@anicolle
Copy link
Author

anicolle commented May 2, 2023

Thanks to both of you.
I tried to isolate the problem from canopen by running the script below (inspired from Canable Getting Started page)
I can run it three times without problems, but it fails at 4th attempt
image

import can
import usb.core

# Candlelight firmware on Linux
#bus = can.interface.Bus(bustype='socketcan', channel='can0', bitrate=500000)
PRODUCT_ID_CANDLE_LIGHT: int = 0x606F
VENDOR_ID_CANDLE_LIGHT: int = 0x1D50

dev = usb.core.find(
            idVendor=VENDOR_ID_CANDLE_LIGHT, idProduct=PRODUCT_ID_CANDLE_LIGHT
        )

# # Stock slcan firmware on Linux
bus = can.Bus(bustype="gs_usb",
            channel=dev.product,
            bus=dev.bus,
            address=dev.address,
            bitrate=1000000)

msg = can.Message(arbitration_id=0xc0ffee,
                  data=[0, 25, 0, 1, 3, 1, 4, 1],
                  is_extended_id=True)

try:
    bus.send(msg)
    print("Message sent on {}".format(bus.channel_info))
except can.CanError:
    print("Message NOT sent")

@marckleinebudde
Copy link
Contributor

I've setup your new test script, and it's running in a while ./bug.py; do sleep 1; done loop. And while writing this it failed, not after 4, but ~40 loops.

@fenugrec
Copy link
Collaborator

fenugrec commented May 2, 2023

I've setup your new test script, and it's running in a while ./bug.py; do sleep 1; done loop. And while writing this it failed, not after 4, but ~40 loops.

Interesting, before your comment I was thinking it looked like when you send 3 frames that fill up the hardware FIFO but there is no other device sending an ACK.

Tempted to blame python-can , but the errors you got may point to the device getting into a broken state. Thanks for confirming you are testing the latest build from master.

I would +1 on marc's request for a wireshark trace - wonder what's going on here

@marckleinebudde
Copy link
Contributor

marckleinebudde commented May 2, 2023

Every run of the test script starts with a GET DESCRIPTOR Request STRING:

Frame 2179: 64 bytes on wire (512 bits), 64 bytes captured (512 bits) on interface usbmon1, id 0
USB URB
    [Source: host]
    [Destination: 1.23.0]
    URB id: 0xffff9bba09d09180
    URB type: URB_SUBMIT ('S')
    URB transfer type: URB_CONTROL (0x02)
    Endpoint: 0x80, Direction: IN
    Device: 23
    URB bus id: 1
    Device setup request: relevant ('\0')
    Data: not present ('<')
    URB sec: 1683055257
    URB usec: 690974
    URB status: Operation now in progress (-EINPROGRESS) (-115)
    URB length [bytes]: 254
    Data length [bytes]: 0
    [Response in: 2180]
    Interval: 0
    Start frame: 0
    Copy of Transfer Flags: 0x00000200, Dir IN
    Number of ISO descriptors: 0
Setup Data
    bmRequestType: 0x80
    bRequest: GET DESCRIPTOR (6)
    Descriptor Index: 0x00
    bDescriptorType: STRING (0x03)
    Language Id: no language specified (0x0000)
    wLength: 254

and is answered with a GET DESCRIPTOR Response STRING within ~1ms:

Frame 2180: 68 bytes on wire (544 bits), 68 bytes captured (544 bits) on interface usbmon1, id 0
USB URB
    [Source: 1.23.0]
    [Destination: host]
    URB id: 0xffff9bba09d09180
    URB type: URB_COMPLETE ('C')
    URB transfer type: URB_CONTROL (0x02)
    Endpoint: 0x80, Direction: IN
    Device: 23
    URB bus id: 1
    Device setup request: not relevant ('-')
    Data: present ('\0')
    URB sec: 1683055257
    URB usec: 691097
    URB status: Success (0)
    URB length [bytes]: 4
    Data length [bytes]: 4
    [Request in: 2179]
    [Time from request: 0.000123000 seconds]
    Unused Setup Header
    Interval: 0
    Start frame: 0
    Copy of Transfer Flags: 0x00000200, Dir IN
    Number of ISO descriptors: 0
STRING DESCRIPTOR
    bLength: 4
    bDescriptorType: 0x03 (STRING)
    wLANGID: English (United States) (0x0409)

in the bad case, the response times out after ~1s:

Frame 2214: 64 bytes on wire (512 bits), 64 bytes captured (512 bits) on interface usbmon1, id 0
USB URB
    [Source: 1.23.0]
    [Destination: host]
    URB id: 0xffff9bb90d1789c0
    URB type: URB_COMPLETE ('C')
    URB transfer type: URB_CONTROL (0x02)
    Endpoint: 0x80, Direction: IN
    Device: 23
    URB bus id: 1
    Device setup request: not relevant ('-')
    Data: present ('\0')
    URB sec: 1683055262
    URB usec: 253151
    URB status: No such file or directory (-ENOENT) (-2)
    URB length [bytes]: 0
    Data length [bytes]: 0
    [Request in: 2213]
    [Time from request: 1.000194000 seconds]
    Unused Setup Header
    Interval: 0
    Start frame: 0
    Copy of Transfer Flags: 0x00000200, Dir IN
    Number of ISO descriptors: 0

Here's a wireshark capture. I'm not sure who triggers the first 3 request/response cycles, it may be wireshark. In the beginning I played with different sleeps, after 34s the test runs with a 3s sleep between the runs. The capture ends with the timed out GET DESCRIPTOR Response STRING.

python-bad.zip

@marckleinebudde
Copy link
Contributor

Here a cleaner capture, again with 3s sleep between the test program calls. The candlelight is running current master firmware. It fails with the same problem.

python-bad-mainline.zip

@anicolle
Copy link
Author

anicolle commented May 3, 2023

I did also notice that running lsusb -v in "working" (left ) and "failed"(right) states doesn't return the same results.

image

@marckleinebudde
Copy link
Contributor

I can reproduce the lsusb problem here, too.

@fenugrec
Copy link
Collaborator

fenugrec commented May 5, 2023

dumb question, what dependency am I missing ?

can.exceptions.CanInterfaceNotImplementedError: Cannot import module can.interfaces.gs_usb for CAN interface 'gs_usb': No module named 'gs_usb

Switching instead to the commented line bus = can.interface.Bus(bustype='socketcan', channel='can0', bitrate=1000000) works with no issues, so whatever python module provides 'gs_usb' is doing something unusual

@marckleinebudde
Copy link
Contributor

pip install gs_usb

@marckleinebudde
Copy link
Contributor

marckleinebudde commented May 7, 2023

Switching instead to the commented line bus = can.interface.Bus(bustype='socketcan', channel='can0', bitrate=1000000) works with no issues

ACK - I suggested to use socketcan as a workaround unter Linux, too: #165 (comment)

so whatever python module provides 'gs_usb' is doing something unusual

The gs_usb user space driver is requesting GET DESCRIPTOR Request STRING on every startup, so maybe some resource exhaustion on the firmware side?

@fenugrec
Copy link
Collaborator

fenugrec commented May 7, 2023

gs_usb

thanks. I had only looked for it in my distro's packages.

so maybe some resource exhaustion on the firmware side?

Skimming throught the ST USB stack, I don't see how that could be possible ? there's no dynamic allocation, and the string descriptor requests are handled pretty much like all other control requests. Their handling of the "len" argument is a bit dubious at times (casting to/from *uint16) but I imagine we would be seeing more problems if it was flawed.

@cvaliere
Copy link

hi guys, did you crack the case eventually? :)

@marckleinebudde
Copy link
Contributor

No, I currently don't have time to chase that bug.

@fenugrec
Copy link
Collaborator

No, I currently don't have time to chase that bug.

Same here unfortunately.

the next steps would be to hook up a debugger, reproduce the problem, then examine the device state (USB periph registers, etc), and possibly set breakpoints on the get_descriptor functions.

@fenugrec fenugrec added the on hold needs work from submittter label Sep 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
on hold needs work from submittter
Projects
None yet
Development

No branches or pull requests

4 participants