option to exit on error instead of retrying #14

disconn3ct · 2022-07-23T13:11:14Z

I'm running a containerized version under k8s and having a problem after device resets. It can lose connection to the underlying serial device and hang on a repeated device-not-found error. (It still answers TCP connections, which prevents the health checks from detecting the problem.)

I think an option to simply exit on errors, instead of retrying, would add a great deal of operational flexibility without affecting existing users. (This would also make it easier to work around broken hardware; for example, I've seen cheap tty adapters that require a USB reset between attempts. If ser2sock exits on error, a simple wrapper could detect the wedged hardware and reset or even power-cycle it before restarting ser2sock.)
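
To illustrate the wrapper idea, here is a minimal sketch in Python. It assumes the supervised process exits non-zero when the serial device goes away, which is the behavior being requested here, not something ser2sock does today. The names `supervise` and `reset_hardware` are hypothetical, and the reset hook is left to the caller because the right action (USB reset, power cycle) depends on the adapter.

```python
import subprocess
import time
from typing import Callable, Sequence


def supervise(
    cmd: Sequence[str],
    reset_hardware: Callable[[], None],
    max_restarts: int = 5,
    delay_s: float = 1.0,
) -> int:
    """Run cmd; after each non-zero exit, reset the hardware and restart.

    Returns the last exit code: 0 on a clean exit, or the failing code
    once max_restarts restarts have been used up.
    """
    restarts = 0
    while True:
        code = subprocess.call(list(cmd))
        if code == 0 or restarts >= max_restarts:
            return code
        # Hypothetical hook supplied by the caller (USB reset, power cycle, ...).
        reset_hardware()
        restarts += 1
        time.sleep(delay_s)
```

In a container the same effect could come from the runtime's restart policy; the wrapper only matters when something has to happen to the hardware between attempts.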

Unfortunately everything is working this morning so I don't have log examples, but I'll keep an eye out and add them if needed.

f34rdotcom · 2022-07-23T16:59:41Z

The eventual lockup while waiting for the serial device to return is not good; that needs to just work. Beyond that, an option to die on specific errors seems like a simple switch to add: just inject a test for the flag and exit at specific points. It sounds like you are not using the -c switch, which should keep connections out until the serial device returns. The purpose of this switch was to avoid the situation you describe, apart from the lockup problem.

disconn3ct · 2022-07-24T12:44:13Z

I thought -c specifically allowed connections even if the serial is missing.
-c keep incoming connections when a serial device is disconnected

Looking at the code it looks like it does what it says: if the serial is connected OR that flag is set, continue as if the serial is connected. That is the behavior I am already seeing. It also looks like that flag is not parsed until after the accept call on #1030. That accept is all that the health check is looking for, so it passes before the flag is used.

The dumb-hardware lockup isn't specifically a ser2sock problem; it is related to the containerized environment. On a normal host you might expect the system to try to recover (e.g. a bus reset), so waiting makes sense. In the container, once it gets wedged it won't get fixed without restarting the container. If the health check works, that should be sufficient to cause a restart.

disconn3ct · 2022-07-28T01:21:17Z

Edit to add initial error. The first few lines are the k8s health check simply connecting and then dropping. (That happens fairly constantly without causing issues.) After that, the adapter resets (or crashes? or..?) and ser2sock transitions to error mode (accept then close) which still qualifies as healthy due to the accept():

[✔] Socket connected slot 4
[‼] Closing socket fd slot 3 errno: 0 'No error information'
[‼] Closing socket fd slot 4 errno: 0 'No error information'
[✘] Serial disconnected on write. errno: 5 'I/O error'
[✘] Error can not open com port at /dev/ttyUSB0 errno: 6 'No such device or address'
[‼] Socket refused because serial is not connected
[‼] Socket refused because serial is not connected
[✘] Error can not open com port at /dev/ttyUSB0 errno: 6 'No such device or address'

(repeating)
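
Since a plain TCP connect passes while ser2sock is in this accept-then-close state, a probe has to look one step further. The sketch below is one possible approach, not anything ser2sock ships: it connects, then waits briefly to see whether the peer closes the connection right away. The function name `serial_side_healthy` and the timeout value are assumptions, and treating a silent but open connection as healthy is a judgment call that may not fit every device.

```python
import socket


def serial_side_healthy(host: str, port: int, wait_s: float = 1.0) -> bool:
    """Return False if the server refuses, or accepts and then closes at once."""
    try:
        with socket.create_connection((host, port), timeout=wait_s) as sock:
            sock.settimeout(wait_s)
            try:
                data = sock.recv(1)
            except socket.timeout:
                # Connection is still open but silent: treated as healthy (assumption).
                return True
            # b"" means the peer closed the connection: the accept-then-close case.
            return data != b""
    except OSError:
        # Connection refused, reset, or host unreachable.
        return False
```

A script like this could run as an exec-style probe instead of a bare TCP check. Note that each probe briefly occupies a client slot and consumes one byte of serial output if any arrives during the wait.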

disconn3ct · 2022-08-07T14:18:02Z

For anyone else who hits this: changing the call on line 572 from log_message() to error(), so that it exits instead of returning, solves it. (It looks like that is a fatal error on startup, but not while running.)
