Add cmd timeout, error detection and logging to wireguard re-resolver script #10156
deajan wants to merge 1 commit into opnsense:master
Refactor DNS re-resolution script to improve logging and command execution, and make it portable for use in other projects (tested on OPNsense 26.1 and RHEL9 so far).
@deajan just try to keep it simple: all processes can emit to syslog. If messages are missing, I have no objections to adding them, but I do have a desire to keep the script as simple as possible. Starting with a ticket describing the problem to be solved is always a good starting point.
@AdSchellevis I see your point, but since I'm trying to resolve the same issues on OPNsense that I encounter on my AlmaLinux setups when using WireGuard, I wanted to make a "can use it everywhere" solution based on your work. The result currently works on both platforms. Do you suggest I remove the logfile support so that logs are caught from stdout by the caller (e.g. cron)? Apart from the logging, I wanted to make the script "error proof(TM)", meaning that I want to at least catch any possible error and log it, and of course enforce a one-minute timeout so execution cannot be frozen by some unresponsive shell command.
@deajan I'm OK with catching and logging errors, certainly; I just want to make sure changes are small and focused.
@AdSchellevis Sure, so what do you want me to remove or rework? I can make the logging part optional, but it's convenient to log stuff and use the same script when running on different platforms.
I'm just asking for a minimal change to my script using syslog as output. Looking at the request, it feels like this is about 5 to 10 lines of code, which is currently not the case. So keep it simple, starting with the goals we are trying to achieve.
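To illustrate the scale of change being asked for here, a minimal sketch of reporting a failed command to syslog might look like the following. It is not taken from the PR or from the existing script; the helper name `run_or_log` is hypothetical, and only the standard `subprocess` and `syslog` modules are used.

```python
import subprocess
import syslog


def run_or_log(cmd):
    """Run cmd and return its stdout; on failure, report stderr to syslog and return None.

    Hypothetical helper for illustration only, not part of the actual script.
    """
    sp = subprocess.run(cmd, capture_output=True, text=True)
    if sp.returncode != 0:
        # One syslog line per failure: command, exit code and captured stderr.
        syslog.syslog(
            syslog.LOG_ERR,
            f"{' '.join(cmd)} failed ({sp.returncode}): {sp.stderr.strip()}",
        )
        return None
    return sp.stdout
```

With this shape there is no logfile handling in the script itself: whatever collects syslog on the host decides where the messages end up.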
Refactor DNS re-resolution script to improve logging and command execution, make it generally more reliable, add a proper exit code, and make it portable for use in other projects (tested on OPNsense 26.1 and RHEL9 so far).
Describe the problem
I'm currently investigating issues with WireGuard where, after a WAN drop, WireGuard never reconnects (the handshake becomes stale) unless I restart the service on the local or remote OPNsense side.
Describe the proposed solution
This PR adds stderr capture to the commands, adds a timeout in case a subprocess command gets stuck, adds generic (rotated) logging for optional debugging purposes, and catches and logs all possible errors.
It is a first step towards diagnosing what actually happens when WireGuard does not reconnect.
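The combination described above (stderr capture, a timeout, rotated logging) can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: the names `make_logger` and `run_with_timeout`, the logger name, and the size and backup-count defaults are all hypothetical. The 60-second default mirrors the one-minute timeout mentioned in the discussion.

```python
import logging
import logging.handlers
import subprocess


def make_logger(path, max_bytes=1_000_000, backups=3):
    """Return a logger that writes to a size-rotated file (hypothetical defaults)."""
    logger = logging.getLogger("reresolve-sketch")
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid adding a second handler on repeated calls
        handler = logging.handlers.RotatingFileHandler(
            path, maxBytes=max_bytes, backupCount=backups
        )
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger


def run_with_timeout(cmd, logger, timeout=60):
    """Run cmd and return (ok, stdout).

    Timeouts, missing binaries and non-zero exits are logged instead of raised.
    """
    try:
        sp = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising TimeoutExpired.
        logger.error("%s timed out after %s seconds", cmd, timeout)
        return False, ""
    except OSError as exc:
        logger.error("%s could not be started: %s", cmd, exc)
        return False, ""
    if sp.returncode != 0:
        logger.error("%s exited with %d: %s", cmd, sp.returncode, sp.stderr.strip())
        return False, sp.stdout
    return True, sp.stdout
```

Returning a status flag rather than raising lets the caller decide on the script's final exit code after all peers have been processed.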
@AdSchellevis Please let me know if I'm out of line with this.
I would also like to add an optional "restart service" action for when the handshake is stale and updating the resolved FQDN doesn't fix the issue, but that's too broad for a diagnostic right now.
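For reference, detecting a stale handshake could be sketched as below. The sketch assumes the input is the text printed by `wg show <interface> latest-handshakes`, i.e. one peer per line with a public key and a Unix timestamp, where a timestamp of 0 means no handshake has happened yet. The function name `stale_peers` and the 180-second threshold are hypothetical and would need tuning. No restart action is included.

```python
import time


def stale_peers(latest_handshakes_output, max_age=180, now=None):
    """Return public keys whose last handshake is older than max_age seconds.

    Assumes one "<public-key> <unix-timestamp>" pair per line; lines that do
    not match that shape are skipped. max_age is an illustrative value.
    """
    now = time.time() if now is None else now
    stale = []
    for line in latest_handshakes_output.splitlines():
        parts = line.split()
        if len(parts) != 2 or not parts[1].isdigit():
            continue
        pubkey, epoch = parts[0], int(parts[1])
        # A timestamp of 0 is treated as "never completed a handshake".
        if epoch == 0 or now - epoch > max_age:
            stale.append(pubkey)
    return stale
```

Passing `now` explicitly keeps the function easy to test without depending on the system clock.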