failover.c - UPS Failover Driver #2962
Merged
+3,009
−28
44 commits
8dae3b5 drivers/, docs/: introduce failover driver (sebastiankuttnig)
5855e6e server/netget.c: rewrite upstream prefix for proxying drivers (sebastiankuttnig)
fe08563 NEWS.adoc: introduce failover driver (sebastiankuttnig)
bf3b9e2 drivers/failover.c, NEWS.adoc: fixes for compiler warnings, add PR nu… (sebastiankuttnig)
961fbc2 drivers/failover.c: add shutdown non-handling (sebastiankuttnig)
6b33157 drivers/failover.c: free parse_port_argument() tmp on premature exit … (sebastiankuttnig)
e5f3631 drivers/failover.c: clean dstate after fsdmode 0, remove now freed va… (sebastiankuttnig)
2fac5e1 drivers/failover.c: preserve which port value failed the argument par… (sebastiankuttnig)
f4d8ab7 drivers/failover.c: make defensive freeing consistent throughout the … (sebastiankuttnig)
2c74214 drivers/failover.c: reword shutdown to be more clear (sebastiankuttnig)
1ad9b26 drivers/failover.c: use NUT_STRARG helper for null checks in various … (sebastiankuttnig)
683559c drivers/failover.c: use enum for priorities (sebastiankuttnig)
fb3d251 drivers/failover.{c,h}: introduce failover.h for defines, typedefs (sebastiankuttnig)
18ac174 drivers/failover.c: do not fatalx on no connectable drivers, keep try… (sebastiankuttnig)
be52d0b drivers/failover.c: remove redundant _init() calls for status/alarm (sebastiankuttnig)
f3b1541 drivers/failover.c: show truncation content at end of log message (sebastiankuttnig)
53359b7 drivers/failover.c: safeguard ups_promote_primary against NULL or dou… (sebastiankuttnig)
9f78e4c docs/man/failover.txt: polish documentation and add rationale (sebastiankuttnig)
022342c drivers/failover.c: remove progname from non fatal log message (sebastiankuttnig)
a6fda90 docs/man/failover.txt: make hyphens consistent (sebastiankuttnig)
22c8465 docs/man/failover.txt: add note to factor in network or lock-picking … (sebastiankuttnig)
44cabf3 docs/man/failover.txt: add limitations (sebastiankuttnig)
b799a9d Merge branch 'master' into failover (jimklimov)
5321a55 docs/man/failover.txt: add 3rd party tool use case for rationale (sebastiankuttnig)
953d368 drivers/Makefile.am: add failover.h for dists (sebastiankuttnig)
12e7da4 docs/man/failover.txt: fix incompatible characters (sebastiankuttnig)
825edd2 drivers/failover.c: safer string to numeric conversions, improved arg… (sebastiankuttnig)
2c68099 drivers/failover.c: remove magic -1 from str_arg_to_int(), use INT_MI… (sebastiankuttnig)
95ea397 scripts/upsdrvsvcctl/nut-driver-enumerator.sh.in: add support for "dr… (jimklimov)
f001318 Merge branch 'failover' of github.com:sebastiankuttnig/nut into failover (sebastiankuttnig)
3625b15 drivers/failover.c: make csv_arg_to_array() more reusable (sebastiankuttnig)
36204e8 scripts/upsdrvsvcctl/nut-driver-enumerator.sh.in: report if other dev… (jimklimov)
962bac4 drivers/failover.c: use str_to_int() also in instcmd() (sebastiankuttnig)
505853f Merge branch 'master' into failover (sebastiankuttnig)
784ce12 drivers/failover.{c,h}, docs/man/failover.txt: use _sockfn() for one-… (sebastiankuttnig)
ff08659 tests/nut-driver-enumerator-test.sh: reflect recent enumerator change… (sebastiankuttnig)
54c604b NEWS.adoc: mention NDE change to track inter-driver dependency [#2962] (jimklimov)
89a0db0 docs/man/nut-driver-enumerator.txt: update intro, mention driver-on-d… (jimklimov)
ccaac91 drivers/failover.{c,h}: introduce checkruntime argument (sebastiankuttnig)
c498dd3 docs/man/failover.txt, docs/nut.dict: introduce checkruntime argument (sebastiankuttnig)
d8f171c drivers/failover.c: minor improvements to order and debug levels (sebastiankuttnig)
9a6f56c drivers/failover.{c,h}: store runtimes in UPS struct (sebastiankuttnig)
0937f25 drivers/failover.c: improve guarding of ups->status against NULL dere… (sebastiankuttnig)
fb3cac7 drivers/failover.{c,h}: make UPS priorities more readable in code (sebastiankuttnig)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,229 @@ | ||
| FAILOVER(8) | ||
| ========== | ||
|
|
||
| NAME | ||
| ---- | ||
|
|
||
| failover - UPS Failover Driver | ||
|
|
||
| SYNOPSIS | ||
| -------- | ||
|
|
||
| *failover* -h | ||
|
|
||
| *failover* -a 'UPS_NAME' ['OPTIONS'] | ||
|
|
||
| NOTE: This man page only documents the specific features of the failover driver. | ||
| For information about the core driver, see linkman:nutupsdrv[8]. | ||
|
|
||
| DESCRIPTION | ||
| ----------- | ||
|
|
||
| The `failover` driver acts as a smart proxy for multiple "real" UPS drivers. It | ||
| connects to and monitors these underlying UPS drivers through their local UNIX | ||
| sockets (or Windows named pipes), continuously evaluating health and suitability | ||
| for "primary" duty according to a set of user configurable rules and priorities. | ||
|
|
||
| At any given time, `failover` designates one UPS driver as the *primary*, and | ||
| presents its commands, variables and status to the outside world as if it were | ||
| directly talking to that UPS. From the perspective of the clients (such as | ||
| linkman:upsmon[8] or linkman:upsc[8]), the `failover` driver behaves like any | ||
| single UPS, abstracting away the underlying redundancy, and allowing for | ||
| seamless transitioning between all monitored UPS drivers and their datasets. | ||
|
|
||
| The driver dynamically promotes or demotes the primary UPS driver based on: | ||
|
|
||
| - Socket availability and communication status | ||
| - Data freshness and UPS online/offline indicators | ||
| - User-defined status filters (e.g., presence or absence of `OL`, `LB`, ...) | ||
| - Administrative override via control commands (`force.primary`, `force.ignore`) | ||
|
|
||
| If the current primary becomes unavailable or no longer meets the criteria, the | ||
| driver automatically fails over to a more suitable driver. During transitions, | ||
| it ensures that the data is switched out instantly, without linkman:upsd[8] | ||
| considering it stale or the clients acting on any previously degraded status. | ||
|
|
||
| When no suitable primary is available, a configurable fallback state is entered: | ||
|
|
||
| - Keep last primary and declare the data as stale | ||
| - Raise `ALARM` and declare the data as stale | ||
| - Raise `ALARM` and set forced shutdown (`FSD`) | ||
|
|
||
| How the UPS are connected to the machine (corded, networked, ...) does not | ||
| matter, and `failover` does not rely on linkman:upsd[8] itself running. In | ||
| principle, it could even be used on multiple drivers connected to the same UPS, | ||
| but note that any missing data would not be multiplexed between the drivers. | ||
|
|
||
| In summary, `failover` simplifies multi UPS driver setups by consolidating | ||
| monitoring and control into a single NUT-visible "device", reducing complexity | ||
| and ensuring seamless transitions in high-availability environments. | ||
|
|
||
| EXTRA ARGUMENTS | ||
| --------------- | ||
|
|
||
| This driver supports the following settings: | ||
|
|
||
| *port*='drivername-devicename,drivername2-devicename2,...':: | ||
| Required. Specifies the local socket names (or Windows named pipes) of the | ||
| underlying UPS drivers to be tracked. Entries must follow the format | ||
| `drivername-devicename`, as used by NUT's internal socket naming convention | ||
| (e.g. `usbhid-ups-ups1`). Multiple entries are comma-separated with no spaces. | ||
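The `port` format described above can be illustrated with a small validation sketch. This is not the driver's parser; `parse_port` is a hypothetical helper that only mirrors the documented rules (comma-separated entries, no spaces, at least one hyphen between driver and device name).

```python
def parse_port(value):
    """Split a `port` value into socket names, rejecting malformed entries.

    Hypothetical helper mirroring the documented format only.
    """
    names = []
    for entry in value.split(","):
        # Entries are comma-separated with no spaces, and must not be empty.
        if not entry or any(ch.isspace() for ch in entry):
            raise ValueError(f"invalid port entry: {entry!r}")
        # `drivername-devicename` needs a hyphen with text on both sides.
        if "-" not in entry[1:-1]:
            raise ValueError(f"not in drivername-devicename form: {entry!r}")
        names.append(entry)
    return names


sockets = parse_port("usbhid-ups-realups,usbhid-ups-realups2")
```

Because driver names may themselves contain hyphens (as in `usbhid-ups`), the sketch does not try to split an entry into driver and device parts.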
|
|
||
| *inittime*='seconds':: | ||
| Optional. Sets a grace period after driver startup during which the absence of a | ||
| primary UPS is tolerated. This allows time for underlying drivers to initialize. | ||
| Defaults to 30 seconds. | ||
|
|
||
| *deadtime*='seconds':: | ||
| Optional. Sets a grace period in seconds after which a non-responsive UPS driver | ||
| is considered dead. Defaults to 30 seconds. | ||
|
|
||
| *relogtime*='seconds':: | ||
| Optional. Minimum interval between repeated connection failure log messages | ||
| for a UPS, reducing log spam during unstable conditions. Defaults to 5 seconds. | ||
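As a rough illustration of what `relogtime` achieves, the following sketch throttles repeated messages per UPS. The class and method names are hypothetical and the driver's actual bookkeeping may differ.

```python
class RelogThrottle:
    """Allow at most one failure message per UPS every `relogtime` seconds."""

    def __init__(self, relogtime=5.0):
        self.relogtime = relogtime
        self.last_logged = {}  # socket name -> time of last emitted message

    def should_log(self, name, now):
        last = self.last_logged.get(name)
        if last is not None and now - last < self.relogtime:
            return False  # still within the quiet interval for this UPS
        self.last_logged[name] = now
        return True


throttle = RelogThrottle(relogtime=5.0)
first = throttle.should_log("usbhid-ups-realups", now=0.0)
repeat = throttle.should_log("usbhid-ups-realups", now=2.0)
later = throttle.should_log("usbhid-ups-realups", now=5.0)
```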
|
|
||
| *noprimarytime*='seconds':: | ||
| Optional. Duration to wait without a suitable primary UPS driver before entering | ||
| the configured fallback mode (`fsdmode`). Defaults to 15 seconds. | ||
|
|
||
| *maxconnfails*='count':: | ||
| Optional. Number of consecutive connection failures allowed per UPS driver | ||
| before entering into the cooldown period (`coolofftime`). Defaults to 5. | ||
|
|
||
| *coolofftime*='seconds':: | ||
| Optional. Cooldown period during which the driver pauses reconnect attempts | ||
| after exceeding `maxconnfails`. Defaults to 15 seconds. | ||
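The interplay of `maxconnfails` and `coolofftime` might be pictured as below. This is a sketch under assumptions: the names are hypothetical, the cooldown is taken to start once the failure count exceeds `maxconnfails`, and the counter is assumed to reset when the cooldown ends.

```python
class ConnectionTracker:
    """Per-UPS reconnect bookkeeping (illustrative only)."""

    def __init__(self, maxconnfails=5, coolofftime=15.0):
        self.maxconnfails = maxconnfails
        self.coolofftime = coolofftime
        self.fails = 0
        self.cooloff_until = None  # None means no cooldown in progress

    def may_connect(self, now):
        if self.cooloff_until is not None:
            if now < self.cooloff_until:
                return False  # reconnect attempts are paused
            # Assumption: the failure counter starts over after the cooldown.
            self.cooloff_until = None
            self.fails = 0
        return True

    def record_failure(self, now):
        self.fails += 1
        if self.fails > self.maxconnfails:
            self.cooloff_until = now + self.coolofftime

    def record_success(self):
        self.fails = 0
        self.cooloff_until = None
```

A caller would check `may_connect(now)` before each attempt and report the outcome through `record_failure(now)` or `record_success()`.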
|
|
||
| *fsdmode*='0|1|2':: | ||
| Optional. Defines the behavior when no suitable primary UPS driver is found | ||
| after `noprimarytime` has elapsed. Defaults to 0. | ||
|
|
||
| - `0`: *Do not demote the last primary, but mark its data stale.* This is | ||
| similar to how a regular UPS driver would behave when it loses its connection to | ||
| the target UPS device. linkman:upsmon[8] will act on the last known (online or | ||
| not) status, and decide itself whether that UPS should be considered critical. | ||
|
|
||
| - `1`: *Demote the primary, raise `ALARM` and mark the data stale after an | ||
| additional few seconds have elapsed (ensuring full propagation).* Any monitoring | ||
| linkman:upsmon[8] will then see a device that was in an alarm state lose its | ||
| connection and will consider the UPS driver critical, possibly resulting in a | ||
| forced shutdown (`FSD`) by depletion of `MINSUPPLIES`. | ||
|
|
||
| - `2`: *Demote the primary, raise `ALARM` and set immediate `FSD`.* This will | ||
| set `FSD` from the driver side instead of leaving it to linkman:upsmon[8] to | ||
| raise it. This mode is for setups where immediate shutdown is warranted, | ||
| regardless of anything else, and `FSD` must reach the clients as fast as possible. | ||
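The three `fsdmode` values can be summarized as a small decision function. This is a sketch of the documented behavior only; `fallback_actions` and the action names are hypothetical, and the short extra delay before data is marked stale in mode 1 is not modeled.

```python
def fallback_actions(fsdmode, seconds_without_primary, noprimarytime=15):
    """Return the set of fallback actions for the given situation."""
    if seconds_without_primary < noprimarytime:
        return set()  # still waiting for a suitable primary
    if fsdmode == 0:
        return {"stale"}  # the last primary is kept, its data is declared stale
    if fsdmode == 1:
        return {"demote", "alarm", "stale"}
    if fsdmode == 2:
        return {"demote", "alarm", "fsd"}
    raise ValueError(f"unknown fsdmode: {fsdmode!r}")


actions = fallback_actions(fsdmode=2, seconds_without_primary=20)
```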
|
|
||
| *strictfiltering*='0|1':: | ||
| Optional. If set to 1, only UPS matching the configured status filters are | ||
| considered for promotion to primary. If set to 0, the hard-coded default logic | ||
| is also considered when no status filters match (read more about this further | ||
| down). Defaults to 0. | ||
|
|
||
| *status_have_any*='OL,CHRG,...':: | ||
| Optional. If any of these comma-separated tokens are present in a UPS driver's | ||
| `ups.status`, it qualifies for promotion to primary. Defaults to unset. | ||
|
|
||
| *status_have_all*='OL,CHRG,...':: | ||
| Optional. All listed comma-separated tokens must be present in `ups.status` for | ||
| the UPS driver to be eligible for promotion to primary. Defaults to unset. | ||
|
|
||
| *status_nothave_any*='OB,OFF,...':: | ||
| Optional. If any of these comma-separated tokens are present in `ups.status`, | ||
| the UPS driver is disqualified as a primary candidate. Defaults to unset. | ||
|
|
||
| *status_nothave_all*='OB,LB,...':: | ||
| Optional. If all of these comma-separated tokens are present in `ups.status`, | ||
| the UPS driver is disqualified as a primary candidate. Defaults to unset. | ||
|
|
||
| IMPLEMENTATION | ||
| -------------- | ||
|
|
||
| The `port` argument in linkman:ups.conf[5] should reference the local driver | ||
| sockets (or Windows named pipes) that the "real" UPS drivers are using. A basic | ||
| setup with multiple drivers and default settings could look like this: | ||
|
|
||
| ------ | ||
| [realups] | ||
| driver = usbhid-ups | ||
| port = auto | ||
|
|
||
| [realups2] | ||
| driver = usbhid-ups | ||
| port = auto | ||
|
|
||
| [failover] | ||
| driver = failover | ||
| port = usbhid-ups-realups,usbhid-ups-realups2 | ||
| ------ | ||
|
|
||
| Any linkman:upsmon[8] clients would be set to monitor the `failover` UPS. | ||
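To show how the `port` value in the example relates to the other sections, here is a sketch that derives it from the configuration text. It is an illustration rather than a NUT tool: it relies on Python's `configparser`, which happens to accept this particular snippet but would reject a real `ups.conf` containing global settings outside any section.

```python
import configparser

UPS_CONF = """\
[realups]
driver = usbhid-ups
port = auto

[realups2]
driver = usbhid-ups
port = auto

[failover]
driver = failover
"""


def socket_names(conf_text):
    """Build `drivername-devicename` names for every non-failover section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    return [
        f"{parser[section]['driver']}-{section}"
        for section in parser.sections()
        if parser[section]["driver"] != "failover"
    ]


port_value = ",".join(socket_names(UPS_CONF))
print(port_value)  # usbhid-ups-realups,usbhid-ups-realups2
```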
|
|
||
| The driver fully supports setting variables and performing instant commands on | ||
| the currently elected primary UPS driver. These are proxied, and end-to-end | ||
| tracking is also possible (linkman:upscmd[1] and linkman:upsrw[1] `-w`). You | ||
| may notice that some variables and commands are prefixed with `upstream.`; this | ||
| clearly separates the upstream commands from those of `failover` itself. | ||
|
|
||
| For your convenience, additional administrative commands are exposed to directly | ||
| influence and override the primary election process, e.g. for maintenance: | ||
|
|
||
| - `<socketname>.force.ignore [seconds]` will prevent that UPS driver from ever | ||
| becoming primary within the given timeframe, or permanently in case of a | ||
| negative value. A value of 0 resets the override state back to disabled. | ||
|
|
||
| - `<socketname>.force.primary [seconds]` will force that UPS driver to the | ||
| highest priority within the given timeframe, or permanently in case of a | ||
| negative value. A value of 0 resets the override state back to disabled. | ||
|
|
||
| If either command is executed without any argument, active overrides for that | ||
| UPS driver will be reset and returned to their default state of being disabled. | ||
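The argument semantics shared by `force.ignore` and `force.primary` (positive for a timeframe, negative for permanent, zero or no argument for reset) could be modeled like this. The `Override` class is a hypothetical sketch, not the driver's data structure.

```python
class Override:
    """Timed administrative override for one UPS driver (illustrative)."""

    def __init__(self):
        self.active = False
        self.expires_at = None  # None while active means permanent

    def apply(self, now, seconds=None):
        if seconds is None or seconds == 0:
            # No argument or 0: reset the override back to disabled.
            self.active, self.expires_at = False, None
        elif seconds < 0:
            # Negative value: the override stays in place permanently.
            self.active, self.expires_at = True, None
        else:
            self.active, self.expires_at = True, now + seconds

    def is_active(self, now):
        if self.active and self.expires_at is not None and now >= self.expires_at:
            self.active, self.expires_at = False, None  # timeframe has elapsed
        return self.active


ignore = Override()
ignore.apply(now=100, seconds=60)
during = ignore.is_active(now=130)
after = ignore.is_active(now=160)
```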
|
|
||
| PRIORITIES | ||
| ---------- | ||
|
|
||
| As outlined above, primaries are dynamically elected based on their current | ||
| state and according to a strict set of user-influenceable priorities, which are: | ||
|
|
||
| - `0` (highest): UPS driver was forced to the top by administrative command. | ||
| - `1`: UPS driver has passed the user-defined status filters. | ||
| - `2`: UPS driver has fresh data and is online (in status `OL`). | ||
| - `3`: UPS driver has fresh data, but may not be fully online. | ||
| - `4` (lowest): UPS driver is alive, but may not have fresh data. | ||
|
|
||
| The UPS driver with the highest calculated priority is chosen as primary; ties | ||
| are resolved by the order of the socket names given in the `port` argument. | ||
|
|
||
| For the user-defined status filters, the following internal order is respected: | ||
|
|
||
| 1. `status_nothave_any` (first) | ||
| 2. `status_have_all` | ||
| 3. `status_nothave_all` | ||
| 4. `status_have_any` (last) | ||
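One possible reading of the four filters, evaluated in the order listed above, is sketched below. How the filters combine is an assumption here: a UPS passes only if no disqualifying filter fires and every configured qualifying filter is satisfied. The function name is hypothetical, and `ups.status` is treated as a space-separated token list.

```python
def passes_filters(status, have_any=(), have_all=(), nothave_any=(), nothave_all=()):
    """Evaluate user-defined status filters against an `ups.status` string."""
    tokens = set(status.split())
    if not (have_any or have_all or nothave_any or nothave_all):
        return False  # no filters configured, so nothing can be "passed"
    if nothave_any and tokens & set(nothave_any):
        return False  # 1. any forbidden token present
    if have_all and not set(have_all) <= tokens:
        return False  # 2. a required token is missing
    if nothave_all and set(nothave_all) <= tokens:
        return False  # 3. the forbidden combination is fully present
    if have_any and not tokens & set(have_any):
        return False  # 4. none of the accepted tokens present
    return True


ok = passes_filters("OL CHRG", have_any=["OL"], nothave_any=["OFF"])
```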
|
|
||
| If `strictfiltering` is enabled, priorities 2 to 4 are not applicable. | ||
|
|
||
| If no user-defined status filters are set, priority 1 is not applicable. | ||
|
|
||
| NOTE: The base requirement for any election is the UPS socket being connectable | ||
| and the UPS driver having published at least one full batch of data during its | ||
| lifetime. UPS drivers not fulfilling that requirement are always disqualified. | ||
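The priority rules in this section can be condensed into a short election sketch. It is a simplified model, not the code from `failover.c`: the `Ups` fields and function names are hypothetical, and `passed_filters` is assumed to be false whenever no filters are configured.

```python
from dataclasses import dataclass


@dataclass
class Ups:
    name: str
    connectable: bool = True
    has_dumped: bool = True       # published at least one full batch of data
    forced: bool = False          # force.primary override is active
    passed_filters: bool = False  # user-defined status filters matched
    fresh: bool = False
    online: bool = False          # status contains OL


def priority(ups, strict=False):
    """Return 0 (highest) to 4 (lowest), or None if disqualified."""
    if not (ups.connectable and ups.has_dumped):
        return None  # base requirement not met
    if ups.forced:
        return 0
    if ups.passed_filters:
        return 1
    if strict:
        return None  # priorities 2 to 4 do not apply with strictfiltering
    if ups.fresh and ups.online:
        return 2
    if ups.fresh:
        return 3
    return 4


def elect(candidates, strict=False):
    """Pick the primary; `candidates` must be in `port` order for tie-breaks."""
    best, best_prio = None, None
    for ups in candidates:
        prio = priority(ups, strict)
        if prio is not None and (best_prio is None or prio < best_prio):
            best, best_prio = ups, prio  # strict "<" keeps the earlier UPS on ties
    return best
```

Iterating in `port` order and replacing the current best only on a strictly better priority is what makes the first-listed UPS win a tie.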
|
|
||
| AUTHOR | ||
| ------ | ||
|
|
||
| Sebastian Kuttnig <[email protected]> | ||
|
|
||
| SEE ALSO | ||
| -------- | ||
|
|
||
| linkman:upscmd[1], | ||
| linkman:upsrw[1], | ||
| linkman:ups.conf[5], | ||
| linkman:upsc[8], | ||
| linkman:upsmon[8], | ||
| linkman:nutupsdrv[8] | ||
|
|
||
| Internet Resources: | ||
| ~~~~~~~~~~~~~~~~~~~ | ||
|
|
||
| The NUT (Network UPS Tools) home page: https://www.networkupstools.org/ | ||