Merged
8dae3b5
drivers/, docs/: introduce failover driver
sebastiankuttnig May 19, 2025
5855e6e
server/netget.c: rewrite upstream prefix for proxying drivers
sebastiankuttnig May 19, 2025
fe08563
NEWS.adoc: introduce failover driver
sebastiankuttnig May 19, 2025
bf3b9e2
drivers/failover.c, NEWS.adoc: fixes for compiler warnings, add PR nu…
sebastiankuttnig May 19, 2025
961fbc2
drivers/failover.c: add shutdown non-handling
sebastiankuttnig May 19, 2025
6b33157
drivers/failover.c: free parse_port_argument() tmp on premature exit …
sebastiankuttnig May 19, 2025
e5f3631
drivers/failover.c: clean dstate after fsdmode 0, remove now freed va…
sebastiankuttnig May 19, 2025
2fac5e1
drivers/failover.c: preserve which port value failed the argument par…
sebastiankuttnig May 19, 2025
f4d8ab7
drivers/failover.c: make defensive freeing consistent throughout the …
sebastiankuttnig May 19, 2025
2c74214
drivers/failover.c: reword shutdown to be more clear
sebastiankuttnig May 22, 2025
1ad9b26
drivers/failover.c: use NUT_STRARG helper for null checks in various …
sebastiankuttnig May 23, 2025
683559c
drivers/failover.c: use enum for priorities
sebastiankuttnig May 23, 2025
fb3d251
drivers/failover.{c,h}: introduce failover.h for defines, typedefs
sebastiankuttnig May 23, 2025
18ac174
drivers/failover.c: do not fatalx on no connectable drivers, keep try…
sebastiankuttnig May 23, 2025
be52d0b
drivers/failover.c: remove redundant _init() calls for status/alarm
sebastiankuttnig May 23, 2025
f3b1541
drivers/failover.c: show truncation content at end of log message
sebastiankuttnig May 23, 2025
53359b7
drivers/failover.c: safeguard ups_promote_primary against NULL or dou…
sebastiankuttnig May 23, 2025
9f78e4c
docs/man/failover.txt: polish documentation and add rationale
sebastiankuttnig May 23, 2025
022342c
drivers/failover.c: remove progname from non fatal log message
sebastiankuttnig May 23, 2025
a6fda90
docs/man/failover.txt: make hyphens consistent
sebastiankuttnig May 23, 2025
22c8465
docs/man/failover.txt: add note to factor in network or lock-picking …
sebastiankuttnig May 23, 2025
44cabf3
docs/man/failover.txt: add limitations
sebastiankuttnig May 23, 2025
b799a9d
Merge branch 'master' into failover
jimklimov May 23, 2025
5321a55
docs/man/failover.txt: add 3rd party tool use case for rationale
sebastiankuttnig May 23, 2025
953d368
drivers/Makefile.am: add failover.h for dists
sebastiankuttnig May 24, 2025
12e7da4
docs/man/failover.txt: fix incompatible characters
sebastiankuttnig May 24, 2025
825edd2
drivers/failover.c: safer string to numeric conversions, improved arg…
sebastiankuttnig May 26, 2025
2c68099
drivers/failover.c: remove magic -1 from str_arg_to_int(), use INT_MI…
sebastiankuttnig May 26, 2025
95ea397
scripts/upsdrvsvcctl/nut-driver-enumerator.sh.in: add support for "dr…
jimklimov May 22, 2025
f001318
Merge branch 'failover' of github.com:sebastiankuttnig/nut into failover
sebastiankuttnig May 26, 2025
3625b15
drivers/failover.c: make csv_arg_to_array() more reusable
sebastiankuttnig May 26, 2025
36204e8
scripts/upsdrvsvcctl/nut-driver-enumerator.sh.in: report if other dev…
jimklimov May 26, 2025
962bac4
drivers/failover.c: use str_to_int() also in instcmd()
sebastiankuttnig May 26, 2025
505853f
Merge branch 'master' into failover
sebastiankuttnig May 28, 2025
784ce12
drivers/failover.{c,h}, docs/man/failover.txt: use _sockfn() for one-…
sebastiankuttnig May 28, 2025
ff08659
tests/nut-driver-enumerator-test.sh: reflect recent enumerator change…
sebastiankuttnig May 28, 2025
54c604b
NEWS.adoc: mention NDE change to track inter-driver dependency [#2962]
jimklimov May 28, 2025
89a0db0
docs/man/nut-driver-enumerator.txt: update intro, mention driver-on-d…
jimklimov May 28, 2025
ccaac91
drivers/failover.{c,h}: introduce checkruntime argument
sebastiankuttnig May 28, 2025
c498dd3
docs/man/failover.txt, docs/nut.dict: introduce checkruntime argument
sebastiankuttnig May 28, 2025
d8f171c
drivers/failover.c: minor improvements to order and debug levels
sebastiankuttnig May 28, 2025
9a6f56c
drivers/failover.{c,h}: store runtimes in UPS struct
sebastiankuttnig May 28, 2025
0937f25
drivers/failover.c: improve guarding of ups->status against NULL dere…
sebastiankuttnig May 28, 2025
fb3cac7
drivers/failover.{c,h}: make UPS priorities more readable in code
sebastiankuttnig May 28, 2025
4 changes: 4 additions & 0 deletions NEWS.adoc
@@ -136,6 +136,10 @@ https://github.com/networkupstools/nut/milestone/9
This seems to be a protocol developed by Cyber Energy for serial-port
devices, subsequently used by different vendors in their own products
or re-branded Cyber Energy creations. [#2940]
* Introduced a `failover` driver for monitoring multiple UPS driver sockets
and seamlessly switching over the UPS data in a failover situation; it
supports end-to-end tracked instant commands and variable updates.
[#2962]

- NUT Monitor GUI:
* Ported Python 3 version to Qt6, now shipped alongside Qt5 for systems
3 changes: 3 additions & 0 deletions docs/man/Makefile.am
@@ -856,6 +856,7 @@ SRC_SERIAL_PAGES = \
clone.txt \
clone-outlet.txt \
dummy-ups.txt \
failover.txt \
etapro.txt \
everups.txt \
gamatronic.txt \
@@ -908,6 +909,7 @@ INST_MAN_SERIAL_PAGES = \
dummy-ups.$(MAN_SECTION_CMD_SYS) \
etapro.$(MAN_SECTION_CMD_SYS) \
everups.$(MAN_SECTION_CMD_SYS) \
failover.$(MAN_SECTION_CMD_SYS) \
gamatronic.$(MAN_SECTION_CMD_SYS) \
genericups.$(MAN_SECTION_CMD_SYS) \
isbmex.$(MAN_SECTION_CMD_SYS) \
@@ -975,6 +977,7 @@ INST_HTML_SERIAL_MANS = \
dummy-ups.html \
etapro.html \
everups.html \
failover.html \
gamatronic.html \
genericups.html \
isbmex.html \
229 changes: 229 additions & 0 deletions docs/man/failover.txt
@@ -0,0 +1,229 @@
FAILOVER(8)
===========

NAME
----

failover - UPS Failover Driver

SYNOPSIS
--------

*failover* -h

*failover* -a 'UPS_NAME' ['OPTIONS']

NOTE: This man page only documents the specific features of the failover driver.
For information about the core driver, see linkman:nutupsdrv[8].

DESCRIPTION
-----------

The `failover` driver acts as a smart proxy for multiple "real" UPS drivers. It
connects to and monitors these underlying UPS drivers through their local UNIX
sockets (or Windows named pipes), continuously evaluating health and suitability
for "primary" duty according to a set of user-configurable rules and priorities.

At any given time, `failover` designates one UPS driver as the *primary*, and
presents its commands, variables and status to the outside world as if it were
directly talking to that UPS. From the perspective of the clients (such as
linkman:upsmon[8] or linkman:upsc[8]), the `failover` driver behaves like any
single UPS, abstracting away the underlying redundancy, and allowing for
seamless transitioning between all monitored UPS drivers and their datasets.

The driver dynamically promotes or demotes the primary UPS driver based on:

- Socket availability and communication status
- Data freshness and UPS online/offline indicators
- User-defined status filters (e.g., presence or absence of `OL`, `LB`, ...)
- Administrative override via control commands (`force.primary`, `force.ignore`)

If the current primary becomes unavailable or no longer meets the criteria, the
driver automatically fails over to a more suitable driver. During transitions,
it ensures that the data is switched out instantly, without linkman:upsd[8]
considering it stale or the clients acting on any previously degraded status.

When no suitable primary is available, a configurable fallback state is entered:

- Keep last primary and declare the data as stale
- Raise `ALARM` and declare the data as stale
- Raise `ALARM` and set forced shutdown (`FSD`)

How the UPSes are connected to the machine (corded, networked, ...) does not
matter, and `failover` does not rely on linkman:upsd[8] itself running. In
principle, it could even be used with multiple drivers connected to the same
UPS, but note that any missing data would not be multiplexed between the drivers.

In summary, `failover` simplifies setups with multiple UPS drivers by
consolidating monitoring and control into a single NUT-visible "device",
reducing complexity and ensuring seamless transitions in high-availability
environments.

EXTRA ARGUMENTS
---------------

This driver supports the following settings:

*port*='drivername-devicename,drivername2-devicename2,...'::
Required. Specifies the local socket names (or Windows named pipes) of the
underlying UPS drivers to be tracked. Entries must follow the format
`drivername-devicename`, as used by NUT's internal socket naming convention
(e.g. `usbhid-ups-ups1`). Multiple entries are comma-separated with no spaces.

*inittime*='seconds'::
Optional. Sets a grace period after driver startup during which the absence of a
primary UPS is tolerated. This allows time for underlying drivers to initialize.
Defaults to 30 seconds.

*deadtime*='seconds'::
Optional. Sets a grace period in seconds after which a non-responsive UPS driver
is considered dead. Defaults to 30 seconds.

*relogtime*='seconds'::
Optional. Interval at which repeated connection failure logs are emitted for a
UPS, reducing log spam during unstable conditions. Defaults to 5 seconds.

*noprimarytime*='seconds'::
Optional. Duration to wait without a suitable primary UPS driver before entering
the configured fallback mode (`fsdmode`). Defaults to 15 seconds.

*maxconnfails*='count'::
Optional. Number of consecutive connection failures allowed per UPS driver
before entering the cooldown period (`coolofftime`). Defaults to 5.

*coolofftime*='seconds'::
Optional. Cooldown period during which the driver pauses reconnect attempts
after exceeding `maxconnfails`. Defaults to 15 seconds.

*fsdmode*='0|1|2'::
Optional. Defines the behavior when no suitable primary UPS driver is found
after `noprimarytime` has elapsed. Defaults to 0.

- `0`: *Do not demote the last primary, but mark its data stale.* This is
similar to how a regular UPS driver would behave when it loses its connection to
the target UPS device. linkman:upsmon[8] will act on the last known (online or
not) status, and decide itself whether that UPS should be considered critical.

- `1`: *Demote the primary, raise `ALARM` and mark the data stale after an
additional few seconds have elapsed (ensuring full propagation).* Any monitoring
linkman:upsmon[8] will then see a device that was in an alarm state and has lost
its connection, and will consider the UPS driver critical, possibly resulting
in forced shutdown (`FSD`) by depletion of `MINSUPPLIES`.

- `2`: *Demote the primary, raise `ALARM` and set immediate `FSD`.* This sets
`FSD` from the driver side, without waiting for linkman:upsmon[8] to raise it
itself. This mode is for setups where immediate shutdown is warranted regardless
of anything else, and `FSD` must reach the clients as fast as possible.

*strictfiltering*='0|1'::
Optional. If set to 1, only UPS drivers matching the configured status filters
are considered for promotion to primary. If set to 0, the hard-coded default
logic is also considered when no status filters match (see PRIORITIES below).
Defaults to 0.

*status_have_any*='OL,CHRG,...'::
Optional. If any of these comma-separated tokens are present in a UPS driver's
`ups.status`, it qualifies for promotion to primary. Defaults to unset.

*status_have_all*='OL,CHRG,...'::
Optional. All listed comma-separated tokens must be present in `ups.status` for
the UPS driver to be eligible for promotion to primary. Defaults to unset.

*status_nothave_any*='OB,OFF,...'::
Optional. If any of these comma-separated tokens are present in `ups.status`,
the UPS driver is disqualified as a primary candidate. Defaults to unset.

*status_nothave_all*='OB,LB,...'::
Optional. If all of these comma-separated tokens are present in `ups.status`,
the UPS driver is disqualified as a primary candidate. Defaults to unset.
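
As an illustration of how `maxconnfails` and `coolofftime` interact, here is a
minimal Python sketch. It is not taken from the driver source: the class and
method names are hypothetical, and it assumes that the cooldown starts once the
failure count reaches `maxconnfails` and that the counter restarts afterwards.

```python
import time


class ConnTracker:
    """Hypothetical per-UPS connection bookkeeping (illustration only)."""

    def __init__(self, maxconnfails=5, coolofftime=15, clock=time.monotonic):
        self.maxconnfails = maxconnfails  # documented default: 5
        self.coolofftime = coolofftime    # documented default: 15 seconds
        self.clock = clock                # injectable for testing
        self.fails = 0
        self.cooloff_until = None         # None means no cooldown is active

    def may_connect(self):
        """Return True if a reconnect attempt is currently allowed."""
        if self.cooloff_until is None:
            return True
        if self.clock() >= self.cooloff_until:
            # Cooldown is over: assumption is that the counter starts anew.
            self.cooloff_until = None
            self.fails = 0
            return True
        return False

    def record_failure(self):
        """Count one failed attempt, entering the cooldown at the limit."""
        self.fails += 1
        if self.fails >= self.maxconnfails:
            self.cooloff_until = self.clock() + self.coolofftime

    def record_success(self):
        """A successful connection clears the failure history."""
        self.fails = 0
        self.cooloff_until = None
```

Passing the clock in as a parameter keeps the sketch testable without real
waiting; the actual driver may count and time these events differently.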

IMPLEMENTATION
--------------

The `port` argument in linkman:ups.conf[5] should reference the local driver
sockets (or Windows named pipes) that the "real" UPS drivers are using. A most
basic setup with multiple drivers and default settings could look like this:

------
[realups]
driver = usbhid-ups
port = auto

[realups2]
driver = usbhid-ups
port = auto

[failover]
driver = failover
port = usbhid-ups-realups,usbhid-ups-realups2
------

Any linkman:upsmon[8] clients would be set to monitor the `failover` UPS.
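
The `port` value in the example follows mechanically from the monitored
sections: each entry is the driver name, a hyphen, and the section name. A small
Python sketch of that composition (the helper `failover_port` is made up for
illustration and is not part of NUT):

```python
def failover_port(sections):
    """Build a failover `port` value from (driver, device) pairs.

    Each entry follows the `drivername-devicename` socket naming
    convention; entries are joined by commas with no spaces.
    """
    entries = []
    for driver, device in sections:
        if not driver or not device:
            raise ValueError("driver and device names must be non-empty")
        entries.append(f"{driver}-{device}")
    return ",".join(entries)


print(failover_port([("usbhid-ups", "realups"), ("usbhid-ups", "realups2")]))
# prints: usbhid-ups-realups,usbhid-ups-realups2
```

The order of the pairs matters, since ties between equally ranked drivers are
resolved by their position in the `port` argument.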

The driver fully supports setting variables and performing instant commands on
the currently elected primary UPS driver. These are proxied to that driver, and
end-to-end tracking is also possible (linkman:upscmd[8] and linkman:upsrw[8]
`-w`). You may notice that some variables and commands are prefixed with
`upstream.`; this clearly separates the upstream commands from those of
`failover` itself.

For your convenience, additional administrative commands are exposed to directly
influence and override the primary election process, e.g. for maintenance:

- `<socketname>.force.ignore [seconds]` will prevent that UPS driver from
becoming primary for the given number of seconds, or permanently in case of a
negative value. A value of 0 resets the override state back to disabled.

- `<socketname>.force.primary [seconds]` will force that UPS driver to the
highest priority for the given number of seconds, or permanently in case of a
negative value. A value of 0 resets the override state back to disabled.

If either command is executed without any argument, active overrides for that
UPS driver will be reset and returned to their default state of being disabled.
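
The argument handling described above can be summarized in a short Python
sketch. The `Override` class is hypothetical and models a single override
(`force.ignore` or `force.primary`) for one UPS driver; whether a bare command
resets both overrides at once is deliberately not modeled here.

```python
import time


class Override:
    """Hypothetical state of one administrative override (illustration only)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock       # injectable for testing
        self._permanent = False
        self._expires_at = None   # None means no timed override is set

    def set(self, seconds=None):
        """Apply a command argument: none or 0 resets, negative is permanent."""
        self._permanent = False
        self._expires_at = None
        if seconds is None or seconds == 0:
            return
        if seconds < 0:
            self._permanent = True
        else:
            self._expires_at = self._clock() + seconds

    def active(self):
        """Return True while the override is in effect."""
        if self._permanent:
            return True
        return self._expires_at is not None and self._clock() < self._expires_at
```

For example, `Override().set(300)` would correspond to an override lasting five
minutes, and `set(-1)` to one that stays until it is explicitly reset.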

PRIORITIES
----------

As outlined above, primaries are dynamically elected based on their current
state and according to a strict set of user-influenceable priorities, which are:

- `0` (highest): UPS driver was forced to the top by administrative command.
- `1`: UPS driver has passed the user-defined status filters.
- `2`: UPS driver has fresh data and is online (in status `OL`).
- `3`: UPS driver has fresh data, but may not be fully online.
- `4` (lowest): UPS driver is alive, but may not have fresh data.

The UPS driver with the highest calculated priority is chosen as primary; ties
are resolved by the order of the socket names given in the `port` argument.

For the user-defined status filters, the following internal order is respected:

1. `status_nothave_any` (first)
2. `status_have_all`
3. `status_nothave_all`
4. `status_have_any` (last)
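
The following Python sketch evaluates the four filters in the order listed
above. It is an interpretation, not the driver's code: the function name is
hypothetical, and it assumes that every configured filter must be satisfied
(logical AND) and that `ups.status` is a space-separated list of tokens.

```python
def passes_status_filters(status, have_any=(), have_all=(),
                          nothave_any=(), nothave_all=()):
    """Return True if `status` satisfies all configured filters.

    Assumption: the filters combine with AND semantics. With no
    filters configured the result is False, since priority 1 is
    then not applicable.
    """
    if not (have_any or have_all or nothave_any or nothave_all):
        return False

    tokens = set(status.split())

    # 1. status_nothave_any: any listed token present disqualifies.
    if nothave_any and tokens & set(nothave_any):
        return False
    # 2. status_have_all: every listed token must be present.
    if have_all and not set(have_all) <= tokens:
        return False
    # 3. status_nothave_all: all listed tokens present disqualifies.
    if nothave_all and set(nothave_all) <= tokens:
        return False
    # 4. status_have_any: at least one listed token must be present.
    if have_any and not tokens & set(have_any):
        return False
    return True
```

Under these assumptions, a status of `OL LB` with `status_have_all = OL` and
`status_nothave_any = LB` is rejected at the first step.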

If `strictfiltering` is enabled, priorities 2 to 4 are not applicable.

If no user-defined status filters are set, priority 1 is not applicable.

NOTE: The base requirement for any election is the UPS socket being connectable
and the UPS driver having published at least one full batch of data during its
lifetime. UPS drivers not fulfilling that requirement are always disqualified.
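
To make the election rules concrete, here is a simplified Python sketch of the
priority calculation and tie-breaking. All names (`Ups`, `priority`,
`elect_primary` and the field names) are hypothetical. The sketch also assumes
that `force.ignore` takes precedence over `force.primary`, and that the result
of the status filters is supplied as a precomputed flag.

```python
from dataclasses import dataclass


@dataclass
class Ups:
    """Hypothetical snapshot of one monitored UPS driver."""
    socketname: str
    connected: bool = False       # socket is connectable
    has_dumped: bool = False      # one full batch of data was received
    fresh: bool = False           # data is not stale
    status: str = ""              # space-separated ups.status tokens
    force_primary: bool = False   # administrative override is active
    force_ignore: bool = False    # administrative override is active
    passes_filters: bool = False  # user-defined status filters matched


def priority(ups, strictfiltering=False):
    """Return 0 (highest) to 4 (lowest), or None if disqualified."""
    if not (ups.connected and ups.has_dumped) or ups.force_ignore:
        return None
    if ups.force_primary:
        return 0
    if ups.passes_filters:
        return 1
    if strictfiltering:
        return None  # priorities 2 to 4 are not applicable
    if ups.fresh and "OL" in ups.status.split():
        return 2
    if ups.fresh:
        return 3
    return 4


def elect_primary(upses, strictfiltering=False):
    """Pick the best candidate; earlier `port` entries win ties."""
    best, best_prio = None, None
    for ups in upses:  # iterate in `port` order
        prio = priority(ups, strictfiltering)
        if prio is None:
            continue
        if best_prio is None or prio < best_prio:
            best, best_prio = ups, prio
    return best
```

Because a candidate only replaces the current best when its priority is
strictly better, the first entry in `port` order wins any tie.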

AUTHOR
------

Sebastian Kuttnig <[email protected]>

SEE ALSO
--------

linkman:upscmd[8],
linkman:upsrw[8],
linkman:ups.conf[5],
linkman:upsc[8],
linkman:upsmon[8],
linkman:nutupsdrv[8]

Internet Resources:
~~~~~~~~~~~~~~~~~~~

The NUT (Network UPS Tools) home page: https://www.networkupstools.org/
16 changes: 15 additions & 1 deletion docs/nut.dict
@@ -1,4 +1,4 @@
-personal_ws-1.1 en 3495 utf-8
+personal_ws-1.1 en 3509 utf-8
AAC
AAS
ABI
@@ -371,6 +371,7 @@ Exar
ExecCGI
ExecStart
ExecStartPre
FAILOVER
FBCA
FD
FDE
@@ -1807,6 +1808,8 @@ configureaza
confpath
const
contrib
cooldown
coolofftime
copyrightable
coreutils
cout
@@ -1876,6 +1879,7 @@ ddk
ddl
de
deUNV
deadtime
debian
debootstrap
debouncing
@@ -2035,6 +2039,7 @@ faa
fabula
facto
failmode
failover
fallthrough
fasttrack
fatalx
@@ -2076,6 +2081,7 @@ frob
frontends
fs
fsd
fsdmode
fsr
fstab
ftdi
@@ -2227,6 +2233,7 @@ includePath
includedir
inductor
inet
influenceable
infos
infoval
inh
@@ -2237,6 +2244,7 @@ initializer
initializers
initinfo
initscripts
inittime
initups
inline
inlined
@@ -2478,6 +2486,7 @@ manpage
manpages
masterguard
matcher
maxconnfails
maxd
maxlength
maxreport
@@ -2632,12 +2641,14 @@ nombattvolt
noncommercially
noout
nooutstats
noprimarytime
norating
noro
noscanlangid
nosnap
nosuid
notAfter
nothave
notifyflags
notifyme
notifymsg
@@ -2826,6 +2837,7 @@ proc
productid
prog
progname
proxied
prtconf
ps
psu
@@ -2887,6 +2899,7 @@ regtype
relatime
releasekeyring
relicensing
relogtime
remoteip
renderer
renderers
@@ -3102,6 +3115,7 @@ strcpy
strdup
strerror
strftime
strictfiltering
stringify
strlen
strncpy
5 changes: 4 additions & 1 deletion drivers/Makefile.am
@@ -62,7 +62,7 @@ endif HAVE_LIBREGEX
# in top level.
NUTSW_DRIVERLIST_DUMMY_UPS = dummy-ups$(EXEEXT)
NUTSW_DRIVERLIST = $(NUTSW_DRIVERLIST_DUMMY_UPS) \
-clone clone-outlet apcupsd-ups skel
+clone clone-outlet failover apcupsd-ups skel
SERIAL_DRIVERLIST = al175 bcmxcp belkin belkinunv bestfcom \
bestfortress bestuferrups bestups etapro everups \
gamatronic genericups isbmex liebert liebert-esp2 liebert-gxe masterguard metasys \
@@ -226,6 +226,9 @@ endif WITH_SSL
clone_SOURCES = clone.c
clone_outlet_SOURCES = clone-outlet.c

# failover driver (in NUTSW_DRIVERLIST)
failover_SOURCES = failover.c

# apcupsd client driver (in NUTSW_DRIVERLIST)
apcupsd_ups_SOURCES = apcupsd-ups.c
apcupsd_ups_CFLAGS = $(AM_CFLAGS)