netvision-mib: sync netvision_output_info with SOCOMECUPS-MIB.txt #2803

Merged on Feb 14, 2025 (4 commits)

@mbouyer mbouyer commented Feb 12, 2025

Hello
We have servers powered from two MASTERYS 3/3 SYSTEM 60 kVA UPSes; some of them have redundant power supplies fed from both UPSes.

After a power failure, the batteries ran low and the UPSes cut their outputs. Everything worked as expected for hosts with a single
power source, but not for those with redundant power. After the first UPS went down, upsmon started reporting this UPS
as being OL and OB at the same time (and no longer LB):
Feb 11 08:52:27 localhost upsmon[2175]: UPS ups-1@localhost battery is low
Feb 11 08:56:23 localhost upsmon[2175]: Giving up on the primary for UPS [ups-1@localhost] after 241 sec since last comms
Feb 11 08:56:28 localhost upsmon[2175]: Giving up on the primary for UPS [ups-1@localhost] after 246 sec since last comms
Feb 11 08:56:33 localhost upsmon[2175]: Giving up on the primary for UPS [ups-1@localhost] after 251 sec since last comms
Feb 11 08:56:38 localhost upsmon[2175]: Giving up on the primary for UPS [ups-1@localhost] after 256 sec since last comms
Feb 11 08:56:43 localhost upsmon[2175]: Giving up on the primary for UPS [ups-1@localhost] after 261 sec since last comms
Feb 11 08:56:48 localhost upsmon[2175]: UPS ups-1@localhost on line power
Feb 11 08:56:48 localhost upsmon[2175]: UPS ups-1@localhost on battery
Feb 11 08:56:53 localhost upsmon[2175]: UPS ups-1@localhost on line power
Feb 11 08:56:53 localhost upsmon[2175]: UPS ups-1@localhost on battery
Feb 11 08:56:58 localhost upsmon[2175]: UPS ups-1@localhost on line power
Feb 11 08:56:58 localhost upsmon[2175]: UPS ups-1@localhost on battery
(and so on until the second UPS ran out of battery)
At this point nagios reported the UPS as:
[1739260700] SERVICE ALERT: localhost;nut-ups-1;WARNING;HARD;4;UPS WARNING - Status=OnlineOn Battery, Boosting, Discharging Batt=47.0% Left=592.0min
while the UPS output was, in fact, off.
The result is that when ups-2 ran out of battery, upsmon considered ups-1 to be still alive and didn't shut down the servers
with dual power sources.
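
To illustrate why the stale status matters for dual-fed hosts, here is a simplified model of the kind of decision upsmon makes. It is not upsmon's actual code: the struct, the function names, the rule "a UPS stops counting once it is both OB and LB", and the minimum-supplies parameter are all assumptions made for this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of one monitored UPS. */
struct ups_view {
    const char *name;
    const char *status;   /* space-separated status tokens, e.g. "OB LB" */
    int power_value;      /* number of host power supplies fed by this UPS */
};

/* True if `token` appears as a whole word in the status string. */
static bool has_token(const char *status, const char *token)
{
    size_t len = strlen(token);
    const char *p = status;

    while ((p = strstr(p, token)) != NULL) {
        bool starts = (p == status) || (p[-1] == ' ');
        bool ends = (p[len] == '\0') || (p[len] == ' ');
        if (starts && ends)
            return true;
        p += len;
    }
    return false;
}

/* ASSUMPTION: a UPS is "critical" when it is on battery AND battery is low. */
static bool is_critical(const struct ups_view *u)
{
    return has_token(u->status, "OB") && has_token(u->status, "LB");
}

/* Shut down when the non-critical UPSes feed fewer supplies than required. */
static bool should_shutdown(const struct ups_view *list, size_t n,
                            int min_supplies)
{
    int usable = 0;

    for (size_t i = 0; i < n; i++) {
        if (!is_critical(&list[i]))
            usable += list[i].power_value;
    }
    return usable < min_supplies;
}
```

In this model, with ups-1 reported as "OL OB BOOST" and ups-2 as "OB LB", `should_shutdown()` returns false for a minimum of one supply, because ups-1 still counts as usable even though its output is off.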

The MIB I have for this UPS (which matches the only publicly available one I found) shows for upsOutputSource:
SYNTAX INTEGER {
    unknown(1),
    onInverter(2),
    onMains(3),
    ecoMode(4),
    onBypass(5),
    standby(6),
    onMaintenanceBypass(7),
    upsOff(8),
    normalMode(9)
}

This disagrees with netvision_output_info[] in drivers/netvision-mib.c.
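
As a rough sketch of what a lookup table aligned with the MIB enumeration above could look like: the numeric values and labels below come from the MIB fragment, but `struct lkp_entry`, `lookup_output_source()` and above all the status strings are assumptions for illustration. They are neither NUT's real `info_lkp_t` definition nor the mapping chosen in the merged patch.

```c
#include <stddef.h>

/* Hypothetical stand-in for a value-to-string lookup entry.
 * NUT's snmp-ups driver has its own type for this purpose. */
struct lkp_entry {
    int oid_value;          /* raw upsOutputSource value */
    const char *mib_name;   /* enumeration label from SOCOMECUPS-MIB.txt */
    const char *status;     /* ASSUMED status tokens, illustration only */
};

static const struct lkp_entry output_source_lkp[] = {
    { 1, "unknown",             ""       },
    /* Observed as 2 both on mains and on battery, so this sketch
     * leaves the OL/OB decision to other OIDs. */
    { 2, "onInverter",          ""       },
    { 3, "onMains",             "OL"     },
    { 4, "ecoMode",             "OL"     },
    { 5, "onBypass",            "BYPASS" },
    { 6, "standby",             "OFF"    },
    { 7, "onMaintenanceBypass", "BYPASS" },
    { 8, "upsOff",              "OFF"    },
    { 9, "normalMode",          "OL"     },
};

/* Returns the entry for a raw value, or NULL if the value is not listed. */
static const struct lkp_entry *lookup_output_source(int value)
{
    size_t n = sizeof(output_source_lkp) / sizeof(output_source_lkp[0]);

    for (size_t i = 0; i < n; i++) {
        if (output_source_lkp[i].oid_value == value)
            return &output_source_lkp[i];
    }
    return NULL;
}
```

With a table of this shape, the value 6 seen when the UPS cut its output resolves to `standby` rather than to an online state.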

Our zabbix server recorded the raw SNMP values during the incident. The values were:
NETVISION_OID_BATTERYSTATUS:
2 (normal) -> 5 (discharging) -> 3 (low) -> / the UPS turned its output off / -> 5
upsAlarmOnBattery:
0 -> / mains power lost / -> 1 (until the server room finally blacked out; probably until mains power came back)
NETVISION_OID_OUTPUT_SOURCE:
2 -> / the UPS turned its output off / -> 6

This matches the MIB fragment above, and also explains why NUT reported OL BOOST while the UPS was off.

This patch brings netvision_output_info[] in sync with the available information, and probably also fixes the issue we noticed. I'm now running with this patch installed, but it is hard for me to test, as I won't cause another server room blackout just to test it.

Do you know where the current values in netvision_output_info[] come from? I wonder whether there could be UPSes with the same MIB but slightly different values for upsOutputSource.

The netvision_output_info array doesn't match the upsOutputSource values from
SOCOMECUPS-MIB.txt (the only publicly available Socomec MIB) and
also doesn't match the values observed on our MASTERYS 3/3 SYSTEM, which has
a Net Vision v6.32 card.

Signed-off-by: Manuel Bouyer <[email protected]>

jimklimov commented Feb 14, 2025

Hard to say where the values come from; git history goes through some renames in 2010 (snmp subdrivers were headers before that, so netvisionmib.h), down to the import from SVN in 2005.

Key commits in that early history seem to be a0dd365 78bdf00 a3b36a2 but they don't seem to clarify the origins of data (at least not as much as I can see while traveling now).

One of them introduces the new mappings for
Socomec Sicon UPS with Netvision Web/SNMP management card/external box
but that's about all I can quickly see.

@jimklimov added the labels SNMP, "Incorrect or missing readings" (on some devices driver-reported values are systemically off, e.g. x10, x0.1, const+Value, etc.) and impacts-release-2.8.2 (issues reported against NUT release 2.8.2, maybe vanilla or with minor packaging tweaks) on Feb 14, 2025
@jimklimov added this to the 2.8.3 milestone on Feb 14, 2025
@jimklimov merged commit e718a1e into networkupstools:master on Feb 14, 2025
24 of 28 checks passed