Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update component temperature thresholds #1062

Open
wants to merge 7 commits into
base: master
Choose a base branch
from
Open

Conversation

dplore
Copy link
Member

@dplore dplore commented Mar 5, 2024

Change Scope

  • Deprecate existing temperature threshold because it uses units (uint32) which are not compatible with temperature (decimal)
  • Add high and low temperature thresholds using decimal64 celsius units
  • This change is backwards compatible

Tree view

*** /Users/dloher/old-tree.txt  Wed Mar 20 13:05:59 2024
--- /Users/dloher/newthres-tree.txt     Wed Mar 20 13:00:27 2024
***************
*** 8588,8603 ****
          |  +--ro switchover-ready?               boolean
          |  +--ro base-mac-address?               oc-yang:mac-address
          |  +--ro temperature
          |  |  +--ro instant?           decimal64
          |  |  +--ro avg?               decimal64
          |  |  +--ro min?               decimal64
          |  |  +--ro max?               decimal64
          |  |  +--ro interval?          oc-types:stat-interval
          |  |  +--ro min-time?          oc-types:timeticks64
          |  |  +--ro max-time?          oc-types:timeticks64
          |  |  +--ro alarm-status?      boolean
!         |  |  +--ro alarm-threshold?   uint32
!         |  |  +--ro alarm-severity?    identityref
          |  +--ro memory
          |  |  +--ro available?   uint64
          |  |  +--ro utilized?    uint64
--- 8588,8607 ----
          |  +--ro switchover-ready?               boolean
          |  +--ro base-mac-address?               oc-yang:mac-address
          |  +--ro temperature
          |  |  +--ro instant?                 decimal64
          |  |  +--ro avg?                     decimal64
          |  |  +--ro min?                     decimal64
          |  |  +--ro max?                     decimal64
          |  |  +--ro interval?                oc-types:stat-interval
          |  |  +--ro min-time?                oc-types:timeticks64
          |  |  +--ro max-time?                oc-types:timeticks64
          |  |  +--ro alarm-status?            boolean
+         |  |  +--ro alarm-threshold-lower?   decimal64
+         |  |  +--ro alarm-severity-lower?    identityref
+         |  |  +--ro alarm-threshold-upper?   decimal64
+         |  |  +--ro alarm-severity-upper?    identityref
+         |  |  x--ro alarm-threshold?         uint32
+         |  |  x--ro alarm-severity?          identityref
          |  +--ro memory
          |  |  +--ro available?   uint64
          |  |  +--ro utilized?    uint64

Platform Implementations

  • Reference Cisco IOS XR
=============================================================================================================
Location  TEMPERATURE                          Value     Crit    Major    Minor    Minor    Major    Crit
         Sensor                             (deg C)     (Lo)     (Lo)     (Lo)     (Hi)     (Hi)    (Hi)
-------------------------------------------------------------------------------------------------------------
0/RP0/CPU0 
         MB_JM01_L1_TEMP                        37      -10       -5        0      130      135      140
         MB_JM01_L2_TEMP                        36      -10       -5        0      130      135      140
         MB_JM11_L1_TEMP                        33      -10       -5        0      130      135      140
         MB_JM21_L1_TEMP                        36      -10       -5        0      130      135      140 
user@host> show chassis temperature-thresholds
                           Fan speed      Yellow alarm      Red alarm      Fire Shutdown
                          (degrees C)      (degrees C)     (degrees C)      (degrees C)
Item                     Normal  High   Normal  Bad fan   Normal  Bad fan     Normal
Routing Engine 0             48    54       85       85      100      100      102
Routing Engine 1             48    54       85       85      100      100      102
CB 0 Intake Temp Sensor      30    35       80       80       85       85       95
CB 0 Exhaust Temp Sensor     30    35       80       80       85       85       95
CB 0 CPU Die Temp Sensor     40    45       95       95      100      100      110
CB 1 Intake Temp Sensor      30    35       80       80       85       85       95
CB 1 Exhaust Temp Sensor     30    35       80       80       85       85       95
CB 1 CPU Die Temp Sensor     40    45       95       95      100      100      110
FPC 0 Intake-A Temp Sensor   30    35       80       80       85       85       95

@dplore dplore requested a review from a team as a code owner March 5, 2024 20:29
@OpenConfigBot
Copy link

OpenConfigBot commented Mar 5, 2024

No major YANG version changes in commit 4e2fd5b

@dplore
Copy link
Member Author

dplore commented Mar 12, 2024

Reviewed in Mar 12, 2024 OC operators meeting. @s19nal noted that the examples show 3 thresholds for high and low and we should allow all these instead of single high and low thresholds, otherwise the vendor has to choose which one to use. I will update this PR to reflect all 6 thresholds

@dplore
Copy link
Member Author

dplore commented Mar 20, 2024

Added severity for upper and lower.
Added tree diff view to the PR description.

@nkitchen
Copy link
Contributor

nkitchen commented Apr 1, 2024

Reviewed in Mar 12, 2024 OC operators meeting. @s19nal noted that the examples show 3 thresholds for high and low and we should allow all these instead of single high and low thresholds, otherwise the vendor has to choose which one to use. I will update this PR to reflect all 6 thresholds

How does the newest version of this PR allow 3 thresholds to be expressed? It looks to me like it can express 2 thresholds, with corresponding severities.

Another thing I find confusing: The description of alarm-severity-lower is "The severity of the current low temperature alarm" -- what is the "current low temperature alarm", in contrast with "the current high temperature alarm" from the description of alarm-severity-upper? We only have one leaf alarm-status, so I would expect that there is only a single current alarm.

To me, the model used by the transceivers makes more sense: There is a list of thresholds, keyed by severity, and each entry in the list can have a lower and/or upper temperature threshold.

@github-actions github-actions bot added the Stale label Sep 29, 2024
@nkitchen-arista
Copy link
Contributor

Looking at this again, I have another concern, in addition to the one that I expressed on April 1: I think the meaning of alarm-threshold-lower is not clear.

On a device that only raises alarms for rising temperatures, I could imagine that alarm-threshold-lower could be interpreted as the threshold for a lower-severity alarm, like "Yellow alarm" in the JunOS output above.

But it seems that some devices also have thresholds for temperatures going too low -- that's what the IOS XR output looks like to me: alarm-threshold-lower could mean one of "Minor (Lo)", "Major (Lo)", or "Crit (Lo)" [not sure which one].

I think the description for alarm-threshold-lower needs to identify one of these interpretations clearly (and likewise alarm-severity-lower).

(But I don't really think that's sufficient -- we still need further changes to address the requirement for more levels of severity.)

@github-actions github-actions bot removed the Stale label Oct 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Waiting for author
Development

Successfully merging this pull request may close these issues.

4 participants