Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HW Based FRR #2071

Merged
merged 4 commits into from
Oct 9, 2024
Merged

HW Based FRR #2071

merged 4 commits into from
Oct 9, 2024

Conversation

JaiOCP
Copy link
Contributor

@JaiOCP JaiOCP commented Aug 27, 2024

New NHG of type HW_PROTECTION is introduced. Reason is to help HW distinguish the protection NHG and treat it differently for switchover purposes.
Workflow is same as IPFRR except that SW managing the switchover, it is done by the HW i.e. there is no need for NOS to set the switchover attribute to true to trigger the takeover of secondary when primary fails.

Signed-off-by: JaiOCP <[email protected]>
Copy link
Contributor

@rck-innovium rck-innovium left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, some of the pieces are missing, can you please add a API example.

@@ -56,6 +56,9 @@ typedef enum _sai_next_hop_group_type_t
/** Next hop group is class-based, with members selected by Forwarding class */
SAI_NEXT_HOP_GROUP_TYPE_CLASS_BASED,

/** Next hop hardware protection group. This is the group backing up the primary in the protection group type and is managed by hardware */
SAI_NEXT_HOP_GROUP_TYPE_HW_PROTECTION,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • How is the association between the active nexthop group (NHG) and the protection NHG modelled?

For example, 1) say there is a new object type to model the active+backup relation by having one attribute for active NHG and another for protection NHG; or 2) or maybe add a new attribute to the SAI route entry object to point to the protection NHG apart from the existing active NHG.

  • Can you elaborate on when the h/w should decide to do the switchover?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • How is the association between the active nexthop group (NHG) and the protection NHG modelled?

For example, 1) say there is a new object type to model the active+backup relation by having one attribute for active NHG and another for protection NHG; or 2) or maybe add a new attribute to the SAI route entry object to point to the protection NHG apart from the existing active NHG.

  • Can you elaborate on when the h/w should decide to do the switchover?

Hi Ravi, Its no different then SW FRR. Route points to NHG, NHG is of type PROTECTION and consists of primary and secondary NH or NHG. This is how SW FRR works.
For HW FRR , similar workflow. Route points to NHG, NHG if of type PROTECTIOn and consists or primary and secondary where seconday is of type HW PROTECTION. The fact that seconday is of type HW PROTECTION provides hints to hw to act on it in case of failures of primary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, Jai, is this a distinction between:

  • NOS performs the switchover via SAI_NEXT_HOP_GROUP_ATTR_SET_SWITCHOVER'
  • SDK/HW performs the switchover based of SAI_NEXT_HOP_GROUP_MEMBER_ATTR_MONITORED_OBJECT

i.e. it's a hint that monitored object would be used?

Or a distinction between:

  • SDK performs the switchover based on failure of SAI_NEXT_HOP_GROUP_MEMBER_ATTR_MONITORED_OBJECT
  • HW performs the switchover based on failure of SAI_NEXT_HOP_GROUP_MEMBER_ATTR_MONITORED_OBJECT

Or something else?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Jason,

Its a subtle but good point.
Today NOS performs the switchover via SAI_NEXT_HOP_GROUP_ATTR_SET_SWITCHOVER using the set attribute API.
This 'set' operation is not needed for HW FRR and HW FRR will work based on SAI_NEXT_HOP_GROUP_MEMBER_ATTR_MONITORED_OBJECT.

Now if the monitoring of the object is done in SDK, FW or in HW is an implementation detail. Each approach will give you a switchover time probably ranging from ms to ns.

As far as this PR is concerned there is no difference between

  • SDK performs the switchover based on failure of SAI_NEXT_HOP_GROUP_MEMBER_ATTR_MONITORED_OBJECT
  • HW performs the switchover based on failure of SAI_NEXT_HOP_GROUP_MEMBER_ATTR_MONITORED_OBJECT

Hope this helps

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Jai,

Should we have separate attributes for SDK vs HW based switchover? Some implementations might provide both options but with different capabilities and constraints e.g HW implementation might support only a port as a monitored object while SDK could monitor vlan member as well. Having the same attribute for both SDK and HW will make it harder for NOS to poll this distinction using capability query.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Ashutosh,

We typically don't expose SDK vs FW vs HW level of granularity in SAI.
This PR is focussing on HW and how it is done in HW is an implementation level detail.

@rlhui
Copy link
Collaborator

rlhui commented Sep 19, 2024

@ashutosh-agrawal , @marian-pritsak , @eddyk-nvidia , @j-bos , please help review this as well. Thanks.

SAI_PORT_STAT_IF_IN_HW_PROTECTION_SWITCHOVER_EVENTS,

/** SAI port stat if HW protection switchover related packet drops */
SAI_PORT_STAT_IF_IN_HW_PROTECTION_SWITCHOVER_DROP_PKTS,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does this mean?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are the packets drop experienced (packets sitting in down port queue) when the traffic is diverted to the now active backup link.

@rlhui
Copy link
Collaborator

rlhui commented Sep 26, 2024

@JaiOCP , can we please add a short md doc to explain this in more details?

@@ -56,6 +56,9 @@ typedef enum _sai_next_hop_group_type_t
/** Next hop group is class-based, with members selected by Forwarding class */
SAI_NEXT_HOP_GROUP_TYPE_CLASS_BASED,

/** Next hop hardware protection group. This is the group backing up the primary in the protection group type and is managed by hardware */
SAI_NEXT_HOP_GROUP_TYPE_HW_PROTECTION,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add details about the condition when a port is carrying traffic for primary path and regular nhg. In this case drops on port will account for both failover drops as well as link drops.

@rlhui rlhui added the reviewed PR is discussed in SAI Meeting label Oct 2, 2024
@tjchadaga
Copy link
Collaborator

@JaiOCP , can we please add a short md doc to explain this in more details?

@JaiOCP - could you please take care of this before the next discussion on this PR?

@JaiOCP
Copy link
Contributor Author

JaiOCP commented Oct 4, 2024 via email

@JaiOCP
Copy link
Contributor Author

JaiOCP commented Oct 4, 2024

@JaiOCP , can we please add a short md doc to explain this in more details?
Added

1394de0

@JaiOCP
Copy link
Contributor Author

JaiOCP commented Oct 8, 2024

@j-bos @rck-innovium @rlhui
This is 1.15 candidate. Please approve this PR. All the comments have been addressed

@tjchadaga
Copy link
Collaborator

@ashutosh-agrawal - could you please help sign-off on this as well?

@tjchadaga tjchadaga requested a review from kcudnik October 9, 2024 05:22
@tjchadaga
Copy link
Collaborator

PR discussed in community meeting a couple of times and signed off by community.

@tjchadaga tjchadaga merged commit 6b3dfda into opencomputeproject:master Oct 9, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
reviewed PR is discussed in SAI Meeting
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants