HW Based FRR #2071
Signed-off-by: JaiOCP <[email protected]>
IMO, some of the pieces are missing. Can you please add an API example?
@@ -56,6 +56,9 @@ typedef enum _sai_next_hop_group_type_t
    /** Next hop group is class-based, with members selected by Forwarding class */
    SAI_NEXT_HOP_GROUP_TYPE_CLASS_BASED,

    /** Next hop hardware protection group. This is the group backing up the primary in the protection group type and is managed by hardware */
    SAI_NEXT_HOP_GROUP_TYPE_HW_PROTECTION,
- How is the association between the active next hop group (NHG) and the protection NHG modelled?
For example, 1) a new object type could model the active+backup relation, with one attribute for the active NHG and another for the protection NHG; or 2) a new attribute could be added to the SAI route entry object to point to the protection NHG in addition to the existing active NHG.
- Can you elaborate on when the h/w should decide to do the switchover?
Hi Ravi, it's no different than SW FRR. The route points to an NHG; the NHG is of type PROTECTION and consists of a primary and a secondary NH or NHG. This is how SW FRR works.
For HW FRR, the workflow is similar. The route points to an NHG; the NHG is of type PROTECTION and consists of a primary and a secondary, where the secondary is of type HW_PROTECTION. The fact that the secondary is of type HW_PROTECTION hints to the HW that it should act on it if the primary fails.
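The object graph described above can be sketched as a small, self-contained C model. This is a hypothetical illustration, not code against the real SAI headers: the `mock_*` types and `route_uses_hw_frr()` are invented here and only mirror the relationships in the reply (route points to a PROTECTION group; the group has a primary member and a standby member; HW-managed switchover is hinted when the standby is itself a group of type HW_PROTECTION). In real SAI code these objects would be created through the next hop group API and linked by object IDs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the NHG types discussed; NOT the SAI enum. */
typedef enum {
    MOCK_NHG_PROTECTION,    /* models SAI_NEXT_HOP_GROUP_TYPE_PROTECTION */
    MOCK_NHG_HW_PROTECTION, /* models the proposed SAI_NEXT_HOP_GROUP_TYPE_HW_PROTECTION */
    MOCK_NHG_OTHER          /* any other group type */
} mock_nhg_type_t;

struct mock_nhg; /* forward declaration so a member can point at a nested group */

typedef struct {
    int next_hop_id;               /* used when the member is a plain next hop */
    const struct mock_nhg *nested; /* non-NULL when the member is itself a group */
} mock_member_t;

typedef struct mock_nhg {
    mock_nhg_type_t type;
    mock_member_t primary;
    mock_member_t standby;
} mock_nhg_t;

typedef struct {
    const char *prefix;
    const mock_nhg_t *nhg; /* route -> next hop group */
} mock_route_t;

/* HW-managed switchover is hinted when the route's group is a PROTECTION
 * group whose standby member is a group of type HW_PROTECTION. */
static bool route_uses_hw_frr(const mock_route_t *route)
{
    assert(route != NULL && route->nhg != NULL);
    const mock_nhg_t *g = route->nhg;
    return g->type == MOCK_NHG_PROTECTION
        && g->standby.nested != NULL
        && g->standby.nested->type == MOCK_NHG_HW_PROTECTION;
}
```

With SW FRR the standby is a plain next hop or an ordinary group, so `route_uses_hw_frr()` returns false and the NOS remains responsible for the switchover.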
Hi Jai, is this a distinction between:
- NOS performs the switchover via SAI_NEXT_HOP_GROUP_ATTR_SET_SWITCHOVER
- SDK/HW performs the switchover based on SAI_NEXT_HOP_GROUP_MEMBER_ATTR_MONITORED_OBJECT
i.e., is it a hint that the monitored object would be used?
Or a distinction between:
- SDK performs the switchover based on failure of SAI_NEXT_HOP_GROUP_MEMBER_ATTR_MONITORED_OBJECT
- HW performs the switchover based on failure of SAI_NEXT_HOP_GROUP_MEMBER_ATTR_MONITORED_OBJECT
Or something else?
Hi Jason,
It's a subtle but good point.
Today the NOS performs the switchover via SAI_NEXT_HOP_GROUP_ATTR_SET_SWITCHOVER using the set attribute API.
This 'set' operation is not needed for HW FRR; HW FRR works based on SAI_NEXT_HOP_GROUP_MEMBER_ATTR_MONITORED_OBJECT.
Whether the monitored object is tracked in the SDK, in FW, or in HW is an implementation detail. The approaches will likely give switchover times ranging roughly from milliseconds to nanoseconds.
As far as this PR is concerned, there is no difference between
- SDK performs the switchover based on failure of SAI_NEXT_HOP_GROUP_MEMBER_ATTR_MONITORED_OBJECT
- HW performs the switchover based on failure of SAI_NEXT_HOP_GROUP_MEMBER_ATTR_MONITORED_OBJECT
Hope this helps
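The distinction in this reply can be sketched as a tiny state model. This is a hypothetical illustration, not SAI code: `frr_state_t`, `frr_active_path()` and the field names are invented here. The two inputs stand in for the NOS writing SAI_NEXT_HOP_GROUP_ATTR_SET_SWITCHOVER (SW FRR) and the state of the object referenced by SAI_NEXT_HOP_GROUP_MEMBER_ATTR_MONITORED_OBJECT (HW FRR). Where the HW branch actually runs (SDK, FW or ASIC) is deliberately not modelled, matching the point that it is an implementation detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of who triggers the switchover; NOT SAI types. */
typedef enum { FRR_MODE_SW, FRR_MODE_HW } frr_mode_t;
typedef enum { ACTIVE_PRIMARY, ACTIVE_BACKUP } active_path_t;

typedef struct {
    frr_mode_t mode;
    bool monitored_object_up; /* state of the primary's monitored object, e.g. a port */
    bool nos_set_switchover;  /* NOS has set the switchover attribute to true */
} frr_state_t;

static active_path_t frr_active_path(const frr_state_t *s)
{
    assert(s != NULL);
    if (s->mode == FRR_MODE_SW) {
        /* SW FRR: the path changes only after the NOS sets the switchover attribute. */
        return s->nos_set_switchover ? ACTIVE_BACKUP : ACTIVE_PRIMARY;
    }
    /* HW FRR: the implementation reacts to the monitored object going down;
     * no set operation from the NOS is required. */
    return s->monitored_object_up ? ACTIVE_PRIMARY : ACTIVE_BACKUP;
}
```

In the SW case a monitored-object failure alone does not move traffic; the NOS must observe it and issue the set. In the HW case the same failure moves traffic directly.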
Hi Jai,
Should we have separate attributes for SDK-based vs HW-based switchover? Some implementations might provide both options, but with different capabilities and constraints; e.g., a HW implementation might support only a port as a monitored object, while the SDK could monitor a VLAN member as well. Using the same attribute for both SDK and HW will make it harder for the NOS to discover this distinction through a capability query.
Hi Ashutosh,
We typically don't expose SDK vs FW vs HW level of granularity in SAI.
This PR focuses on HW; how it is done in HW is an implementation-level detail.
@ashutosh-agrawal, @marian-pritsak, @eddyk-nvidia, @j-bos, please help review this as well. Thanks.
    SAI_PORT_STAT_IF_IN_HW_PROTECTION_SWITCHOVER_EVENTS,

    /** SAI port stat if HW protection switchover related packet drops */
    SAI_PORT_STAT_IF_IN_HW_PROTECTION_SWITCHOVER_DROP_PKTS,
What does this mean?
These are the packet drops experienced (packets sitting in the down port's queue) when traffic is diverted to the now-active backup link.
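The counter semantics in this reply can be sketched as follows, under stated assumptions that the PR does not spell out: both counters are kept on the port whose link went down, each link-down that triggers a HW switchover increments the event counter once, and every packet still queued on that port at that moment is counted as a switchover drop. `mock_port_t`, `mock_port_counters_t` and `mock_port_link_down()` are hypothetical; only the two SAI_PORT_STAT_IF_IN_HW_PROTECTION_SWITCHOVER_* names come from the PR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-port counters; field comments map them to the PR's stats. */
typedef struct {
    uint64_t switchover_events;    /* models SAI_PORT_STAT_IF_IN_HW_PROTECTION_SWITCHOVER_EVENTS */
    uint64_t switchover_drop_pkts; /* models SAI_PORT_STAT_IF_IN_HW_PROTECTION_SWITCHOVER_DROP_PKTS */
} mock_port_counters_t;

typedef struct {
    uint32_t queued_pkts; /* packets waiting in this port's queue */
    bool up;
    mock_port_counters_t counters;
} mock_port_t;

/* Assumed behaviour: the link goes down, HW diverts traffic to the backup
 * path, and whatever was still queued on this port is dropped and counted. */
static void mock_port_link_down(mock_port_t *port)
{
    assert(port != NULL);
    if (!port->up) {
        return; /* already down: no new switchover */
    }
    port->up = false;
    port->counters.switchover_events += 1;
    port->counters.switchover_drop_pkts += port->queued_pkts;
    port->queued_pkts = 0;
}
```

This sketch counts every queued packet, whether or not it belonged to a protected route; that is exactly the ambiguity raised in review when a port carries both primary-path and regular NHG traffic.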
@JaiOCP, can we please add a short md doc to explain this in more detail?
Please add details about the case where a port carries traffic for both a primary path and a regular NHG. In this case, drops on the port will account for both failover drops and link drops.
Signed-off-by: JaiOCP <[email protected]>
Added the md doc explaining the workflow.
Commit ID: 1394de0
On Fri, Oct 4, 2024 at 3:33 PM, Tejaswini Chadaga wrote:
> @JaiOCP, can we please add a short md doc to explain this in more detail?
> @JaiOCP, could you please take care of this before the next discussion on this PR?
@j-bos @rck-innovium @rlhui
@ashutosh-agrawal, could you please help sign off on this as well?
The PR was discussed in the community meeting a couple of times and signed off by the community.
A new NHG type, HW_PROTECTION, is introduced so that the HW can distinguish the protection NHG and treat it differently for switchover purposes.
The workflow is the same as IP FRR, except that instead of SW managing the switchover, it is done by the HW; i.e., the NOS does not need to set the switchover attribute to true to trigger the takeover by the secondary when the primary fails.