Skip to content

Conversation

donaldsharp
Copy link
Member

See individual commits, but this fixes the bgp_evpn_rt5_addpath frequent upstream test failures.

@donaldsharp
Copy link
Member Author

ci:rerun

bgpd/bgp_evpn.c Outdated
@@ -1616,7 +1616,8 @@ static struct bgp_path_info *bgp_evpn_route_get_local_path(struct bgp *bgp, stru
}

static int update_evpn_type5_route_entry(struct bgp *bgp_evpn, struct bgp *bgp_vrf, afi_t afi,
safi_t safi, struct bgp_dest *dest, struct attr *attr,
safi_t safi, struct bgp_dest *dest,
struct bgp_path_info *originator, struct attr *attr,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

originator is a bit confusing as there is also a BGP attribute with the same name. I would suggest changing this to origin_pi or vrf_pi or something along those lines to denote this is a pointer to the vrf route path which is getting exported as a type-5 route.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I can change the name to make it clearererererfe

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

bgpd/bgp_evpn.c Outdated
@@ -1684,6 +1686,11 @@ static int update_evpn_type5_route_entry(struct bgp *bgp_evpn, struct bgp *bgp_v
} else {
tmp_pi = local_pi;
if (!attrhash_cmp(tmp_pi->attr, attr)) {
if (originator != local_pi->extra->evpn->type5_originator)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are now tracking this, shouldn't this create a new local_pi and add it to the list of paths for this route.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the local pi that matches the vrf pathinfo( apparently )

bgpd/bgp_evpn.c Outdated
@@ -1666,7 +1667,8 @@ static int update_evpn_type5_route_entry(struct bgp *bgp_evpn, struct bgp *bgp_v
SET_FLAG(pi->flags, BGP_PATH_MULTIPATH);

/* Type-5 routes advertise the L3-VNI */
bgp_path_info_extra_get(pi);
bgp_evpn_path_info_extra_get(pi);
pi->extra->evpn->type5_originator = originator;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with the introduction of this, we will have more than one local path. How is the best path calculated in EVPN table for such a scenario ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it follows normal best path rules. Which may not actually be anything relevant to ensuring that we get the right behavior here at all. Especially since first path received in the vrf may not be the actual resulting bestpath. I think this should be another item to put down for reworking in the future.

void bgp_evpn_advertise_type5_route(struct bgp *bgp_vrf, const struct prefix *p,
struct attr *src_attr, afi_t afi, safi_t safi,
uint32_t addpath_id)
void bgp_evpn_advertise_type5_route(struct bgp *bgp_vrf, struct bgp_path_info *originator,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we might have to change how this function is called, this seems to be called for all the paths but it should only be called for best paths and multi-paths.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes long term we should convert to a better approach here. I do not believe we should be calling this function as part of the update process. It should be after bestpath has been figured. Another large change that needs to be addressed

@@ -3451,9 +3451,11 @@ static int bgp_zebra_process_local_ip_prefix(ZAPI_CALLBACK_ARGS)
if (cmd == ZEBRA_IP_PREFIX_ROUTE_ADD) {

if (p.family == AF_INET)
bgp_evpn_advertise_type5_route(bgp_vrf, &p, NULL, AFI_IP, SAFI_UNICAST, 0);
bgp_evpn_advertise_type5_route(bgp_vrf, NULL, &p, NULL, AFI_IP,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the flow for this code path - I think we might have to change how this is working too. This is directly advertising from the zebra notification - we should be instead doing this from BGP local rib. Not sure why it was not done that way to begin with (I think I wrote the original code but I don't remember why I decided to implement it this way)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as last comment, we should be calling the insertion one time after bestpath has been run. This would solve both the bgp_update path and this path

@@ -1631,8 +1631,13 @@ static int update_evpn_type5_route_entry(struct bgp *bgp_evpn, struct bgp *bgp_v

*route_changed = 0;

/* See if this is an update of an existing route, or a new add. */
local_pi = bgp_evpn_route_get_local_path(bgp_evpn, dest, addpath_id);
for (local_pi = bgp_dest_get_bgp_path_info(dest); local_pi; local_pi = local_pi->next) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we change the bgp_evpn_get_local_path instead ? We can pass the originator path info instead of add path_id

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the bgp_evpn_route_get_local_path was being called in other places that I did not fully understand hence my hesitation to change it.

bgpd/bgp_evpn.c Outdated
@@ -2462,7 +2478,7 @@ static int delete_evpn_type5_route(struct bgp *bgp_vrf, struct prefix_evpn *evp,

frrtrace(2, frr_bgp, evpn_withdraw_type5, bgp_vrf->vrf_id, evp);

pi = delete_evpn_route_entry(bgp_evpn, afi, safi, dest, addpath_id);
pi = delete_evpn_route_entry(bgp_evpn, afi, safi, dest, originator, addpath_id);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the code it looks like a valid add path_id is only passed for EVPN type5 routes, if that's indeed the case and it is not required for other route type, we can remove add path-id as the key for evpn type-5 route and only keep originator instead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can make that change, yes

@@ -3580,7 +3580,14 @@ static void bgp_process_evpn_route_injection(struct bgp *bgp, afi_t afi,

if (ret == RMAP_DENYMATCH) {
bgp_attr_flush(&dummy_attr);
bgp_evpn_withdraw_type5_route(bgp, p, afi, safi, 0);
/*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if I follow this scenario.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a) read a packet in but store it for processing via bgp_update
b) cli change to route-map processing that would cause old_select to be dropped
c) wake up and process the packet

If we do not remove the old_select then we will end up with a orphaned path_info in the evpn table.

Copy link
Member

@ton31337 ton31337 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this something that we have for imported routes (ultimate)?

@riw777 riw777 self-requested a review September 2, 2025 15:46
@pbrisset
Copy link

pbrisset commented Sep 8, 2025

Hi Don,

What is the issues?
Can you describe the proposed fix?

Patrice

Copy link
Member

@riw777 riw777 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good

With `debug bgp zebra` turned on I can get a consistent
crash with the vpn->bgp_vrf being null.  Let's just protect
against it.

Signed-off-by: Donald Sharp <[email protected]>
Currently the type5 evpn routes are this weird almagation of a single
path_info and if addpath is turned on( but only in certain orders )
you get multipath.  If you start perturbing the addpath code here
you end up with a situation where the path_info's associated with
the type 5 are in a bad state.

Let's start the transition to properly tracking the type-5 routes
to where they originated from ( ala what is done in other import
situations ).

So let's add the originator path_info for the type-5 route( so
we can link the two, when one changes ) into the evpn specific
data structure.  Also start to pass around the originator path_info.

Signed-off-by: Donald Sharp <[email protected]>
Currently, type-5 routes are assumed to allow multiple evpn path_info's
per matching route:

IPv4 unicast VRF vrf100:
B>* 10.0.0.0/24 [20/0] (22) via 10.0.0.0, r1-eth0, weight 1, 00:01:01
  *                         via 10.0.0.2, r1-eth1, weight 1, 00:01:01

with `advertise-all-vni` turned on we are getting two paths, but
only if the addpath capability has been actively negotiated with
a peer in the evpn afi/safi.  This was further excaberated by
the fact that evpn was using the addpath_id as a differentiator
to match local paths.  So if you happen to import before addpath
was turned on then you ended up with 1 path in the local evpn table.

This all stems from the fact that evpn was originally written to
assume only 1 local path.  The addpath evpn PR added the ability
to have multiple local paths, but this breaks if addpath is actually
turned on before the routes are imported into evpn.  This can
be seen in the upstream CI with the frequent breakage of the
bgp_evpn_rt5_addpath test.

Let's modify type-5 routes to track the originator path_info and use
that as a discriminator to whether or not we have data on it or not.

Signed-off-by: Donald Sharp <[email protected]>
@donaldsharp donaldsharp force-pushed the bgp_evpn_pass_originator branch from 8f5975c to be5e383 Compare September 10, 2025 17:45
@Jafaral
Copy link
Member

Jafaral commented Sep 16, 2025

Hi Don,

What is the issues? Can you describe the proposed fix?

Patrice

See the commit msgs @pbrisset

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants