broadcom SAI bug for Trident3-X3 #21247

Open
bradh352 opened this issue Dec 20, 2024 · 0 comments

Description

On a Dell N3248TE switch running SONiC 202411, syncd terminates after the removal of an IPv6 link-local neighbor entry times out.

2024 Dec 21 04:42:05.260261 swmgmt NOTICE syncd#syncd: [none] SAI_API_NEXT_HOP:brcm_sai_remove_next_hop:441 Removing nhid 44 if_id 100095
2024 Dec 21 04:42:05.261548 swmgmt NOTICE swss#orchagent: :- removeNeighbor: Removed next hop fe80::1c52:a783:9a48:4d0d on Vlan2
2024 Dec 21 04:42:05.262580 swmgmt NOTICE syncd#syncd: [none] SAI_API_DASH_DIRECTION_LOOKUP:_brcm_sai_l2_ecmp_nbr_mac_delete:267 FDB : MAC:B2-80-80-F7-0C-1E vfi:0x2, is_fdb_del 0, dir 3
2024 Dec 21 04:42:35.655137 swmgmt ERR syncd#syncd: :- threadFunction: time span WD exceeded 30392 ms for remove:SAI_OBJECT_TYPE_NEIGHBOR_ENTRY:{"ip":"fe80::1c52:a783:9a48:4d0d","rif":"oid:0x6000000000768","switch_id":"oid:0x21000000000000"}
2024 Dec 21 04:42:35.655194 swmgmt ERR syncd#syncd: :- logEventData: op: remove, key: SAI_OBJECT_TYPE_NEIGHBOR_ENTRY:{"ip":"fe80::1c52:a783:9a48:4d0d","rif":"oid:0x6000000000768","switch_id":"oid:0x21000000000000"}
2024 Dec 21 04:43:04.592860 swmgmt WARNING swss#supervisor-proc-exit-listener: Process 'orchagent' is stuck in namespace 'host' (1.0 minutes).
2024 Dec 21 04:43:05.322286 swmgmt ERR swss#orchagent: :- wait: SELECT operation result: TIMEOUT on getresponse
2024 Dec 21 04:43:05.322601 swmgmt ERR swss#orchagent: :- wait: failed to get response for getresponse
2024 Dec 21 04:43:05.322783 swmgmt ERR swss#orchagent: :- remove: remove status: SAI_STATUS_FAILURE
2024 Dec 21 04:43:05.322956 swmgmt ERR swss#orchagent: :- removeNeighbor: Failed to remove neighbor b2:80:80:f7:0c:1e on Vlan2, rv:-1
2024 Dec 21 04:43:05.323127 swmgmt ERR swss#orchagent: :- handleSaiRemoveStatus: Encountered failure in remove operation, exiting orchagent, SAI API: SAI_API_NEIGHBOR, status: SAI_STATUS_FAILURE
2024 Dec 21 04:43:05.323301 swmgmt NOTICE swss#orchagent: :- notifySyncd: sending syncd: SYNCD_INVOKE_DUMP
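
To check whether other SAI operations hit the same watchdog, the syslog can be scanned for the "time span WD exceeded" message. The following is a minimal Python sketch that assumes the message keeps the exact shape seen in the excerpt above; the helper name `find_watchdog_hits` is hypothetical and not part of SONiC.

```python
import re

# Pattern inferred from the single watchdog line in the excerpt above;
# other releases may word the message differently (assumption).
WD_RE = re.compile(
    r"time span WD exceeded (?P<ms>\d+) ms for "
    r"(?P<op>\w+):(?P<obj>SAI_OBJECT_TYPE_\w+)"
)


def find_watchdog_hits(lines):
    """Return (elapsed_ms, op, object_type) for every watchdog line found."""
    hits = []
    for line in lines:
        m = WD_RE.search(line)
        if m:
            hits.append((int(m.group("ms")), m.group("op"), m.group("obj")))
    return hits


# Two lines copied from the log above: one watchdog hit, one unrelated error.
sample = [
    '2024 Dec 21 04:42:35.655137 swmgmt ERR syncd#syncd: :- threadFunction: '
    'time span WD exceeded 30392 ms for remove:SAI_OBJECT_TYPE_NEIGHBOR_ENTRY:'
    '{"ip":"fe80::1c52:a783:9a48:4d0d","rif":"oid:0x6000000000768",'
    '"switch_id":"oid:0x21000000000000"}',
    "2024 Dec 21 04:43:05.322783 swmgmt ERR swss#orchagent: :- remove: "
    "remove status: SAI_STATUS_FAILURE",
]

if __name__ == "__main__":
    for ms, op, obj in find_watchdog_hits(sample):
        print(f"{op} of {obj} blocked for {ms} ms")
```

Run against a full syslog, this would show whether only neighbor-entry removals stall or whether other object types are affected too.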

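In the SAI replay log I can see the neighbor was created at 04:38:20, a little under four minutes before the remove request at 04:42:05 and about five minutes before the failure was reported: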

2024-12-21.04:38:20.373960|c|SAI_OBJECT_TYPE_NEIGHBOR_ENTRY:{"ip":"fe80::1c52:a783:9a48:4d0d","rif":"oid:0x6000000000768","switch_id":"oid:0x21000000000000"}|SAI_NEIGHBOR_ENTRY_ATTR_DST_MAC_ADDRESS=B2:80:80:F7:0C:1E
2024-12-21.04:38:20.376885|c|SAI_OBJECT_TYPE_NEXT_HOP:oid:0x40000000007a7|SAI_NEXT_HOP_ATTR_TYPE=SAI_NEXT_HOP_TYPE_IP|SAI_NEXT_HOP_ATTR_IP=fe80::1c52:a783:9a48:4d0d|SAI_NEXT_HOP_ATTR_ROUTER_INTERFACE_ID=oid:0x6000000000768
...
2024-12-21.04:42:05.259290|r|SAI_OBJECT_TYPE_NEXT_HOP:oid:0x40000000007a7
2024-12-21.04:42:05.261649|r|SAI_OBJECT_TYPE_NEIGHBOR_ENTRY:{"ip":"fe80::1c52:a783:9a48:4d0d","rif":"oid:0x6000000000768","switch_id":"oid:0x21000000000000"}
2024-12-21.04:43:05.321719|E|SAI_STATUS_FAILURE
2024-12-21.04:43:05.321862|a|SYNCD_INVOKE_DUMP
2024-12-21.04:44:05.381753|A|SAI_STATUS_FAILURE
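
The timing in the replay log can be worked out mechanically. The sketch below is an illustration only: it assumes each record is `timestamp|op|key|attributes`, that `c` means create, `r` means remove and `E` marks an error response (all inferred from the excerpt above, not from the sairedis source), and the function names are hypothetical.

```python
from datetime import datetime

# Timestamp layout as it appears in the excerpt above.
TS_FMT = "%Y-%m-%d.%H:%M:%S.%f"


def parse_record(line):
    """Split one recording line into (timestamp, op, rest)."""
    ts, op, rest = line.split("|", 2)
    return datetime.strptime(ts, TS_FMT), op, rest


def object_lifetimes(lines):
    """Map object key -> seconds between its 'c' and 'r' records."""
    created, lifetimes = {}, {}
    for line in lines:
        ts, op, rest = parse_record(line)
        key = rest.split("|", 1)[0]  # drop the attribute list, if any
        if op == "c":
            created[key] = ts
        elif op == "r" and key in created:
            lifetimes[key] = (ts - created[key]).total_seconds()
    return lifetimes


def seconds_to_error(lines):
    """Seconds from the last 'r' record to the first 'E' after it, or None."""
    last_remove = None
    for line in lines:
        ts, op, _ = parse_record(line)
        if op == "r":
            last_remove = ts
        elif op == "E" and last_remove is not None:
            return (ts - last_remove).total_seconds()
    return None


NEIGH = ('SAI_OBJECT_TYPE_NEIGHBOR_ENTRY:{"ip":"fe80::1c52:a783:9a48:4d0d",'
         '"rif":"oid:0x6000000000768","switch_id":"oid:0x21000000000000"}')

# Neighbor records copied from the replay log above (elided lines omitted).
sample = [
    "2024-12-21.04:38:20.373960|c|" + NEIGH
    + "|SAI_NEIGHBOR_ENTRY_ATTR_DST_MAC_ADDRESS=B2:80:80:F7:0C:1E",
    "2024-12-21.04:42:05.261649|r|" + NEIGH,
    "2024-12-21.04:43:05.321719|E|SAI_STATUS_FAILURE",
]

if __name__ == "__main__":
    print(object_lifetimes(sample)[NEIGH])
    print(seconds_to_error(sample))
```

On these records the neighbor entry lived for roughly 225 seconds, and the error response arrived about 60 seconds after the remove request, which lines up with the orchagent `TIMEOUT on getresponse` message at 04:43:05.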

Steps to reproduce the issue:

  1. Configure similar to:
{
    "PORT": {
        "Ethernet0": {
            "admin_status": "up",
            "alias": "oneGigE1/1",
            "autoneg": "on",
            "fec": "none",
            "index": "1",
            "lanes": "1",
            "mtu": "9100",
            "speed": "1000"
        }
    },
    "VLAN": {
        "Vlan2": {
            "mtu": "1500",
            "vlanid": "2"
        }
    },
    "VLAN_INTERFACE": {
        "Vlan2": {
            "ipv6_use_link_local_only": "enable"
        },
        "Vlan2|192.168.1.55/24": {}
    },
    "VLAN_MEMBER": {
        "Vlan2|Ethernet0": {
            "tagging_mode": "untagged"
        }
    }
}
  2. Make sure there are other IPv6 hosts on the network with link-local addresses
  3. Wait for a neighbor entry to time out and be removed, then observe the crash

Describe the results you received:

All switch ports go down due to error.

Describe the results you expected:

Switch ports stay up.

Output of show version:

SONiC Software Version: SONiC.202411.2-bradh352
SONiC OS Version: 12
Distribution: Debian 12.8
Kernel: 6.1.0-22-2-amd64
Build commit: f7a56440d
Build date: Thu Dec 19 14:13:14 UTC 2024
Built by: brad@github-runner-ubuntu-2004

Platform: x86_64-dellemc_n3248te_c3338-r0
HwSKU: DellEMC-N3248TE
ASIC: broadcom
ASIC Count: 1
Serial Number: FSB8PK2
Model Number: 0HXY4C
Hardware Revision: 
Uptime: 07:04:50 up  1:34,  1 user,  load average: 1.22, 1.31, 1.14
Date: Sat 21 Dec 2024 07:04:50

Docker images:
REPOSITORY                    TAG                 IMAGE ID       SIZE
docker-orchagent              202411.2-bradh352   fdf05d43b3f5   354MB
docker-orchagent              latest              fdf05d43b3f5   354MB
docker-fpm-frr                202411.2-bradh352   d4979ae35a83   375MB
docker-fpm-frr                latest              d4979ae35a83   375MB
docker-nat                    202411.2-bradh352   062fe24aeb03   344MB
docker-nat                    latest              062fe24aeb03   344MB
docker-macsec                 latest              425734c782d0   344MB
docker-sflow                  202411.2-bradh352   5a8d07a4304a   342MB
docker-sflow                  latest              5a8d07a4304a   342MB
docker-teamd                  202411.2-bradh352   d16caeb5ea04   341MB
docker-teamd                  latest              d16caeb5ea04   341MB
docker-dhcp-relay             latest              979953fe7f4d   321MB
docker-platform-monitor       202411.2-bradh352   fe0edaeb6113   431MB
docker-platform-monitor       latest              fe0edaeb6113   431MB
docker-snmp                   202411.2-bradh352   1dc7700ca6c8   356MB
docker-snmp                   latest              1dc7700ca6c8   356MB
docker-sonic-mgmt-framework   202411.2-bradh352   2d5eadfe5a72   399MB
docker-sonic-mgmt-framework   latest              2d5eadfe5a72   399MB
docker-syncd-brcm             202411.2-bradh352   9881e0e377a1   753MB
docker-syncd-brcm             latest              9881e0e377a1   753MB
docker-sonic-bmp              202411.2-bradh352   1514a68a170a   313MB
docker-sonic-bmp              latest              1514a68a170a   313MB
docker-router-advertiser      202411.2-bradh352   6284b6c7fed5   312MB
docker-router-advertiser      latest              6284b6c7fed5   312MB
docker-mux                    202411.2-bradh352   475e47e7903a   363MB
docker-mux                    latest              475e47e7903a   363MB
docker-lldp                   202411.2-bradh352   96c59972cb35   357MB
docker-lldp                   latest              96c59972cb35   357MB
docker-sonic-gnmi             202411.2-bradh352   075f9d9da560   401MB
docker-sonic-gnmi             latest              075f9d9da560   401MB
docker-database               202411.2-bradh352   f6385c5a1998   320MB
docker-database               latest              f6385c5a1998   320MB
docker-eventd                 202411.2-bradh352   ae3cbb2b95a5   312MB
docker-eventd                 latest              ae3cbb2b95a5   312MB
docker-gbsyncd-broncos        202411.2-bradh352   4545d6f00888   351MB
docker-gbsyncd-broncos        latest              4545d6f00888   351MB
docker-gbsyncd-credo          202411.2-bradh352   02408068f2c2   324MB
docker-gbsyncd-credo          latest              02408068f2c2   324MB

Output of show techsupport:

Additional information you deem important (e.g. issue happens only occasionally):

