Sonic-vs : Syncd not getting netlink messages and oper status not updated #1357

Open
sudhiaithal opened this issue Feb 15, 2024 · 5 comments

Comments

@sudhiaithal

I am seeing an issue where syncd does not receive netlink messages when a link is added, deleted, brought up, or brought down.

When syncd starts, it receives all the messages as expected:
Feb 15 18:11:04.828518 d809e83f1ad0 NOTICE #syncd: :- asyncOnLinkMsg: received RTM_NEWLINK ifname: lo, ifflags: 0x10049, ifindex: 1
Feb 15 18:11:04.828550 d809e83f1ad0 NOTICE #syncd: :- asyncOnLinkMsg: received done RTM_NEWLINK ifname: lo, ifflags: 0x10049, ifindex: 1
Feb 15 18:11:04.828651 d809e83f1ad0 NOTICE #syncd: :- asyncOnLinkMsg: received RTM_NEWLINK ifname: eth0, ifflags: 0x11043, ifindex: 1745
Feb 15 18:11:04.828664 d809e83f1ad0 NOTICE #syncd: :- asyncOnLinkMsg: received done RTM_NEWLINK ifname: eth0, ifflags: 0x11043, ifindex: 1745

However, after a while, when I run ifconfig eth0 up/down, syncd does not receive any message, while other processes such as portsyncd do:

Feb 15 20:03:01.516713 874a0e235413 NOTICE #portsyncd: :- onMsg: nlmsg type:16 key:eth0 admin:0 oper:0 addr:02:42:ac:11:00:02 ifindex:4106 master:0 type:veth
Feb 15 20:03:03.236337 874a0e235413 NOTICE #portsyncd: :- onMsg: nlmsg type:16 key:eth0 admin:1 oper:0 addr:02:42:ac:11:00:02 ifindex:4106 master:0 type:veth
Feb 15 20:03:03.236486 874a0e235413 NOTICE #portsyncd: :- onMsg: nlmsg type:16 key:eth0 admin:1 oper:1 addr:02:42:ac:11:00:02 ifindex:4106 master:0 type:veth
Feb 15 20:03:03.247468 874a0e235413 NOTICE #fpmsyncd: :- onRouteMsg: RouteTable del msg for route with only one nh on eth0/docker0: 172.17.0.0/16 0.0.0.0 eth0
Feb 15 20:03:05.082645 874a0e235413 NOTICE #fpmsyncd: :- onRouteMsg: RouteTable del msg for route with only one nh on eth0/docker0: fe80::/64 :: eth0
....
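The "nlmsg type:16" in the portsyncd lines is RTM_NEWLINK (RTM_DELLINK is 17), the same message type syncd logs in asyncOnLinkMsg. As a rough illustration of what such a message carries, here is a minimal Python sketch that decodes one link message into the ifname / ifflags / ifindex fields both daemons print. It assumes the standard Linux layouts of struct nlmsghdr, struct ifinfomsg and struct rtattr in native byte order, and only looks at the IFLA_IFNAME attribute; parse_link_msg is a hypothetical helper, not code from syncd or portsyncd.

```python
import struct

NLMSGHDR = struct.Struct("=IHHII")    # nlmsg_len, nlmsg_type, nlmsg_flags, nlmsg_seq, nlmsg_pid
IFINFOMSG = struct.Struct("=BBHiII")  # ifi_family, pad, ifi_type, ifi_index, ifi_flags, ifi_change
RTATTR = struct.Struct("=HH")         # rta_len, rta_type
RTM_NEWLINK, RTM_DELLINK = 16, 17
IFLA_IFNAME = 3


def align4(n):
    """Netlink attributes are padded to a 4-byte boundary (RTA_ALIGN)."""
    return (n + 3) & ~3


def parse_link_msg(buf):
    """Hypothetical helper: decode one RTM_NEWLINK/RTM_DELLINK message."""
    length, msg_type, _flags, _seq, _pid = NLMSGHDR.unpack_from(buf, 0)
    if msg_type not in (RTM_NEWLINK, RTM_DELLINK):
        raise ValueError(f"not a link message: type {msg_type}")
    off = NLMSGHDR.size
    _fam, _pad, _iftype, ifindex, ifflags, _change = IFINFOMSG.unpack_from(buf, off)
    off += IFINFOMSG.size
    ifname = None
    # Walk the attributes that follow ifinfomsg, looking only for IFLA_IFNAME.
    while off + RTATTR.size <= length:
        rta_len, rta_type = RTATTR.unpack_from(buf, off)
        if rta_len < RTATTR.size:
            break  # malformed attribute; stop rather than loop forever
        if rta_type == IFLA_IFNAME:
            raw = buf[off + RTATTR.size: off + rta_len]
            ifname = raw.split(b"\0", 1)[0].decode()
        off += align4(rta_len)
    return {
        "type": "RTM_NEWLINK" if msg_type == RTM_NEWLINK else "RTM_DELLINK",
        "ifindex": ifindex,
        "ifflags": ifflags,
        "ifname": ifname,
    }
```

A real message carries many more attributes (MTU, address, link kind, and so on); they are simply skipped here.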

This prevents the oper status from being updated correctly. On the VS image from the older 202106 branch, it works correctly, as shown below:

Feb 13 18:43:44.509381 de88276cddc7 NOTICE #syncd: :- asyncOnLinkMsg: received RTM_NEWLINK ifname: eth5, ifflags: 0x11103, ifindex: 93
Feb 13 18:43:44.509409 de88276cddc7 NOTICE #syncd: :- asyncOnLinkMsg: received RTM_NEWLINK ifname: eth5, ifflags: 0x11143, ifindex: 93
Feb 13 18:43:44.509458 de88276cddc7 NOTICE #portsyncd: :- onMsg: nlmsg type:16 key:eth5 admin:1 oper:0 addr:7a:01:26:fd:50:d5 ifindex:93 master:0 type:veth
Feb 13 18:43:44.509485 de88276cddc7 NOTICE #syncd: :- syncOnLinkMsg: newlink: ifindex: 93, ifflags: 0x11103, ifname: eth5
Feb 13 18:43:44.509535 de88276cddc7 NOTICE #syncd: :- send_port_oper_status_notification: send event SAI_SWITCH_ATTR_PORT_STATE_CHANGE_NOTIFY for port oid:0x100000005: SAI_PORT_OPER_STATUS_UP
Feb 13 18:43:44.509627 de88276cddc7 NOTICE #syncd: :- syncOnLinkMsg: newlink: ifindex: 93, ifflags: 0x11143, ifname: eth5
Feb 13 18:43:44.509719 de88276cddc7 NOTICE #portsyncd: :- onMsg: nlmsg type:16 key:eth5 admin:1 oper:1 addr:7a:01:26:fd:50:d5 ifindex:93 master:0 type:veth
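In the 202106 log above, the two RTM_NEWLINK messages for eth5 carry ifflags 0x11103 and 0x11143; they differ only by 0x40, which is IFF_RUNNING in the standard Linux interface flag set. The small sketch below decodes these values into flag names. It assumes the IFF_* bit values from <linux/if.h> (also listed in netdevice(7)); decode_ifflags is a hypothetical helper, and which of these bits syncd actually uses to derive oper status is not stated in this thread.

```python
# Standard Linux interface flag bits (<linux/if.h>, netdevice(7)).
IFF_FLAGS = {
    0x1: "UP",
    0x2: "BROADCAST",
    0x4: "DEBUG",
    0x8: "LOOPBACK",
    0x10: "POINTOPOINT",
    0x20: "NOTRAILERS",
    0x40: "RUNNING",
    0x80: "NOARP",
    0x100: "PROMISC",
    0x200: "ALLMULTI",
    0x400: "MASTER",
    0x800: "SLAVE",
    0x1000: "MULTICAST",
    0x2000: "PORTSEL",
    0x4000: "AUTOMEDIA",
    0x8000: "DYNAMIC",
    0x10000: "LOWER_UP",
    0x20000: "DORMANT",
    0x40000: "ECHO",
}


def decode_ifflags(value):
    """Hypothetical helper: return the names of the set flag bits, lowest bit first."""
    return [name for bit, name in sorted(IFF_FLAGS.items()) if value & bit]


for flags in (0x10049, 0x11043, 0x11103, 0x11143):
    print(hex(flags), decode_ifflags(flags))
```

For example, 0x10049 (lo) decodes to UP, LOOPBACK, RUNNING and LOWER_UP, and XOR-ing the two eth5 values isolates the single changed bit.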

@sudhiaithal
Author

root@f4b1252e2cc5:/# grep "asyncOnLinkMsg" /var/log/syslog | wc -l
1050
root@f4b1252e2cc5:/#

However, on the old image there is just one per interface:

root@de88276cddc7:/# grep "asyncOnLinkMsg" /var/log/syslog | grep Ethernet | wc -l
103
root@de88276cddc7:/#
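The two counts above are not directly comparable: the first grep does not filter on Ethernet, and the newer syncd appears to log both a "received" and a "received done" line per message (see the startup log earlier), while the 202106 log shows only "received" lines. A per-interface breakdown can make the comparison clearer. The sketch below assumes the asyncOnLinkMsg log format shown in this issue; count_link_msgs is a hypothetical helper, and the regex will need adjusting if the log format differs.

```python
import re
from collections import Counter

# Matches the syncd lines shown in this issue, e.g.
# "asyncOnLinkMsg: received RTM_NEWLINK ifname: eth5, ifflags: 0x11103, ifindex: 93"
# "asyncOnLinkMsg: received done RTM_NEWLINK ifname: lo, ifflags: 0x10049, ifindex: 1"
LINK_MSG = re.compile(r"asyncOnLinkMsg: received (done )?(RTM_\w+) ifname: ([^,]+),")


def count_link_msgs(lines, include_done=False):
    """Hypothetical helper: count (message type, ifname) pairs in syslog lines."""
    counts = Counter()
    for line in lines:
        m = LINK_MSG.search(line)
        if not m:
            continue
        if m.group(1) and not include_done:
            continue  # skip the second "received done" line for the same message
        counts[(m.group(2), m.group(3))] += 1
    return counts


if __name__ == "__main__":
    with open("/var/log/syslog", errors="replace") as f:
        for (msg_type, ifname), n in sorted(count_link_msgs(f).items()):
            print(f"{msg_type} {ifname}: {n}")
```

Restricting the printed keys to ifname.startswith("Ethernet") reproduces the filter used in the second grep.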

@sudhiaithal
Author

I was able to work around this problem by creating veth interfaces eth0 through eth31, so that every Ethernet* interface can map to a tap interface. After that, the problem seems to go away.

@kcudnik
Collaborator

kcudnik commented Feb 16, 2024

Not sure if this is exactly a syncd issue; it depends on who is responsible for generating these netlink messages. syncd listens to all of those messages, but port up/down is not up to syncd. Is this on real hardware or the virtual switch?

@sudhiaithal
Author

This is on the virtual switch. I think the flood of messages is causing some lock-up on syncd's netlink socket.
If we bring up VS without all the veth interfaces up, I see this issue; it seems to work fine when all veth interfaces are created before VS bring-up.
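One general mechanism that can make a netlink listener miss events during a burst is receive-buffer overrun: when a netlink socket's receive buffer fills, the kernel drops messages and the next recv fails with ENOBUFS (see netlink(7)). Nothing in this thread confirms that this is what happens inside syncd; the sketch below is only a standalone, Linux-only way to check whether a separate listener sees overruns while the veth interfaces are being created. It assumes RTMGRP_LINK = 1 from <linux/rtnetlink.h>; open_link_listener and drain are hypothetical helpers, not syncd code.

```python
import errno
import socket
import struct

NLMSGHDR = struct.Struct("=IHHII")  # nlmsg_len, nlmsg_type, nlmsg_flags, nlmsg_seq, nlmsg_pid
RTM_NEWLINK, RTM_DELLINK = 16, 17
RTMGRP_LINK = 0x1  # multicast group for link events (<linux/rtnetlink.h>)


def open_link_listener(rcvbuf=4 * 1024 * 1024):
    """Hypothetical helper: NETLINK_ROUTE socket subscribed to link events (Linux only)."""
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, socket.NETLINK_ROUTE)
    # A larger buffer makes overruns less likely; the kernel caps SO_RCVBUF
    # at net.core.rmem_max unless SO_RCVBUFFORCE is used with CAP_NET_ADMIN.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf)
    sock.bind((0, RTMGRP_LINK))  # (port id 0 = let the kernel assign, groups)
    return sock


def drain(sock, max_reads):
    """Hypothetical helper: read datagrams, counting link messages and overruns."""
    stats = {"newlink": 0, "dellink": 0, "overruns": 0}
    for _ in range(max_reads):
        try:
            data = sock.recv(65536)
        except OSError as exc:
            if exc.errno == errno.ENOBUFS:
                # Kernel dropped messages because the receive buffer was full.
                stats["overruns"] += 1
                continue
            raise
        offset = 0
        # One datagram may hold several netlink messages back to back.
        while offset + NLMSGHDR.size <= len(data):
            length, msg_type, _flags, _seq, _pid = NLMSGHDR.unpack_from(data, offset)
            if length < NLMSGHDR.size:
                break  # malformed header
            if msg_type == RTM_NEWLINK:
                stats["newlink"] += 1
            elif msg_type == RTM_DELLINK:
                stats["dellink"] += 1
            offset += (length + 3) & ~3  # NLMSG_ALIGN
    return stats
```

Typical use would be print(drain(open_link_listener(), 1000)) while creating the interfaces. If overruns shows up, the dropped events are gone for good, so a listener in that state generally has to re-read the full link table (an RTM_GETLINK dump) rather than rely on later events.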

@kcudnik
Collaborator

kcudnik commented Mar 3, 2024

Netlink handling in syncd is synchronized: each message is processed in a synchronized block under a mutex, but it should still receive all messages. Are you generating the flood on purpose? Does any other process receive all of the generated messages?
