Gossip list not being cleared #2089

@william-swarmbotics

Describe the bug

On a Zenoh network using peer-to-peer multicast scouting with gossip enabled, we saw a sudden, rapid degradation in Zenoh's ability to communicate, to the point where it generally could not deliver messages at all. We discovered that the OAM message had grown to tens of kilobytes and listed thousands of peer IDs. This caused connections to drop: the OAM message is sent as a blocking message, and if it failed to get delivered (or held up other blocking messages), Zenoh would close the connection. Our use case involves some long-running Zenoh sessions, and we believe the gossip subsystem was remembering every Zenoh peer the network had ever seen.

The only mechanism I see for peers to get removed from the gossip subsystem is in gossip::Network::remove_link(), which can be called in close_face(), but if I understand correctly, that could only remove directly connected peers and not peers heard indirectly. It seems that even with multihop disabled, peers still pass indirectly heard IDs, just not their locators (based on propagate_locators()). This explains why the OAM message became so large despite multihop being off.

Possible solutions might be to clear indirectly heard IDs, perhaps with a time-based expiration, or to not gossip anything about indirectly heard peers when multihop is disabled.
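To make the time-based expiration idea concrete, here is a minimal sketch of a table that forgets peers it has not heard about recently. Everything in it is hypothetical: `GossipTable`, `PeerId`, `heard()` and `expire()` are illustrative names and not part of Zenoh's gossip code, and a plain `u128` stands in for the real peer ID type.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a Zenoh peer ID (not the real type).
type PeerId = u128;

/// Hypothetical table of indirectly heard peers with time-based expiry.
struct GossipTable {
    ttl: Duration,
    last_heard: HashMap<PeerId, Instant>,
}

impl GossipTable {
    fn new(ttl: Duration) -> Self {
        Self { ttl, last_heard: HashMap::new() }
    }

    /// Record a peer, or refresh its timestamp if it is already known.
    fn heard(&mut self, id: PeerId, now: Instant) {
        self.last_heard.insert(id, now);
    }

    /// Drop every peer whose last mention is at least `ttl` old.
    /// Returns the number of entries removed.
    fn expire(&mut self, now: Instant) -> usize {
        let before = self.last_heard.len();
        let ttl = self.ttl;
        self.last_heard
            .retain(|_, seen| now.saturating_duration_since(*seen) < ttl);
        before - self.last_heard.len()
    }

    /// Number of peers currently remembered.
    fn len(&self) -> usize {
        self.last_heard.len()
    }
}
```

A caller would invoke `heard()` when a peer ID arrives and `expire()` periodically, for example before building the next OAM message. One open design question with this approach is what counts as "heard": if stale IDs relayed by other peers also refresh the timestamp, two long-running peers could keep each other's stale entries alive, so refreshing only on first-hand information may be needed.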

To reproduce

1. Create two Zenoh nodes A and B doing peer-to-peer multicast discovery with gossip enabled and multihop disabled. Their IDs should be random rather than fixed.
2. Alternate between restarting A and B, allowing discovery to succeed after each restart.
3. Monitor the number of peer IDs in the OAM message. It should grow indefinitely.
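The growth described in these steps can be illustrated with a toy model. This is only a sketch of the suspected mechanism, not Zenoh code: `Node`, `exchange()` and `simulate()` are hypothetical names, a counter stands in for random IDs, and the model assumes that each discovery passes along every known ID and that nothing is ever removed.

```rust
use std::collections::HashSet;
use std::iter::once;

/// Toy node: its current ID plus every ID it has learned through gossip.
struct Node {
    id: u64,
    known: HashSet<u64>,
}

impl Node {
    fn new(id: u64) -> Self {
        Self { id, known: HashSet::new() }
    }
}

/// Models one successful discovery: each side learns the other's ID and
/// everything the other side already knows. Nothing is ever removed.
fn exchange(a: &mut Node, b: &mut Node) {
    let from_a: Vec<u64> = a.known.iter().copied().chain(once(a.id)).collect();
    let from_b: Vec<u64> = b.known.iter().copied().chain(once(b.id)).collect();
    let (a_id, b_id) = (a.id, b.id);
    a.known.extend(from_b.into_iter().filter(|id| *id != a_id));
    b.known.extend(from_a.into_iter().filter(|id| *id != b_id));
}

/// Alternately restarts A and B `restarts` times. After each rediscovery,
/// records how many IDs the node that was NOT restarted remembers.
fn simulate(restarts: usize) -> Vec<usize> {
    let mut next_id = 0u64;
    // A counter stands in for random IDs: every restart gets a fresh one.
    let mut fresh = || {
        next_id += 1;
        next_id
    };
    let mut a = Node::new(fresh());
    let mut b = Node::new(fresh());
    exchange(&mut a, &mut b);

    let mut sizes = Vec::with_capacity(restarts);
    for i in 0..restarts {
        if i % 2 == 0 {
            a = Node::new(fresh()); // restart A: new ID, empty memory
            exchange(&mut a, &mut b);
            sizes.push(b.known.len());
        } else {
            b = Node::new(fresh()); // restart B: new ID, empty memory
            exchange(&mut a, &mut b);
            sizes.push(a.known.len());
        }
    }
    sizes
}
```

Under these assumptions the remembered ID count rises by one per restart: the restarted node immediately relearns every stale ID from its long-running neighbor, so no ID is ever lost from the network.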

System info

  • Ubuntu 22.04
  • Zenoh 1.4.0

Labels: bug
