Add peer ban list to Fluffy #2809

kdeme · 2024-10-31T10:54:22Z

Nodes that offer invalid data will currently be allowed to keep doing so wasting bandwidth and cpu of the receiving node.

Any offer that has a properly encoded content key and that falls in the data radius of the receiving node will be accepted.
Data will be send and once received it will be validated. If the data fails validation, the peer currently is not punished and thus allowed to keep offering this or other invalid data.

Similary a node can provide invalid data on a FindContent. Or send no data at all or very slow until uTP transfers time out.

We should add a temporary ban list for nodes that do this.

There might also be nodes that end up in the DHT of a specific subnetwork for some reason, but don't actually support the subnetwork. Might require a temporary (per network) ban list for these to avoid trying to ping over and over a node that is not supported.

kdeme · 2024-11-06T10:15:09Z

We currently see a lot of following errors:

WRN 2024-11-06 11:11:29.853+01:00 Offer failed due to accept request failure  topics="portal_wire" error="No message data, peer might not support this talk protocol" contentKeys=@[0x00458ff92094a00fac6e393a435d58242c4971e9711adf1cd16215fc97856e4a1e] node=b7*406c26:164.92.247.243:5008

This is definitely a good candidate as reason for subnetwork specific temporary bans. As it really indicate a node that does not support a subnetwork.
In theory these should get filtered out of the subnetwork DHT, but if other clients don't do that properly or if the node does reply to ping/pongs on that subnetwork, this will not occur.

bhartnett · 2024-11-26T05:44:38Z

@kdeme I remember you mentioned that there is an existing ban list implementation in nimbus-eth2 somewhere but I can't seem to find the link (if you shared it previously). Can you share it here when you get a chance?

bhartnett · 2024-11-26T05:54:03Z

How long should we ban nodes that are misbehaving?

So we should ban nodes for a period of time when they:

Send offers that fail to validate (conteny key or value)
Return responses to findContent that fails to validate
Fails to respond or sends data too slow and hits timeout in response to pings, lookups and offers.
Responds with empty or bad data to pings, lookups and offers.

Does this cover it or are there other scenarios that need to be covered?

kdeme · 2024-11-26T11:20:54Z

How long should we ban nodes that are misbehaving?

I think we might have different timeouts depending on the reason of the ban. But nodes that are simply incompatible should be banned for several hours if not a day. Nodes that have many timeouts perhaps less, as maybe the node/machine was just "busy" for a bit.

Does this cover it or are there other scenarios that need to be covered?

That's a good start, we will probably notice more as we go through the error statements in the code.
One that I know of that is important is when a node responds with an empty talkresp response message, as this means that node does not support that specific subprotocol, so it should not be contacted for that specific subprotocol. This also means that the ban list should be subnetwork/subprotocol specific.

@kdeme I remember you mentioned that there is an existing ban list implementation in nimbus-eth2 somewhere but I can't seem to find the link (if you shared it previously). Can you share it here when you get a chance?

eth2_network.nim has this seenTable which gets checked before dialing a peer: https://github.com/status-im/nimbus-eth2/blob/stable/beacon_chain/networking/eth2_network.nim#L1358

Nodes stay in that table for different times depending on the failure: https://github.com/status-im/nimbus-eth2/blob/stable/beacon_chain/networking/eth2_network.nim#L235-L251

Our situation is of course a bit different as we don't keep long-lived connections to other peers.
But I think something similar can be used, but then the difference would be where do we check the list. Probably something like:

When selecting nodes to send FindNodes, FindContent or Offer message requests to.
When adding or re-validating a node in our routing_table. Not fully sure how it would fit in here exactly, I would need to have a better look at this.

This would also be something that could potentially be enabled on the level of discv5.

kdeme added Fluffy Security Security vulnerability or security related optimization labels Oct 31, 2024

kdeme mentioned this issue Oct 31, 2024

Don't accept (many?) concurrent offers for the same content key #1920

Open

bhartnett self-assigned this Nov 26, 2024

bhartnett changed the title ~~Add peer ban list to fluffy~~ Add peer ban list to Fluffy Nov 26, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add peer ban list to Fluffy #2809

Add peer ban list to Fluffy #2809

kdeme commented Oct 31, 2024

kdeme commented Nov 6, 2024

bhartnett commented Nov 26, 2024 •

edited

Loading

bhartnett commented Nov 26, 2024

kdeme commented Nov 26, 2024

Add peer ban list to Fluffy #2809

Add peer ban list to Fluffy #2809

Comments

kdeme commented Oct 31, 2024

kdeme commented Nov 6, 2024

bhartnett commented Nov 26, 2024 • edited Loading

bhartnett commented Nov 26, 2024

kdeme commented Nov 26, 2024

bhartnett commented Nov 26, 2024 •

edited

Loading