Onion messages: add some initial rate limiting #1604
Conversation
Force-pushed from 4036693 to d5fbc04.
Codecov Report
@@ Coverage Diff @@
## main #1604 +/- ##
==========================================
- Coverage 90.90% 90.80% -0.11%
==========================================
Files 85 85
Lines 45888 45813 -75
==========================================
- Hits 41715 41601 -114
- Misses 4173 4212 +39
lightning/src/ln/peer_handler.rs
Outdated
@@ -1637,6 +1690,24 @@ impl<Descriptor: SocketDescriptor, CM: Deref, RM: Deref, L: Deref, CMH: Deref> P
for (descriptor, peer_mutex) in peers.iter() {
	self.do_attempt_write_data(&mut (*descriptor).clone(), &mut *peer_mutex.lock().unwrap());

// Only see if we have room for onion messages after we've written all channel messages, to
// ensure they take priority.
Of course, I agree that channel messages have a higher priority than onion messages, but I'm not sure the same holds if we consider the subset of gossip messages. My reasoning: onion message paths are likely to be correlated with payment paths, so a routing hop has an incentive to speed up that onion traffic, since it may lead to future routing fees. Gossip, by contrast, mostly consumes local host resources, with only loose incentives for the hop (e.g. the gossip announces a channel update in some other part of the graph). Just a high-level idea; I'm not sure it's relevant to introduce that level of fine-grained message-priority classes now.
Reply paths are still not added here, correct?
Correct, I might actually add them in a separate PR, since we tend to be friendlier to merging a lot of small PRs.
Force-pushed from 7fe75d4 to 92a818f.
Gonna go ahead and mark this as ready for review and address #1503 (comment) in a dependent follow-up.
PeerManager
Largely, this adds the boilerplate needed for PeerManager and OnionMessenger to work together on sending and receiving and replaces the stopgaps added in lightningdevkit#1604.
Force-pushed from 92a818f to e9b3293.
Force-pushed from e9b3293 to 0b4056f.
Force-pushed from 5a0d6d0 to 5cd92f2.
LGTM, basically.
lightning/src/ln/peer_handler.rs
Outdated
@@ -303,6 +319,10 @@ const OUTBOUND_BUFFER_LIMIT_READ_PAUSE: usize = 10;
/// the peer.
const OUTBOUND_BUFFER_LIMIT_DROP_GOSSIP: usize = OUTBOUND_BUFFER_LIMIT_READ_PAUSE * FORWARD_INIT_SYNC_BUFFER_LIMIT_RATIO;

/// When the outbound buffer has this many messages, we won't poll for new onion messages for this
/// peer.
const OUTBOUND_BUFFER_LIMIT_PAUSE_OMS: usize = 16;
Ooookayyyy, let's see, now that we aren't competing with gossip backfill, let's drop this to 8 - 512KiB per peer was always a bit high. That also means we won't be constantly pausing reading just because we're stuck forwarding some big onion messages. IMO we should also bump the read pause limit to 12 or even 16 to make sure we can still get channel messages out after enqueuing some big gossip messages.
Updated.
IMO we should also bump the read pause limit to 12 or even 16 to make sure we can still get channel messages out after enqueuing some big gossip messages.
Would you like me to add this in this PR? I'll go with 12
Yea let's just do it here. 12 sounds good.
Done in 8261522
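For reference, a rough sketch of where these limits might land after this thread (8 for the onion-message pause limit, 12 for the read-pause limit, as agreed above); the FORWARD_INIT_SYNC_BUFFER_LIMIT_RATIO value and doc comments here are illustrative, not the exact definitions in peer_handler.rs:

// Illustrative value only; the real ratio is defined elsewhere in peer_handler.rs.
const FORWARD_INIT_SYNC_BUFFER_LIMIT_RATIO: usize = 2;

/// When the outbound buffer has this many messages, we stop reading from the peer
/// (bumped from 10 so channel messages can still get out behind big gossip messages).
const OUTBOUND_BUFFER_LIMIT_READ_PAUSE: usize = 12;

/// When the outbound buffer has this many messages, we drop new gossip broadcasts for
/// the peer rather than buffering them.
const OUTBOUND_BUFFER_LIMIT_DROP_GOSSIP: usize =
	OUTBOUND_BUFFER_LIMIT_READ_PAUSE * FORWARD_INIT_SYNC_BUFFER_LIMIT_RATIO;

/// When the outbound buffer has this many messages, we won't poll for new onion
/// messages for this peer.
const OUTBOUND_BUFFER_LIMIT_PAUSE_OMS: usize = 8;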
Force-pushed from 0e2ee89 to 652d54a.
lightning/src/util/events.rs
Outdated
/// A trait indicating an object may generate onion message send events
pub trait OnionMessageProvider {
	/// Gets up to `max_messages` pending onion messages for the peer with the given node id.
	fn next_onion_messages_for_peer(&self, peer_node_id: PublicKey, max_messages: usize) -> Vec<msgs::OnionMessage>;
Maybe this could be called get_next_onion_messages_for_peer(), like we have get_and_clear_pending_msg_events()
I think we'd like to move away from this pattern, per rust conventions: https://github.com/rust-lang/rfcs/blob/master/text/0344-conventions-galore.md#gettersetter-apis
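Whatever the final name, a minimal sketch of an implementer of the trait as quoted above; the NoOpMessenger type is purely hypothetical, and the import paths assume the trait lands in lightning::util::events as shown in the diff:

use bitcoin::secp256k1::PublicKey;
use lightning::ln::msgs;
use lightning::util::events::OnionMessageProvider;

/// A hypothetical provider that never has onion messages to send, e.g. for a
/// node that does not relay or originate onion messages.
struct NoOpMessenger;

impl OnionMessageProvider for NoOpMessenger {
	fn next_onion_messages_for_peer(&self, _peer_node_id: PublicKey, _max_messages: usize) -> Vec<msgs::OnionMessage> {
		Vec::new()
	}
}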
lightning/src/ln/peer_handler.rs
Outdated
/// Returns the number of onion messages we can fit in this peer's buffer.
fn onion_message_buffer_slots_available(&self) -> usize {
	cmp::min(
Folks, have you thought about making those constants configurable via some PerfConfig, to adjust to the node operator's resource-consumption policy? Even if you have 1000 channels, 500 MiB of memory can be too much. Rather than adopting an approach where we assume Y MiB of per-outbound buffer for X channels, we could devolve the choice to the operator, e.g. a global budget of Z onion-message outbound buffer to allocate evenly or not (like first-come, first-served).
Further, I think we should also care about ensuring the channel is well-confirmed or comes from a trusted peer; otherwise I could send you a thousand OpenChannel messages and never broadcast the funding, and if we allocate onion-message outbound bandwidth on the fly, that could become a DoS vector. Same with drying up bandwidth for closed channels.
These could all be follow-ups.
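The helper quoted a few lines up is truncated; purely as a sketch of its plausible shape (MAX_ONION_MESSAGES_PER_FLUSH and the simplified Peer struct are hypothetical stand-ins, not the real implementation), it might compute something like:

use std::cmp;

const OUTBOUND_BUFFER_LIMIT_PAUSE_OMS: usize = 8;
// Hypothetical cap on how many onion messages we pull from the provider per flush.
const MAX_ONION_MESSAGES_PER_FLUSH: usize = 8;

struct Peer {
	// Simplified stand-in for the real peer's pending outbound buffer.
	pending_outbound_buffer: Vec<Vec<u8>>,
}

impl Peer {
	/// How many onion messages still fit before this peer's buffer hits the
	/// pause threshold, capped per flush.
	fn onion_message_buffer_slots_available(&self) -> usize {
		cmp::min(
			OUTBOUND_BUFFER_LIMIT_PAUSE_OMS.saturating_sub(self.pending_outbound_buffer.len()),
			MAX_ONION_MESSAGES_PER_FLUSH,
		)
	}
}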
Looks good, just minor comments, though see #1604 (comment)
Force-pushed from 746cfc9 to 44edece.
FYI, I had to rebase to fix CI.
lightning/src/ln/peer_handler.rs
Outdated
/// Determines if we should push additional messages onto a peer's outbound buffer for backfilling
/// onion messages and gossip data to the peer. This is checked every time the peer's buffer may
/// have been drained.
fn should_buffer_message_backfill(&self) -> bool { |
Hmmmmm right, now I remember why we were queueing onion messages rather than just draining as required - as currently written we let non-backfill gossip starve onion messages because that is buffered.
I think what we should do is (a) separate the should-gossip-backfill check from the should-om-send check, then (b) track how much of the queue is onion messages, and (c) always allow N onion messages in the queue (let's say 4?), reducing buffer_full_drop_gossip by the same N.
I think that's basically what we want anyway; we'd reserve N/buffer_full_drop_gossip of our outbound bandwidth for onion messages vs gossip (by message count; by actual bandwidth, probably more).
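A self-contained sketch of that reservation scheme, with illustrative names and limits rather than the actual peer_handler.rs code:

// The "N" suggested above: slots always kept free for onion messages.
const RESERVED_OM_SLOTS: usize = 4;
// Illustrative gossip-drop limit.
const OUTBOUND_BUFFER_LIMIT_DROP_GOSSIP: usize = 20;

struct OutboundBuffer {
	pending: Vec<Vec<u8>>,         // encoded messages awaiting write
	pending_onion_messages: usize, // how many of `pending` are onion messages
}

impl OutboundBuffer {
	/// Broadcast gossip is dropped a few slots early, so onion messages always
	/// have some room reserved for them.
	fn should_drop_gossip_broadcast(&self) -> bool {
		self.pending.len() >= OUTBOUND_BUFFER_LIMIT_DROP_GOSSIP - RESERVED_OM_SLOTS
	}

	/// Onion messages may still be enqueued while fewer than N of them are
	/// waiting, even if other (gossip) messages fill the rest of the buffer.
	fn may_enqueue_onion_message(&self) -> bool {
		self.pending_onion_messages < RESERVED_OM_SLOTS
	}
}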
Hmmmmm right, now I remember why we were queueing onion messages rather than just draining as required - as currently written we let non-backfill gossip starve onion messages because that is buffered.
I'm confused, currently we treat OMs ~the same as gossip backfill was treated prior to this change. Does this mean gossip backfilling does not currently work as intended in main?
No, it means gossip backfill will be starved by new gossip broadcasts, which makes sense - if we drop new gossip broadcasts we won't re-send them, but if we don't send backfill, we'll just send it later. For OMs, we don't want new gossip broadcasts to starve OMs, though we probably want some ratio between the two cause I'm not sure we want OMs to starve new gossip broadcasts either.
It looks like we only produce new gossip broadcasts when channels open+close, and on timer_tick. So FWIW new gossip broadcasts seem naturally somewhat rate limited, unless the user is specifically spamming the broadcast method.
In any case, am I correct that this means we're moving OM sending back to event processing? Because you'd want to only queue->write 1 OM at a time in do_attempt_write_data, iiuc.
No, we also produce new gossip broadcasts when we receive new gossip messages from peers - we want to forward those on to our peers.
In any case, am I correct that this means we're moving OM sending back to event processing? Because you'd want to only queue->write 1 OM at a time in do_attempt_write_data iiuc
No, I don't think we need to. The above suggestion allows us to keep it in the message-flushing logic; it just means we may enqueue an OM even if the buffer is not empty (which gets us a bit back to buffering in the PeerManager, but it doesn't really matter where we enqueue).
Of all the methods currently in Peer, this method should_buffer_message_backfill is the most deceptive. IIUC, it means something more like can_send_init_sync_gossip, should_send_init_sync_gossip, or should_buffer_init_sync_gossip:
"Determines if we should push additional gossip messages onto a peer's outbound buffer. This is checked every time the peer's buffer may have been drained. Note: this is only checked for gossip messages resulting from the initial sync. New gossip broadcasts are either sent immediately on receive or dropped (according to should_drop_forwarding_gossip)."
Please correct this if the understanding is wrong, or rename it if there is a better name. I also wanted to remove the references to backfill, as that was slightly confusing.
Consider renaming buffer_full_drop_gossip => should_drop_forwarding_gossip; that the buffer is full is an implementation detail.
/// messages). Note that some other lightning implementations time-out connections after some
/// time if no channel is built with the peer.
/// Constructs a new `PeerManager` with the given `RoutingMessageHandler`. No channel message
/// handler or onion message handler is used and onion and channel messages will be ignored (or |
Why would it "(or generate error messages)" if it's an IgnoringMessageHandler? From my brief glance at the code, it shouldn't ever throw errors.
The chan_handler is an ErroringMessageHandler below.
lightning/src/ln/peer_handler.rs
Outdated
	self.enqueue_message(peer, &msg);
	peer.sync_status = InitSyncTracker::NodesSyncing(msg.contents.node_id);
} else {
	peer.sync_status = InitSyncTracker::NoSyncRequested;
Shouldn't this case be unreachable?
I think since this, #1604 (comment) and #1604 (comment) are pre-existing, I'd rather not address them in this PR to keep it small in scope. Maybe we should open an issue for them?
Yes, definitely. If you provide your thoughts on those things, then I can go ahead and make those minor changes. :)
Sadly GitHub won't let me see what line this originally referred to. However, I think all the changes you suggested seem reasonable if you want to PR them, and we can discuss further on the PR :)
lightning/src/ln/peer_handler.rs
Outdated
	self.enqueue_message(peer, &msg);
	peer.sync_status = InitSyncTracker::NodesSyncing(msg.contents.node_id);
} else {
	peer.sync_status = InitSyncTracker::NoSyncRequested;
Consider renaming NoSyncRequested => NoSyncRequired / NoAdditionalSyncRequired.
We are currently using it for two purposes:
- No sync was requested
- The initial sync is complete, so no more sync is required
Yes, that seems reasonable to me
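For reference, a sketch of how the renamed enum might read; the variant payload types are inferred from the snippets quoted in this review rather than copied from peer_handler.rs:

use bitcoin::secp256k1::PublicKey;

/// Sketch of the suggested rename applied to the existing sync tracker.
enum InitSyncTracker {
	/// Either no initial sync was requested, or the requested sync has completed.
	NoSyncRequired,
	/// Syncing channel announcements, tracking the last short channel id sent.
	ChannelsSyncing(u64),
	/// Syncing node announcements, tracking the last node id sent.
	NodesSyncing(PublicKey),
}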
lightning/src/ln/peer_handler.rs
Outdated
} else {
	match peer.sync_status {
		InitSyncTracker::NoSyncRequested => {},
		InitSyncTracker::ChannelsSyncing(c) if c < 0xffff_ffff_ffff_ffff => {
Consider renaming c => scid_sync_progress / channel_sync_progress / scid_sync_tracker, or something better.
lightning/src/ln/peer_handler.rs
Outdated
let next_onion_message_opt = if let Some(peer_node_id) = peer.their_node_id {
	self.message_handler.onion_message_handler.next_onion_message_for_peer(peer_node_id)
} else { None };

// For now, let onion messages starve gossip.
if let Some(next_onion_message) = next_onion_message_opt {
	self.enqueue_message(peer, &next_onion_message);
} else {
	match peer.sync_status {
		InitSyncTracker::NoSyncRequested => {},
		InitSyncTracker::ChannelsSyncing(c) if c < 0xffff_ffff_ffff_ffff => {
Is there some way to abstract this into send_om and then send_init_sync_gossip (if no OM was sent)?
Let me know what you think of the latest version, sorry for all the churn
I think it looks much cleaner now :)
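For illustration, a self-contained sketch of the suggested split (send an onion message first, fall back to init-sync gossip only if none was sent), using stand-in types and names rather than the real peer_handler.rs ones:

// Stand-in types; the real ones live in peer_handler.rs and the message handlers.
struct OnionMessage;

trait OnionMessageHandler {
	fn next_onion_message_for_peer(&self, peer_node_id: [u8; 33]) -> Option<OnionMessage>;
}

struct Peer {
	their_node_id: Option<[u8; 33]>,
	outbound_buffer: Vec<Vec<u8>>, // stand-in for the encoded-message queue
}

/// Mirrors the flow quoted above: try an onion message first, then gossip backfill.
fn maybe_buffer_backfill<OMH: OnionMessageHandler>(handler: &OMH, peer: &mut Peer) {
	if !maybe_send_onion_message(handler, peer) {
		maybe_send_init_sync_gossip(peer);
	}
}

/// Returns true if an onion message was enqueued for this peer.
fn maybe_send_onion_message<OMH: OnionMessageHandler>(handler: &OMH, peer: &mut Peer) -> bool {
	let next_om = peer.their_node_id
		.and_then(|node_id| handler.next_onion_message_for_peer(node_id));
	if next_om.is_some() {
		// In the real code this would be `self.enqueue_message(peer, &next_onion_message)`.
		peer.outbound_buffer.push(Vec::new());
		true
	} else {
		false
	}
}

fn maybe_send_init_sync_gossip(peer: &mut Peer) {
	// Continue the initial gossip sync here (the `match peer.sync_status { ... }` above).
	let _ = peer;
}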
Force-pushed from 44edece to 6d08249.
Now based on #1683.
Force-pushed from 6d08249 to ba8dfad.
Thanks for tackling the surprising prereqs here!
LGTM modulo a few minor comments
Adds the boilerplate needed for PeerManager and OnionMessenger to work together, with some corresponding docs and misc updates mostly due to the PeerManager public API changing.
...to make sure we can still get channel messages out after enqueuing some big gossip messages.
In this commit, we check if a peer's outbound buffer has room for onion messages and, if so, pull them from an implementer of a new trait, OnionMessageProvider. Makes sure channel messages are prioritized over OMs, and OMs are prioritized over gossip. The onion_message module remains private until further rate limiting is added.
Force-pushed from ba8dfad to f4834de.
Based on #1503. In this PR, we add business logic to check whether a peer's outbound buffer has room for onion messages and, if so, pull a number of them from an implementer of a new trait, OnionMessageProvider.
This may take some work to land, so we separate out its changes from the rest of the steps needed before the onion message module can be made public.
See commit message for more details.
Blocked on #1660. Based on #1683.