
Onion messages: add some initial rate limiting #1604

Merged
merged 4 commits into lightningdevkit:main on Aug 29, 2022

Conversation

valentinewallace
Contributor

@valentinewallace commented Jul 8, 2022

Based on #1503.

In this PR, we add business logic that checks whether a peer's outbound buffer has room for onion messages and, if so, pulls a number of them from an implementer of a new trait, OnionMessageProvider.

This may take some work to land, so we separate out its changes from the rest of the steps needed before the onion message module can be made public.

See commit message for more details.

Blocked on #1660

Based on #1683
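For illustration, here is a minimal, self-contained sketch of the flow described above: check how much room a peer's outbound buffer has, then pull at most that many messages from an OnionMessageProvider. The trait and method name match the one introduced in this PR, but the Peer type, the helper, and the constant value are simplified stand-ins for the actual peer_handler code.

use std::collections::VecDeque;

// Assumed value; the exact limit is discussed further down in this thread.
const OUTBOUND_BUFFER_LIMIT_PAUSE_OMS: usize = 8;

// Simplified stand-ins for the real lightning types.
struct OnionMessage;
#[derive(Clone, Copy)]
struct PublicKey;

trait OnionMessageProvider {
    /// Gets up to `max_messages` pending onion messages for the given peer.
    fn next_onion_messages_for_peer(&self, peer_node_id: PublicKey, max_messages: usize) -> Vec<OnionMessage>;
}

struct Peer {
    their_node_id: PublicKey,
    outbound_buffer: VecDeque<OnionMessage>,
}

impl Peer {
    /// How many more onion messages we are willing to enqueue for this peer.
    fn onion_message_slots_available(&self) -> usize {
        OUTBOUND_BUFFER_LIMIT_PAUSE_OMS.saturating_sub(self.outbound_buffer.len())
    }
}

/// Once channel messages have been written, top the buffer up with onion messages.
fn fill_with_onion_messages<OMP: OnionMessageProvider>(peer: &mut Peer, om_provider: &OMP) {
    let slots = peer.onion_message_slots_available();
    if slots == 0 { return; }
    for om in om_provider.next_onion_messages_for_peer(peer.their_node_id, slots) {
        // The real code serializes and enqueues the message for the socket.
        peer.outbound_buffer.push_back(om);
    }
}

In the PR itself this pull happens inside do_attempt_write_data, after channel messages have been flushed, which is what keeps channel traffic prioritized (see the diff and discussion below).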

@valentinewallace changed the title from "Onion messages follow-up" to "OMs: hook up Messenger to PeerManager + add reply paths" on Jul 11, 2022
@valentinewallace mentioned this pull request on Jul 11, 2022
@codecov-commenter commented Jul 12, 2022

Codecov Report

Merging #1604 (6d08249) into main (39cede6) will decrease coverage by 0.10%.
The diff coverage is 82.92%.

❗ Current head 6d08249 differs from pull request most recent head ba8dfad. Consider uploading reports for the commit ba8dfad to get more accurate results

@@            Coverage Diff             @@
##             main    #1604      +/-   ##
==========================================
- Coverage   90.90%   90.80%   -0.11%     
==========================================
  Files          85       85              
  Lines       45888    45813      -75     
  Branches    45888    45813      -75     
==========================================
- Hits        41715    41601     -114     
- Misses       4173     4212      +39     
Impacted Files Coverage Δ
lightning/src/chain/channelmonitor.rs 91.12% <ø> (-0.05%) ⬇️
lightning/src/ln/msgs.rs 85.74% <ø> (-0.56%) ⬇️
lightning/src/ln/wire.rs 61.99% <0.00%> (-0.93%) ⬇️
lightning/src/util/events.rs 39.50% <ø> (ø)
lightning/src/util/macro_logger.rs 85.48% <ø> (ø)
lightning/src/ln/peer_handler.rs 56.76% <54.16%> (-0.07%) ⬇️
lightning-invoice/src/lib.rs 87.39% <87.50%> (ø)
lightning/src/onion_message/messenger.rs 89.53% <94.73%> (ø)
lightning-background-processor/src/lib.rs 95.20% <100.00%> (ø)
lightning-net-tokio/src/lib.rs 77.37% <100.00%> (+0.20%) ⬆️
... and 17 more


@@ -1637,6 +1690,24 @@ impl<Descriptor: SocketDescriptor, CM: Deref, RM: Deref, L: Deref, CMH: Deref> P

for (descriptor, peer_mutex) in peers.iter() {
self.do_attempt_write_data(&mut (*descriptor).clone(), &mut *peer_mutex.lock().unwrap());

// Only see if we have room for onion messages after we've written all channel messages, to
// ensure they take priority.

Of course, I agree that channel messages have a higher priority than onion messages; however, if we consider the subset of gossip messages, I'm not sure that holds. The reasoning: onion message paths are likely to be correlated with payment paths, so a routing hop might have an incentive to speed up that onion traffic, as it may carry odds of future routing fees. Gossip is just local host resource consumption, with only loose incentives for the hops (e.g., say the gossip announced a channel update in some other part of the graph). Just a high-level idea; I'm not sure it's relevant to introduce that level of fine-grained message-priority classes now.

@ariard commented Aug 1, 2022

Reply paths are still not added here, correct?

@valentinewallace
Contributor Author

Reply paths are still not added here, correct?

Correct. I might actually add them in a separate PR, since we tend to be friendlier to merging a lot of small PRs.

@valentinewallace changed the title from "OMs: hook up Messenger to PeerManager + add reply paths" to "OMs: hook up Messenger to PeerManager" on Aug 2, 2022
@valentinewallace force-pushed the 2022-07-OMs-followup branch 4 times, most recently from 7fe75d4 to 92a818f on August 3, 2022
@valentinewallace
Contributor Author

Gonna go ahead and mark this as ready for review and address #1503 (comment) in a dependent follow-up.

@valentinewallace marked this pull request as ready for review on August 3, 2022
@valentinewallace changed the title from "OMs: hook up Messenger to PeerManager" to "Onion messages: forward OMs in PeerManager" on Aug 3, 2022
@jkczyz self-requested a review on August 3, 2022
valentinewallace added a commit to valentinewallace/rust-lightning that referenced this pull request Aug 4, 2022
Largely, this adds the boilerplate needed for PeerManager and OnionMessenger to
work together on sending and receiving and replaces the stopgaps added in lightningdevkit#1604.
@valentinewallace changed the title from "Onion messages: forward OMs in PeerManager" to "Onion messages: add some initial rate limiting" on Aug 4, 2022
Collaborator

@TheBlueMatt left a comment

LGTM, basically.

@@ -303,6 +319,10 @@ const OUTBOUND_BUFFER_LIMIT_READ_PAUSE: usize = 10;
/// the peer.
const OUTBOUND_BUFFER_LIMIT_DROP_GOSSIP: usize = OUTBOUND_BUFFER_LIMIT_READ_PAUSE * FORWARD_INIT_SYNC_BUFFER_LIMIT_RATIO;

/// When the outbound buffer has this many messages, we won't poll for new onion messages for this
/// peer.
const OUTBOUND_BUFFER_LIMIT_PAUSE_OMS: usize = 16;
Collaborator

Ooookayyyy, let's see, now that we aren't competing with gossip backfill, let's drop this to 8 - 512KiB per peer was always a bit high. That also means we won't be constantly pausing reading just because we're stuck forwarding some big onion messages. IMO we should also bump the read pause limit to 12 or even 16 to make sure we can still get channel messages out after enqueuing some big gossip messages.

Contributor Author

Updated.

IMO we should also bump the read pause limit to 12 or even 16 to make sure we can still get channel messages out after enqueuing some big gossip messages.

Would you like me to add this in this PR? I'll go with 12

Collaborator

Yea let's just do it here. 12 sounds good.

Contributor Author

Done in 8261522
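
For reference, a small sketch of the limits as settled on in this thread: pause onion message polling at 8 buffered messages and pause socket reads at 12. The constant names mirror those in lightning/src/ln/peer_handler.rs, but treat the values and helper functions here as an illustration of the discussion rather than a quote of the merged code.

const OUTBOUND_BUFFER_LIMIT_READ_PAUSE: usize = 12;
const OUTBOUND_BUFFER_LIMIT_PAUSE_OMS: usize = 8;

/// Stop polling the OnionMessageProvider for this peer once its outbound
/// buffer reaches the onion message pause limit.
fn should_pause_om_polling(outbound_buffer_len: usize) -> bool {
    outbound_buffer_len >= OUTBOUND_BUFFER_LIMIT_PAUSE_OMS
}

/// Stop reading from this peer's socket once enough is queued for it, so a
/// slow peer cannot force us to buffer without bound.
fn should_pause_read(outbound_buffer_len: usize) -> bool {
    outbound_buffer_len >= OUTBOUND_BUFFER_LIMIT_READ_PAUSE
}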

@valentinewallace force-pushed the 2022-07-OMs-followup branch 2 times, most recently from 0e2ee89 to 652d54a on August 16, 2022
/// A trait indicating an object may generate onion message send events
pub trait OnionMessageProvider {
	/// Gets up to `max_messages` pending onion messages for the peer with the given node id.
	fn next_onion_messages_for_peer(&self, peer_node_id: PublicKey, max_messages: usize) -> Vec<msgs::OnionMessage>;

Maybe could be called get_next_onion_messages_for_peer(), like we have get_and_clear_pending_msg_events()

Contributor Author

I think we'd like to move away from this pattern, per rust conventions: https://github.com/rust-lang/rfcs/blob/master/text/0344-conventions-galore.md#gettersetter-apis
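
For anyone unfamiliar with the convention being cited: plain accessors drop the get_ prefix, while get_-and-mutate names like get_and_clear_pending_msg_events are the exception because they do more than read. A tiny standalone illustration, with Counter as a made-up example type:

struct Counter { count: u64 }

impl Counter {
    /// Idiomatic: no `get_` prefix for a simple accessor.
    fn count(&self) -> u64 { self.count }

    /// A `get_`-style name is conventionally reserved for methods that do more
    /// than read; this one drains the state as it reads it.
    fn get_and_clear_count(&mut self) -> u64 {
        std::mem::take(&mut self.count)
    }
}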


/// Returns the number of onion messages we can fit in this peer's buffer.
fn onion_message_buffer_slots_available(&self) -> usize {
cmp::min(

Folks, have you thought about making those constants configurable with some PerfConfig, to adjust to the node operator's resource consumption policy? Even if you have 1000 channels, 500 MiB of memory can be too much. Rather than adopting an approach where we assume that for X channels you get Y MiB of per-peer outbound buffer, we could devolve the choice to the operator: e.g., you have a global onion message outbound buffer of Z to allocate, evenly or not (like first-come, first-served).

Further, I think we should also care about ensuring the channel is well-confirmed or comes from a trusted peer; otherwise I could send you a thousand OpenChannel messages and never broadcast the funding, and if we allocate onion message outbound bandwidth on the fly that could become a DoS vector. I think the same applies to drying up bandwidth for closed channels.

These could all be follow-ups.
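
A sketch of that suggestion, for concreteness: PerfConfig is hypothetical (it is not an existing LDK type), and the only point illustrated is replacing hard-coded per-peer constants with a single operator-set budget split across peers.

struct PerfConfig {
    /// Total bytes the operator is willing to dedicate to buffered outbound
    /// onion messages across all peers.
    global_om_buffer_bytes: usize,
}

impl PerfConfig {
    /// Evenly split the global budget across currently connected peers
    /// (first-come, first-served allocation would be an alternative policy).
    fn per_peer_om_buffer_bytes(&self, num_peers: usize) -> usize {
        self.global_om_buffer_bytes / num_peers.max(1)
    }
}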

@ariard left a comment

Looks good, just minor comments, though see #1604 (comment).

@valentinewallace force-pushed the 2022-07-OMs-followup branch 3 times, most recently from 746cfc9 to 44edece on August 22, 2022
@valentinewallace
Contributor Author

FYI, I had to rebase to fix CI.

/// Determines if we should push additional messages onto a peer's outbound buffer for backfilling
/// onion messages and gossip data to the peer. This is checked every time the peer's buffer may
/// have been drained.
fn should_buffer_message_backfill(&self) -> bool {
Collaborator

Hmmmmm right, now I remember why we were queueing onion messages rather than just draining as required - as currently written we let non-backfill gossip starve onion messages because that is buffered.

I think what we should do is (a) separate the should-gossip-backfill check from the should-om-send check, then (b) track how much of the queue is onion messages, (c) always allow N onion messages in the queue (let's say 4?), and reduce the buffer_full_drop_gossip by the same N.

I think that's basically what we want, anyway, we'll basically reserve N/buffer_full_drop_gossip of our outbound bandwidth for onion messages vs gossip (by message count, by bandwidth itself probably more).
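
A sketch of the (a)/(b)/(c) scheme above, for concreteness; the struct, field, and method names are illustrative stand-ins rather than the merged implementation, and the gossip-drop threshold value is a placeholder.

const OUTBOUND_BUFFER_LIMIT_DROP_GOSSIP: usize = 20; // placeholder value
const RESERVED_OM_SLOTS: usize = 4;                  // the "N" discussed above

struct PeerQueueState {
    total_buffered: usize,
    buffered_onion_messages: usize, // (b): track how much of the queue is OMs
}

impl PeerQueueState {
    /// (c): always allow up to N onion messages in the queue, regardless of
    /// how much gossip is already buffered.
    fn should_poll_onion_messages(&self) -> bool {
        self.buffered_onion_messages < RESERVED_OM_SLOTS
    }

    /// (a): the gossip check is separate, and its threshold is reduced by the
    /// same N so the overall buffer bound stays unchanged.
    fn should_drop_new_gossip_broadcast(&self) -> bool {
        self.total_buffered >= OUTBOUND_BUFFER_LIMIT_DROP_GOSSIP - RESERVED_OM_SLOTS
    }
}

This keeps the total buffer bound unchanged while guaranteeing onion messages a fixed share of the queue, roughly N/buffer_full_drop_gossip by message count.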

Contributor Author

@valentinewallace Aug 23, 2022

Hmmmmm right, now I remember why we were queueing onion messages rather than just draining as required - as currently written we let non-backfill gossip starve onion messages because that is buffered.

I'm confused, currently we treat OMs ~the same as gossip backfill was treated prior to this change. Does this mean gossip backfilling does not currently work as intended in main?

Collaborator

No, it means gossip backfill will be starved by new gossip broadcasts, which makes sense - if we drop new gossip broadcasts we won't re-send them, but if we don't send backfill, we'll just send it later. For OMs, we don't want new gossip broadcasts to starve OMs, though we probably want some ratio between the two cause I'm not sure we want OMs to starve new gossip broadcasts either.

Contributor Author

It looks like we only produce new gossip broadcasts when channels open or close, and on timer_tick. So FWIW new gossip broadcasts seem naturally somewhat rate-limited, unless the user is specifically spamming the broadcast method.

In any case, am I correct that this means we're moving OM sending back to event processing? Because you'd want to only queue->write 1 OM at a time in do_attempt_write_data iiuc

Collaborator

No, we also produce new gossip broadcasts when we receive new gossip messages from peers - we want to forward those on to our peers.

Collaborator

In any case, am I correct that this means we're moving OM sending back to event processing? Because you'd want to only queue->write 1 OM at a time in do_attempt_write_data iiuc

No, I don't think we need to. The above suggestion allows us to keep it in the message flushing logic; it just means we may enqueue an OM even if the buffer is not empty (which gets us a bit back to buffering in the PM, but it doesn't matter where we enqueue, really).

Contributor

Of all the methods currently in "Peer", this method "should_buffer_message_backfill" is the most deceptive.
IIUC, it means something more like "can_send_init_sync_gossip", "should_send_init_sync_gossip", or "should_buffer_init_sync_gossip":

"Determines if we should push additional gossip messages onto a peer's outbound buffer. This is checked every time the peer's buffer may have been drained.
Note: this is only checked for gossip messages resulting from initial sync. For new gossip broadcasts, we either send them immediately on receipt or drop them (according to should_drop_forwarding_gossip)."

Please correct this if the understanding is wrong, or rename it if there is a better name.
I also wanted to remove references to "backfill", as it was slightly confusing.

Consider renaming "buffer_full_drop_gossip" => "should_drop_forwarding_gossip"; "buffer is full" is an implementation detail.

Contributor Author

@G8XSU Could you move this comment to #1683? I separated this code change out into that PR.

/// messages). Note that some other lightning implementations time-out connections after some
/// time if no channel is built with the peer.
/// Constructs a new `PeerManager` with the given `RoutingMessageHandler`. No channel message
/// handler or onion message handler is used and onion and channel messages will be ignored (or
Contributor

Why would it "(or generate error messages)" if it's an IgnoringMessageHandler?
From my brief glance at the code, it shouldn't ever be throwing errors.

Contributor Author

The chan_handler is an ErroringMessageHandler below

self.enqueue_message(peer, &msg);
peer.sync_status = InitSyncTracker::NodesSyncing(msg.contents.node_id);
} else {
peer.sync_status = InitSyncTracker::NoSyncRequested;
Contributor

Shouldn't this case be unreachable?

Contributor Author

I think since this, #1604 (comment), and #1604 (comment) are pre-existing, I'd rather not address them in this PR to keep it small in scope. Maybe we should open an issue for them?

Contributor

Yes, definitely. If you provide your thoughts on those things, then I can go ahead and make those minor changes. :)

Contributor Author

Sadly, GitHub won't let me see which line this originally referred to. However, I think all the changes you suggested seem reasonable if you want to PR them, and we can discuss further on the PR :)

self.enqueue_message(peer, &msg);
peer.sync_status = InitSyncTracker::NodesSyncing(msg.contents.node_id);
} else {
peer.sync_status = InitSyncTracker::NoSyncRequested;
Contributor

Consider renaming NoSyncRequested => NoSyncRequired / NoAdditionalSyncRequired.
We are currently using it for two purposes:

  1. No sync requested
  2. Initial sync is complete, no more sync required

Contributor Author

Yes, that seems reasonable to me

} else {
match peer.sync_status {
InitSyncTracker::NoSyncRequested => {},
InitSyncTracker::ChannelsSyncing(c) if c < 0xffff_ffff_ffff_ffff => {
Contributor

Consider renaming "c" => scid_sync_progress / channel_sync_progress / scid_sync_tracker, or something better.

Comment on lines 759 to 769
let next_onion_message_opt = if let Some(peer_node_id) = peer.their_node_id {
self.message_handler.onion_message_handler.next_onion_message_for_peer(peer_node_id)
} else { None };

// For now, let onion messages starve gossip.
if let Some(next_onion_message) = next_onion_message_opt {
self.enqueue_message(peer, &next_onion_message);
} else {
match peer.sync_status {
InitSyncTracker::NoSyncRequested => {},
InitSyncTracker::ChannelsSyncing(c) if c < 0xffff_ffff_ffff_ffff => {
Contributor

Is there some way to abstract this into send_om and then send_init_sync_gossip (if no OM was sent)?

Contributor Author

Let me know what you think of the latest version; sorry for all the churn.

Contributor

I think it looks much cleaner now :)
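
For reference, a standalone sketch of the abstraction discussed in this thread: try to enqueue one onion message first and fall back to initial-sync gossip only if no OM was sent. All types and method names here are simplified stand-ins for the real peer_handler internals.

#[derive(Clone, Copy)]
struct PublicKey;
struct OnionMessage;

trait OnionMessageHandler {
    fn next_onion_message_for_peer(&self, peer_node_id: PublicKey) -> Option<OnionMessage>;
}

struct Peer {
    their_node_id: Option<PublicKey>,
    outbound_buffer: Vec<OnionMessage>, // the real code buffers encoded messages
}

impl Peer {
    /// Returns true if an onion message was enqueued for this peer.
    fn maybe_enqueue_onion_message(&mut self, handler: &dyn OnionMessageHandler) -> bool {
        if let Some(node_id) = self.their_node_id {
            if let Some(om) = handler.next_onion_message_for_peer(node_id) {
                self.outbound_buffer.push(om);
                return true;
            }
        }
        false
    }

    fn maybe_backfill_init_sync_gossip(&mut self) {
        // The InitSyncTracker match from the original snippet would live here.
    }

    /// Called once the buffer has room: OMs take priority over init-sync gossip.
    fn fill_outbound_buffer(&mut self, handler: &dyn OnionMessageHandler) {
        if !self.maybe_enqueue_onion_message(handler) {
            self.maybe_backfill_init_sync_gossip();
        }
    }
}

This mirrors the ordering described in the commit messages below: channel messages first, then onion messages, then gossip.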

@valentinewallace
Contributor Author

Now based on #1683

TheBlueMatt previously approved these changes on Aug 25, 2022
Collaborator

@TheBlueMatt left a comment

Thanks for tackling the surprising prereqs here!

Contributor

@jkczyz left a comment

LGTM modulo a few minor comments

Adds the boilerplate needed for PeerManager and OnionMessenger to work
together, with some corresponding docs and misc updates mostly due to the
PeerManager public API changing.

...to make sure we can still get channel messages out after enqueuing some big gossip messages.

In this commit, we check whether a peer's outbound buffer has room for onion
messages and, if so, pull them from an implementer of a new trait,
OnionMessageProvider.

Makes sure channel messages are prioritized over OMs, and OMs are prioritized
over gossip.

The onion_message module remains private until further rate limiting is added.
@jkczyz merged commit 99123cd into lightningdevkit:main on Aug 29, 2022