Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

server.go: add peerChan to peerConnectedListeners in NotifyWhenOnline #6892

Merged

Conversation

Crypt-iQ
Copy link
Collaborator

@Crypt-iQ Crypt-iQ commented Sep 6, 2022

This fixes a bug where a caller would:

  • call NotifyWhenOnline and pass in a channel
  • the server's mutex is held and the pubStr is checked in peersByPub
  • the peer object is in the peersByPub map
  • the peer object has its quit channel closed
  • QuitSignal is called in the select statement, and the function returns
  • the caller is still waiting for the channel to be sent on forever since it was never sent on or added to the peerConnectedListeners slice.

This patch fixes the above bug by adding the channel to the peerConnectedListeners slice if the QuitSignal select case is called.

Fixes #6882

An integration test that makes the bug pop up once in a while is here: https://github.com/Crypt-iQ/lnd/tree/iss6882_notifywhenonline

  • The bug has occurred when there is a "peer quit signal" log soon after a "Calling NotifyWhenOnline" log and there is no "successfully added" log. If the bug has occurred, you should see that the fundinglocked message is received in the logs, but the state machine doesn't advance to adding the channel to the router graph

// The peer quit so we'll add the channel to the slice
// and return.
s.mu.Lock()
s.peerConnectedListeners[pubStr] = append(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch! I think this might also be affecting reliable direct sends of channel update messages at times.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd be ideal for us to also add a unit test here. Most of the state here can be pretty easily subbed out (#5700 should make this easier in the future).

So in this case, the test would add a canned peer to peersByPub (which doesn't seem to use the main peer interface rn actually), then fire either of the signals to make it enter this new case. A follow up block would then just call peerInitializer which'll then send the signals as normal.

Thoughts?

Copy link
Collaborator Author

@Crypt-iQ Crypt-iQ Sep 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this can be done easily - peerInitializer needs a server object and eventually calls p.Start() which requires a channeldb and launches goroutines that require other things like a mocked connection. It would be easier if peersByPub used an interface instead of peer.Brontide which would let us precisely control this. Another issue is that any time peer.Brontide is started successfully, the ActiveSignal is fired on, meaning that this test using peerInitializer will have both the ActiveSignal and the QuitSignal active in that select statement, so it could lead to not exercising the quit logic all the time

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#5700 doesn't change peersByPub and the other related maps to an interface, but I think that would be the way to go to test this.

Also independent of this bug fix, some of the callsites of NotifyWhenOnline don't retry if a later call to SendMessage failed. This is problematic since the peer could have gone down by the time you end up sending the message.

server.go Outdated Show resolved Hide resolved
@Crypt-iQ Crypt-iQ force-pushed the iss6882_notifywhenonline_fix branch from 0c0320a to 296ac01 Compare September 7, 2022 15:09
Copy link
Member

@Roasbeef Roasbeef left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🦄

@yyforyongyu yyforyongyu self-requested a review September 14, 2022 09:56
Copy link
Member

@yyforyongyu yyforyongyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch!

@guggero
Copy link
Collaborator

guggero commented Sep 20, 2022

Needs a rebase.

@Crypt-iQ
Copy link
Collaborator Author

what do i rebase on? 01cd1c9 (which modified the release notes) looks like it's based on the tlv pr which should be in 0.16.0, not 0.15.2

@guggero
Copy link
Collaborator

guggero commented Sep 20, 2022

Always on master. PRs for minor releases are always cherry-picked onto a branch.

This fixes a bug where a caller would:
- call NotifyWhenOnline and pass in a channel
- the server's mutex is held and the pubStr is checked in peersByPub
- the peer object is in the peersByPub map
- the peer object has its quit channel closed
- QuitSignal is called in the select statement, and the function
  returns
- the caller is still waiting for the channel to be sent on forever
  since it was never sent on or added to the peerConnectedListeners
  slice.

This patch fixes the above bug by adding the channel to the
peerConnectedListeners slice if the QuitSignal select case is called.
@Crypt-iQ Crypt-iQ force-pushed the iss6882_notifywhenonline_fix branch from a764600 to 02f7fa9 Compare September 21, 2022 14:52
@Crypt-iQ
Copy link
Collaborator Author

rebased

@guggero guggero merged commit beb897d into lightningnetwork:master Sep 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

New Channel need LND re-start to announce to graph.
5 participants