New persistent connection request keeps spawning indefinitely for offline channel peers. #6866

Closed
hazrulnizam opened this issue Aug 28, 2022 · 10 comments
Labels: bug (Unintended code behaviour), p2p (Code related to the peer-to-peer behaviour)

Comments

@hazrulnizam

Background

My node crashed with a "too many open files" error. Upon inspection, it looks like lnd keeps spawning new connection requests, with the number increasing on every reconnection attempt cycle until the system reaches its open files limit.

Your environment

  • lnd v0.15.0-beta
  • Linux 4.18.0-372.16.1.el8_6.x86_64
  • bitcoind 23.0

Steps to reproduce

  1. Have a channel peer go offline for an extended amount of time.
  2. Wait.

Expected behaviour

Spawn only one (or a few) reconnection attempts each time.

Actual behaviour

Over time (several days), lnd spawns thousands of reconnection attempts at once, each using one open file slot. Given enough time, the system's open file limit is exceeded and lnd crashes.

Attached is a redacted log of my node a few minutes before crashing. Interesting excerpt:

2022-08-28 20:00:00.312 [WRN] SRVR: Already have 1031 persistent connection requests for <offline_node_pubkey>@<offline_node_tor_address>.onion:9735, connecting anyway.

lnd.log.redacted.log
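
To watch how this count grows between restarts, the warning can be parsed out of the log. The following is a minimal sketch in Go, assuming every such warning has exactly the shape of the excerpt above; `warnRe` and `parseWarning` are hypothetical names used for illustration and are not part of lnd.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// warnRe matches the SRVR warning quoted in the issue and captures the
// request count and the peer address. The pattern is an assumption based
// on that single excerpt.
var warnRe = regexp.MustCompile(
	`Already have (\d+) persistent connection requests for (\S+), connecting anyway`,
)

// parseWarning extracts the pending request count and the peer address from
// one log line. ok is false when the line is not such a warning.
func parseWarning(line string) (count int, peer string, ok bool) {
	m := warnRe.FindStringSubmatch(line)
	if m == nil {
		return 0, "", false
	}
	n, err := strconv.Atoi(m[1])
	if err != nil {
		return 0, "", false
	}
	return n, m[2], true
}

func main() {
	line := "2022-08-28 20:00:00.312 [WRN] SRVR: Already have 1031 " +
		"persistent connection requests for " +
		"<offline_node_pubkey>@<offline_node_tor_address>.onion:9735, " +
		"connecting anyway."
	if n, peer, ok := parseWarning(line); ok {
		fmt.Printf("%s has %d pending requests\n", peer, n)
	}
}
```

Feeding each log line through `parseWarning` and recording the count per peer would show whether the number keeps climbing after a restart.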

yyforyongyu added the bug and p2p labels Aug 29, 2022
@guggero
Collaborator

guggero commented Aug 29, 2022

Hmm, I wonder if this would be addressed by #5700?

Crypt-iQ added this to the v0.16.0 milestone Aug 29, 2022
@Crypt-iQ
Collaborator

Are you running any external scripts or node management software?

@hazrulnizam
Author

hazrulnizam commented Aug 29, 2022

Yes, I am also running the following on the node:

  1. ThunderHub 0.13.15
  2. RTL 0.12.2-beta
  3. LndHub
  4. lnd-manageJ

Do any of them actively perform reconnection attempts?

I have stopped running lnd-manageJ for now, after the last crash, but the number of requests still seems to be increasing since the node restart.

I see why you're asking. In the logs there are a lot of:

[DBG] RPCS: [connectpeer] requested connection to <offline_node_pubkey>@<offline_node_tor_address>.onion:9735

so there is something requesting the connection via RPC?

@Crypt-iQ
Collaborator

If you want the issue to stop, you should probably stop running the scripts for now, until we fix the issue.

@hazrulnizam
Author

I will restart the node before I go to sleep tonight and stop all the other scripts. I will then report in the morning whether the RPCS connection requests appear in the logs while I sleep.

@Roasbeef
Member

Which version of Go did you use to compile the binary? Or was it taken from the release artifacts?

@Roasbeef
Member

> so there is something requesting the connection via RPC?

Yes, you'll only see that error if a command forces a new connection (the current behavior) while we already have one active.

We track the connection by target public key. We do this (historically) so that if the user has a new IP/onion for the peer, then we'll use that and wipe all the other ones out after one of them succeeds.
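
The bookkeeping described here can be pictured roughly as follows. This is only an illustrative sketch in Go, not lnd's actual code: `connReq`, `connTracker`, `add`, and `onConnected` are hypothetical names, and real connection attempts are reduced to plain structs.

```go
package main

import "fmt"

// connReq is a hypothetical stand-in for one pending outbound connection
// attempt to a single address.
type connReq struct {
	addr     string
	canceled bool
}

// connTracker groups pending attempts by the target peer's public key, so
// that several addresses (old IP, new onion, ...) can be tried for one peer.
type connTracker struct {
	pending map[string][]*connReq
}

func newConnTracker() *connTracker {
	return &connTracker{pending: make(map[string][]*connReq)}
}

// add registers a new attempt for pubKey at addr, alongside any attempts
// that are already pending for the same key.
func (t *connTracker) add(pubKey, addr string) *connReq {
	req := &connReq{addr: addr}
	t.pending[pubKey] = append(t.pending[pubKey], req)
	return req
}

// onConnected is called when one attempt succeeds. It cancels every other
// pending attempt for the same key, clears the entry, and returns how many
// attempts were canceled.
func (t *connTracker) onConnected(pubKey string, winner *connReq) int {
	canceled := 0
	for _, req := range t.pending[pubKey] {
		if req != winner {
			req.canceled = true
			canceled++
		}
	}
	delete(t.pending, pubKey)
	return canceled
}

func main() {
	t := newConnTracker()
	t.add("peerA", "old.onion:9735")
	winner := t.add("peerA", "new.onion:9735")
	fmt.Println("canceled:", t.onConnected("peerA", winner))
}
```

In this picture nothing is ever cleaned up for a peer that stays offline, since no attempt succeeds, which fits the growth reported above.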

@Roasbeef
Member

I think our current behavior is correct, but we should start to limit the number of active connections we'll create, even in this override mode.
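
One way such a limit could look is sketched below in Go. It is an assumption-laden illustration rather than a proposed patch: the cap `maxPendingPerPeer`, the error `errTooManyRequests`, and the `limiter` type are all hypothetical, and the value 3 is arbitrary.

```go
package main

import (
	"errors"
	"fmt"
)

// maxPendingPerPeer is a hypothetical cap on simultaneous pending
// connection requests per peer public key.
const maxPendingPerPeer = 3

var errTooManyRequests = errors.New("too many pending connection requests for peer")

// limiter counts pending requests per public key.
type limiter struct {
	pending map[string]int
}

func newLimiter() *limiter {
	return &limiter{pending: make(map[string]int)}
}

// tryAdd records one more pending request for pubKey, or returns
// errTooManyRequests when the cap has already been reached.
func (l *limiter) tryAdd(pubKey string) error {
	if l.pending[pubKey] >= maxPendingPerPeer {
		return errTooManyRequests
	}
	l.pending[pubKey]++
	return nil
}

// release marks one pending request for pubKey as finished (connected,
// failed for good, or canceled).
func (l *limiter) release(pubKey string) {
	if l.pending[pubKey] > 0 {
		l.pending[pubKey]--
	}
	if l.pending[pubKey] == 0 {
		delete(l.pending, pubKey)
	}
}

func main() {
	l := newLimiter()
	for i := 0; i < 5; i++ {
		if err := l.tryAdd("peerA"); err != nil {
			fmt.Printf("request %d rejected: %v\n", i+1, err)
		}
	}
}
```

With a cap like this, a caller that repeatedly forces new connections to an offline peer would be refused once the limit is hit instead of consuming another file descriptor each time.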

@hazrulnizam
Author

> Which version of Go did you use to compile the binary? Or was it taken from the release artifacts?

I am using the pre-compiled binary from the release.

> I will restart the node before I go to sleep tonight and stop all the other scripts. I will then report in the morning whether the RPCS connection requests appear in the logs while I sleep.

I can now confirm that the increase in the number of persistent connection requests is caused by ThunderHub. It creates 10 new persistent reconnection requests at the start of every hour.
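
A pattern like this can be spotted by counting the `[connectpeer]` debug lines per hour. The sketch below, in Go, assumes those lines carry the same `YYYY-MM-DD HH:MM:SS.mmm` timestamp prefix as the warning quoted earlier in this thread; `countConnectPeerByHour` is a hypothetical helper, not an lnd tool.

```go
package main

import (
	"fmt"
	"strings"
)

// connectPeerMarker is the text of the RPCS debug line quoted in this thread.
const connectPeerMarker = "RPCS: [connectpeer] requested connection to"

// countConnectPeerByHour buckets connectpeer RPC log lines by the first 13
// characters of the line ("YYYY-MM-DD HH"). The timestamp layout is an
// assumption taken from the SRVR warning excerpt.
func countConnectPeerByHour(lines []string) map[string]int {
	counts := make(map[string]int)
	for _, line := range lines {
		if len(line) < 13 || !strings.Contains(line, connectPeerMarker) {
			continue
		}
		counts[line[:13]]++
	}
	return counts
}

func main() {
	lines := []string{
		"2022-08-28 20:00:00.100 [DBG] RPCS: [connectpeer] requested connection to <pubkey>@<addr>.onion:9735",
		"2022-08-28 20:00:00.200 [DBG] RPCS: [connectpeer] requested connection to <pubkey>@<addr>.onion:9735",
		"2022-08-28 21:00:00.100 [DBG] RPCS: [connectpeer] requested connection to <pubkey>@<addr>.onion:9735",
	}
	for hour, n := range countConnectPeerByHour(lines) {
		fmt.Printf("%s:00  %d connectpeer requests\n", hour, n)
	}
}
```

A steady count in every hourly bucket, with all entries clustered at the top of the hour, would point to a scheduled external caller rather than to lnd's own reconnection logic.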

@hazrulnizam
Author

This is actually a ThunderHub issue, not an lnd issue. Therefore, I will close it.

saubyk removed this from the v0.18.0 milestone Aug 4, 2023

6 participants