Peers are still unable to reconnect and stuck in a deadlock after IP address change #6128

Closed
viaj3ro opened this issue Jan 2, 2022 · 19 comments · Fixed by #7239

Comments

@viaj3ro

viaj3ro commented Jan 2, 2022

Unfortunately #5377 still persists.

I had another IP address change on my node and around 25 of my tor peers are still disconnected after 40 hours. The new IP was announced right after the change and 1ml picked it up around 15-30 minutes later.

I was able to verify that a handful of those peers are running LND versions that already implemented #5538 which should've solved the issue but apparently it didn't.

@ellemouton
Collaborator

Some questions:

  • Do you perhaps have some logs from the peers showing that it is in fact a deadlock?
  • Also would be useful to just confirm that the peers are definitely getting the node_announcement update (just to definitely rule that out). ie, if they do a getnodeinfo on your node id, does the new ip addr show up?
  • is it only Tor peers that you are having this issue with? and if so, is it all your tor peers?

@viaj3ro
Author

viaj3ro commented Jan 2, 2022

No logs, unfortunately. One of the nodes was yalls-tor, but Alex keeps no logs. He said he didn't receive the new IP, either due to gossip not getting through or rate limiting. He had to manually reconnect after more than 40 hours of being stuck.

I had to manually reconnect some clearnet nodes as well and some tor nodes did reconnect rather quickly.

So the issue might be that the new IP just doesn't get to every node. But why is that the case? I restarted my node and was hoping this way the new IP would propagate again but it didn't help so far.

@ellemouton
Collaborator

So the issue might be that the new IP just doesn't get to every node

yeah sounds like this might be the issue. Could you perhaps check with a few more of the peers that are not connecting back to you if they also did not get the IP addr update? then we can be sure that this is more of a reliable-gossip-propagation issue rather than a reconnection issue.

@viaj3ro
Author

viaj3ro commented Jan 3, 2022

The ones I have contact information for are already connected, and I have no way of contacting the others, unfortunately.

But there seem to be very widespread issues with LND's ability to receive gossip. https://amboss.space/ isn't able to show channels that were closed recently, or even a long time ago, since the gossip doesn't get through.

Channels are often marked as disabled for days even though they are perfectly healthy, apparently because LND rate-limits so aggressively that the enabled message can't get through: #6000 (comment)

And now even a new IP announcement doesn't propagate properly.

Why all these issues with gossip?

@Roasbeef
Member

Roasbeef commented Jan 3, 2022

Why all these issues with gossip?

Some implementations throttle gossip pretty aggressively (something like 3 updates per node per 24 hours). We also throttle at a burst level, but only dedicate a few connections at a time to be "active" syncers.
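To make the effect of such a limit concrete, here is a minimal sketch of a per-node sliding-window throttle. It is not taken from any implementation: the class name `PerNodeGossipThrottle` is hypothetical, and the default of 3 updates per 24 hours is just the rough figure quoted above.

```python
import time
from collections import defaultdict, deque


class PerNodeGossipThrottle:
    """Hypothetical limiter: at most `max_updates` per node per `window_secs`."""

    def __init__(self, max_updates=3, window_secs=24 * 60 * 60, clock=time.time):
        self.max_updates = max_updates
        self.window_secs = window_secs
        self.clock = clock  # injectable so the behaviour can be tested
        self._seen = defaultdict(deque)  # node_id -> timestamps of accepted updates

    def allow(self, node_id):
        """Return True if an update from `node_id` is accepted right now."""
        now = self.clock()
        recent = self._seen[node_id]
        # Forget accepted updates that have aged out of the window.
        while recent and now - recent[0] >= self.window_secs:
            recent.popleft()
        if len(recent) >= self.max_updates:
            return False  # budget used up: the update is dropped, not forwarded
        recent.append(now)
        return True
```

If a peer applied a limit along these lines, a node that had already used its budget within the window would have its next announcement (for example, one carrying a new address) dropped by that peer until older updates aged out.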

You may want to try increasing the number of active syncers (--numgraphsyncpeers).

I was able to verify that a handful of those peers are running LND versions that already implemented #5538 which should've solved the issue but apparently it didn't.

This in theory addresses things assuming they're actually getting the new node announcements.

@Roasbeef
Member

Roasbeef commented Jan 3, 2022

I had another IP address change on my node and around 25 of my tor peers are still disconnected after 40 hours.

Did you observe this for only the tor peers?

@Roasbeef
Member

Roasbeef commented Jan 3, 2022

I had another IP address change on my node and around 25 of my tor peers are still disconnected after 40 hours.

Are you running with hosts specified in --externalhosts? That's the only way we'll detect that your IP changed. Otherwise, we don't have a way to detect such changes, particularly if you're behind a NAT, without additional protocol extensions, like: lightning/bolts#917
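As a rough illustration of the idea (re-resolve a configured hostname and notice when the result changes), and not lnd's actual code, a watcher could look like the sketch below. The names `resolve_ips` and `ExternalHostWatcher` are hypothetical, and port 9735 is only the default used elsewhere in this thread.

```python
import socket


def resolve_ips(host, port=9735):
    """Resolve `host` and return its IP addresses as a sorted list."""
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


class ExternalHostWatcher:
    """Remembers the last resolved addresses and reports when they change."""

    def __init__(self, host, resolver=resolve_ips):
        self.host = host
        self.resolver = resolver  # injectable, so no DNS is needed in tests
        self.last_announced = None

    def check(self):
        """Return the new address list if it changed since the last check, else None."""
        current = self.resolver(self.host)
        if current == self.last_announced:
            return None
        self.last_announced = current
        return current
```

A caller would run `check()` on a timer and, whenever it returns a list, sign and broadcast a fresh node announcement. Without a configured hostname there is nothing to re-resolve, which matches the point above that a change behind a NAT goes unnoticed.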

@viaj3ro
Author

viaj3ro commented Jan 9, 2022

Did you observe this for only the tor peers?

There were a handful of clearnet peers that were disconnected as well right after the IP address change, but I manually connected them right away. They might have reconnected within a few hours by themselves. Can't tell in retrospect.

Are you running with hosts specified in --externalhosts? That's the only way we'll detect that your IP changed. Otherwise, we don't have a way to detect such changes, particularly if you're behind a NAT, without additional protocol extensions, like: lightning/bolts#917

Not exactly sure what you mean, but I'm announcing my new IP via eclair.server.public-ips = [31.17.64.33], and 1ml and most of my tor peers are picking up on it. Around 15 tor peers are not, though, and are still disconnected.

@HannahMR added the "P2" label (should be fixed if one has time) on Jan 17, 2022
@viaj3ro
Author

viaj3ro commented Feb 9, 2022

I found a workaround for the issue: changing the alias seems to trigger a gossip message, and around 30 disconnected peers (Tor or otherwise) were finally able to reconnect. Just a handful are still stuck.

This means something is truly not going right with the IP announcement itself, though.

@whitslack

I'll contribute another data point to this story. My node's IP address changed two days ago. Of my 1166 peers with whom I have active channels, only 361 have reconnected, with 805 still remaining unconnected. 1ML shows my new IP address, but Amboss still shows my old address. And, strangely, Amboss says it observed a change in my address yesterday, yet it still shows the old address.

Do node announcements carry timestamps? Is it possible that stale announcements are overwriting fresher announcements?
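For context: BOLT #7 `node_announcement` messages do carry a `timestamp` field, and a receiver is expected to ignore an announcement whose timestamp is not greater than the last one it accepted for that `node_id`. A minimal sketch of that rule follows; the class names are hypothetical and the timestamps in any usage are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeAnnouncement:
    node_id: str
    timestamp: int      # seconds since the Unix epoch
    addresses: tuple    # e.g. ("213.174.156.69:9735",)


class NodeTable:
    """Keeps only the freshest announcement seen for each node."""

    def __init__(self):
        self._latest = {}

    def apply(self, ann):
        """Store `ann` if it is newer than what we have; return True if stored."""
        known = self._latest.get(ann.node_id)
        if known is not None and ann.timestamp <= known.timestamp:
            return False  # stale or duplicate: ignore it
        self._latest[ann.node_id] = ann
        return True

    def addresses(self, node_id):
        ann = self._latest.get(node_id)
        return ann.addresses if ann else ()
```

Under this rule a stale announcement cannot overwrite a fresher one on a conforming node. A node that never receives the fresher announcement, however, keeps serving the old address indefinitely, which would look the same from the outside.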

@viaj3ro
Author

viaj3ro commented Feb 10, 2022

@whitslack I assume you updated your IP in the config and restarted your node already? did you also try the alias change trick (another restart required)?

@whitslack

I assume you updated your IP in the config already?

My node is C-Lightning, and I specified my host's dynamic DNS name in the config. Yes, I did restart the node to pick up the change in address, and lightning-cli getinfo does show the new IP address.

did you also try the alias change trick?

I did not. I don't want to change my node's alias, and besides, the problem is not that my node didn't generate a new node announcement, as some nodes like 1ML have seen the new announcement. There does seem to be a problem in propagating the new announcement to all nodes in the network, however.

@viaj3ro
Author

viaj3ro commented Feb 10, 2022

I did not. I don't want to change my node's alias, and besides, the problem is not that my node didn't generate a new node announcement, as some nodes like 1ML have seen the new announcement. There does seem to be a problem in propagating the new announcement to all nodes in the network, however.

I had exactly the same issue. Node announcement went out, 130 of 200 peers got the new IP within 24 hours as did 1ml. Yalls and amboss got it eventually but lightningnetwork.plus and around 30 TOR peers didn't get it even after 5 days or more.

Kinda didn't want to change my alias either but ended up going from SilentBob to SilentBob! and surprise, surprise, almost all TOR peers were able to reconnect right away. lightningnetwork.plus also finally got the new IP. So I'd say it's worth a shot as long as LND hasn't fixed the underlying issue.

It's also an option to change the alias back again right away or after a short while...

@whitslack

@viaj3ro: I'll change the color instead. Should have the same effect, I'd imagine.

@viaj3ro
Author

viaj3ro commented Feb 10, 2022

@whitslack did it work?

@whitslack

whitslack commented Feb 10, 2022

did it work?

@viaj3ro: Mixed results. I'm now up to 432 connected channel peers, with 729 disconnected channel peers still to go. Amboss now shows my new IP address, citing today's date as the date of the change, yet that change actually still shows my old color, so it would actually represent the change from two days ago. Amboss still doesn't see my new color. 1ML doesn't see my new color yet either.

@viaj3ro
Author

viaj3ro commented Feb 11, 2022

@whitslack have a look at https://lightningnetwork.plus/
If they don't have your new IP, I'd say the color-changing scheme didn't work. But it's also possible that you have to wait 24h, since some nodes have very restrictive anti-spam measures.

If you have Telegram, we can take this conversation there and not clutter this thread even more: https://t.me/viaj3ro

@whitslack

FWIW, I opened a similar issue in the C-Lightning issue tracker. However, this may turn out to be caused by a fundamental design problem that affects all Lightning implementations.

@LNBIG-COM

LNBIG-COM commented Nov 25, 2022

I have a similar problem (LND v0.15.4-beta). I decided to unify the lnd.conf of my LND instances by moving the external IP address out of lnd.conf and into the /etc/hosts file, using the externalhosts option instead of the externalip option. Now I am observing the following problem:

/etc/hosts file:

XX.XX.XX.XX ip-external

lnd.conf (there is no externalip option):

externalhosts=ip-external:9735
listen=ip-external:9735

The problem is that when I transfer the LND instance to another server and run it with these settings, the server instance itself sees its own IP (l getinfo shows the uris field as nodeid@ip:port), but it does not announce it to other nodes, and even my other nodes (v0.15.4-beta) have the old IP address in their graph even after a few days.

I've come to the conclusion that the externalhosts option does not work at all as intended. It's not that my IP address changes periodically (it's static), but LND still doesn't announce its IP address via gossip to the Lightning Network. Otherwise I can't explain it in any way.

Because of this, of course, I have a lot of non-connected peers, because they are trying to connect using the old IP.

P.S. But I want to add that some nodes that I moved to another server and on which I also changed the externalip option to externalhosts - the network began to see a new IP address pretty quickly. That is, I have a feeling that sometimes it works, and sometimes it absolutely does not (for example, I tried restarting LND, but it did not help).

The last server that still has this problem - https://amboss.space/node/03d37fca0656558de4fd86bbe490a38d84a46228e7ec1361801f54f9437a18d618

lnd-22 sees this as:

l getnodeinfo 03d37fca0656558de4fd86bbe490a38d84a46228e7ec1361801f54f9437a18d618
{
    "node": {
        "last_update": 1668967977,
        "pub_key": "03d37fca0656558de4fd86bbe490a38d84a46228e7ec1361801f54f9437a18d618",
        "alias": "LNBIG.com [lnd-02]",
        "addresses": [
            {
                "network": "tcp",
                "addr": "46.229.165.138:9735"
            }
        ],
        "color": "#3399ff",

but the lnd-02 (old) now is lnd-25 and has (I moved it and started it more than 24 hours ago, even restarted it, in the hope that it would announce its new IP address):

l getinfo
{
    "version": "0.15.4-beta commit=v0.15.4-beta",
    "commit_hash": "96fe51e2e5c2ee0c97909499e0e96a3d3755757e",
    "identity_pubkey": "03d37fca0656558de4fd86bbe490a38d84a46228e7ec1361801f54f9437a18d618",
    "alias": "LNBIG.com [lnd-25/old-lnd-02]",
    "color": "#3399ff",
    "num_pending_channels": 0,
    "num_active_channels": 79,
    "num_inactive_channels": 205,
    "num_peers": 74,
    "block_height": 764700,
    "block_hash": "000000000000000000042e6c9c936cb0c9459a7bc925c9f17927edf742f57a9f",
    "best_header_timestamp": "1669405467",
    "synced_to_chain": true,
    "synced_to_graph": true,
    "testnet": false,
    "chains": [
        {
            "chain": "bitcoin",
            "network": "mainnet"
        }
    ],
    "uris": [
        "03d37fca0656558de4fd86bbe490a38d84a46228e7ec1361801f54f9437a18d618@213.174.156.69:9735"

And as you can see, I changed the node's alias too and restarted it several times...
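The mismatch between the two outputs above can also be checked mechanically. The sketch below assumes the JSON from `getinfo` (on the moved node) and `getnodeinfo` (on an observing node) has been saved, and that the `uris` and `node.addresses` fields look like the excerpts in this comment. The helper names are hypothetical, and the example inputs are trimmed to just those fields.

```python
import json


def advertised_addrs(getinfo):
    """host:port values the node advertises, taken from `uris` (pubkey@host:port)."""
    return {uri.split("@", 1)[1] for uri in getinfo.get("uris", [])}


def known_addrs(nodeinfo):
    """host:port values an observer has in its graph for the node."""
    return {a["addr"] for a in nodeinfo["node"].get("addresses", [])}


def stale_view(getinfo, nodeinfo):
    """Return (missing, outdated): addresses the observer lacks / still holds."""
    advertised = advertised_addrs(getinfo)
    known = known_addrs(nodeinfo)
    return advertised - known, known - advertised


# Inputs trimmed to the relevant fields of the outputs quoted in this comment.
PUBKEY = "03d37fca0656558de4fd86bbe490a38d84a46228e7ec1361801f54f9437a18d618"
getinfo = json.loads('{"uris": ["%s@213.174.156.69:9735"]}' % PUBKEY)
nodeinfo = json.loads(
    '{"node": {"addresses": [{"network": "tcp", "addr": "46.229.165.138:9735"}]}}'
)

missing, outdated = stale_view(getinfo, nodeinfo)
print("observer is missing:", sorted(missing))
print("observer still holds:", sorted(outdated))
```

Running the same comparison against several observing nodes would show which of them never received the newer announcement.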

UPDATE (2022-12-02): For the past day, the config on lnd-25 (old lnd-02) has been:

externalip=213.174.156.69:9735
listen=ip-external:9735
alias=LNBiG.com🇺🇦[lnd-25/lnd-02]

The server was stopped and started again many times, and I manually submitted the new IP and aliases with the lncli peers updatenodeannouncement command (removing the new values and then setting them again). The output indicated that updates should happen, but there have been no changes for almost a day! Some of my nodes see the old IP and the old alias; some see the new IP but the old alias (i.e. they didn't get my latest update). I used a build of your latest version, 0.15.5, but all to no avail!

The lnd-25 / old lnd-02 now is:

l getinfo
{
    "version": "0.15.5-beta.rc2 commit=v0.15.5-beta.rc2",
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    "commit_hash": "771bc992bc7bcc1983050f2ee63a19183834f67b",
    "identity_pubkey": "03d37fca0656558de4fd86bbe490a38d84a46228e7ec1361801f54f9437a18d618",
    "alias": "LNBiG.com🇺🇦[lnd-25/lnd-02]",
^^^^^^^^^^^^^^^^^^^^^^^
    "color": "#3399ff",
    "num_pending_channels": 0,
    "num_active_channels": 87,
                       ^^^^^^^^^^^^^^
    "num_inactive_channels": 196,
                      ^^^^^^^^^^^^^^^^^
    "num_peers": 87,
    "block_height": 765587,
    "block_hash": "00000000000000000004104062a232ee3e7cb593b18ab4f8d77688c08752624c",
    "best_header_timestamp": "1669974128",
    "synced_to_chain": true,
             ^^^^^^^^
    "synced_to_graph": true,
         ^^^^^
    "testnet": false,
    "chains": [
        {
            "chain": "bitcoin",
            "network": "mainnet"
        }
    ],
    "uris": [
        "03d37fca0656558de4fd86bbe490a38d84a46228e7ec1361801f54f9437a18d618@213.174.156.69:9735"
                                                  ^^^^^^^^^^^^^^^^^^^^^^
    ],
