Peers are still unable to reconnect and stuck in a deadlock after IP address change #6128
Some questions:
|
No logs, unfortunately. One of the nodes was yalls-tor, but Alex keeps no logs. He said he didn't receive the new IP, either because the gossip didn't get through or because of rate limiting. He had to manually reconnect after being stuck for more than 40 hours. I had to manually reconnect some clearnet nodes as well, while some tor nodes reconnected rather quickly. So the issue might be that the new IP just doesn't reach every node. But why is that the case? I restarted my node hoping the new IP would propagate again, but it hasn't helped so far. |
yeah sounds like this might be the issue. Could you perhaps check with a few more of the peers that are not connecting back to you if they also did not get the IP addr update? then we can be sure that this is more of a reliable-gossip-propagation issue rather than a reconnection issue. |
the ones I have contact information for are already connected, and I have no way of contacting the others, unfortunately. But there seem to be very widespread issues with LND's ability to receive gossip. https://amboss.space/ isn't able to show channels that were closed recently or even a long time ago, since the gossip doesn't get through. Channels are often marked as disabled for days even though they are perfectly healthy, apparently because LND rate limits so aggressively that the enabled message can't get through: #6000 (comment). And now even a new IP announcement doesn't propagate properly. Why all these issues with gossip? |
Some implementations throttle gossip pretty aggressively (something like 3 updates per node per 24 hrs). We also throttle on a burst level, and only dedicate a few connections at a time to be "active" syncers. You may want to try increasing the number of active syncers (
This in theory addresses things assuming they're actually getting the new node announcements. |
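To make the throttling point concrete, here is a minimal sketch of a per-node limiter. It assumes a budget of 3 updates per node per 24 hours, which is only the rough figure quoted above and not a confirmed value from any implementation. `GossipThrottle` and its parameters are hypothetical names for illustration, not LND code.

```python
from collections import defaultdict, deque


class GossipThrottle:
    """Sliding-window limiter: at most `max_updates` per node per `window` seconds.

    Hypothetical illustration only; the defaults are assumptions.
    """

    def __init__(self, max_updates=3, window=24 * 60 * 60):
        self.max_updates = max_updates
        self.window = window
        self._accepted = defaultdict(deque)  # node_id -> times of accepted updates

    def allow(self, node_id, now):
        times = self._accepted[node_id]
        # Forget accepted updates that have fallen out of the window.
        while times and now - times[0] >= self.window:
            times.popleft()
        if len(times) >= self.max_updates:
            # Dropped: the receiver keeps whatever state it accepted earlier.
            return False
        times.append(now)
        return True
```

With a limiter like this, a node that has already used up its budget (for example through several restarts or channel flaps) would have its next announcement dropped by that receiver, which would then keep the old address until the window moves on.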
Did you observe this for only the tor peers? |
Are you running with hosts specified in |
there were a handful of clearnet peers that were disconnected as well right after the IP address change, but I manually connected them right away. They might have reconnected within a few hours by themselves. Can't tell in retrospect.
not exactly sure what you mean but I'm announcing my new IP via |
I found a workaround for the issue: changing the alias seems to trigger a gossip message, and around 30 disconnected peers (TOR or otherwise) were finally able to reconnect. Just a handful are still stuck. This means something is truly not going right with the IP announcement itself, though. |
I'll contribute another data point to this story. My node's IP address changed two days ago. Of my 1166 peers with whom I have active channels, only 361 have reconnected, with 805 still remaining unconnected. 1ML shows my new IP address, but Amboss still shows my old address. And, strangely, Amboss says it observed a change in my address yesterday, yet it still shows the old address. Do node announcements carry timestamps? Is it possible that stale announcements are overwriting fresher announcements? |
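On the timestamp question: as far as I understand BOLT 7, `node_announcement` does carry a `timestamp` field, and a receiver should ignore an announcement whose timestamp is not greater than the last one it accepted for that node. A minimal sketch of that rule follows; `NodeTable` and its methods are hypothetical names for illustration, not taken from any implementation.

```python
class NodeTable:
    """Keeps the newest known announcement per node (illustrative only)."""

    def __init__(self):
        self._latest = {}  # node_id -> (timestamp, addresses)

    def apply(self, node_id, timestamp, addresses):
        """Return True if the announcement was accepted, False if ignored."""
        current = self._latest.get(node_id)
        # Ignore anything that is not strictly newer than what we already have.
        if current is not None and timestamp <= current[0]:
            return False
        self._latest[node_id] = (timestamp, list(addresses))
        return True

    def addresses(self, node_id):
        entry = self._latest.get(node_id)
        return entry[1] if entry else []


table = NodeTable()
table.apply("node-a", 100, ["203.0.113.5:9735"])   # accepted
table.apply("node-a", 90, ["198.51.100.7:9735"])   # stale, ignored
```

Under this rule a stale announcement should not overwrite a fresher one on a conforming receiver. A receiver that never sees the fresher announcement, or drops it, would simply keep the old address.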
@whitslack I assume you updated your IP in the config and restarted your node already? did you also try the alias change trick (another restart required)? |
My node is C-Lightning, and I specified my host's dynamic DNS name in the config. Yes, I did restart the node to pick up the change in address, and
I did not. I don't want to change my node's alias, and besides, the problem is not that my node didn't generate a new node announcement, as some nodes like 1ML have seen the new announcement. There does seem to be a problem in propagating the new announcement to all nodes in the network, however. |
I had exactly the same issue. Node announcement went out, 130 of 200 peers got the new IP within 24 hours as did 1ml. Yalls and amboss got it eventually but lightningnetwork.plus and around 30 TOR peers didn't get it even after 5 days or more. Kinda didn't want to change my alias either but ended up going from SilentBob to SilentBob! and surprise, surprise, almost all TOR peers were able to reconnect right away. lightningnetwork.plus also finally got the new IP. So I'd say it's worth a shot as long as LND hasn't fixed the underlying issue. It's also an option to change the alias back again right away or after a short while... |
@viaj3ro: I'll change the color instead. Should have the same effect, I'd imagine. |
@whitslack did it work? |
@viaj3ro: Mixed results. I'm now up to 432 connected channel peers, with 729 disconnected channel peers still to go. Amboss now shows my new IP address, citing today's date as the date of the change, yet that change actually still shows my old color, so it would actually represent the change from two days ago. Amboss still doesn't see my new color. 1ML doesn't see my new color yet either. |
@whitslack have a look at https://lightningnetwork.plus/ If you have telegram, we can take this conversation there and not clutter this thread even more: https://t.me/viaj3ro |
FWIW, I opened a similar issue in the C-Lightning issue tracker. However, this may turn out to be caused by a fundamental design problem that affects all Lightning implementations. |
I have a similar problem (LND /etc/hosts file:
lnd.conf (there is no
The problem is that when I transfer the LND instance to another server and run it with these settings, the server instance itself sees its own IP ( I come to the conclusion that the Because of this, of course, I have a lot of non-connected peers, because they are trying to connect using the old IP. P.S. But I want to add that some nodes that I moved to another server and on which I also changed the The last server that still has this problem - https://amboss.space/node/03d37fca0656558de4fd86bbe490a38d84a46228e7ec1361801f54f9437a18d618 lnd-22 sees this as:
but the lnd-02 (old) now is lnd-25 and has (I moved it and started it more than 24 hours ago, even restarted it, in the hope that it would announce its new IP address):
And as you can see, I changed the node's alias too and restarted it several times... UPDATE (from 2022-12-02): On lnd-25 (old lnd-02) the config has been as follows for one day now:
The server was stopped many times and started again, the The lnd-25 / old lnd-02 now is:
|
Unfortunately #5377 still persists.
I had another IP address change on my node and around 25 of my tor peers are still disconnected after 40 hours. The new IP was announced right after the change and 1ml picked it up around 15-30 minutes later.
I was able to verify that a handful of those peers are running LND versions that already implemented #5538 which should've solved the issue but apparently it didn't.