
Gossip propagation issues and gossip map uncompleted #7995

Open
vincenzopalazzo opened this issue Jan 17, 2025 · 6 comments

Comments

@vincenzopalazzo
Contributor

I am not sure why I am the first to report this, but on the Ocean node we are running an old but stable release of Core Lightning, v24.05, and gossip looks pretty unreliable (much worse than the well-known 'unreliable' behaviour of gossip in Lightning).

As you can see from the mempool site, our node's last update was 5 months ago, but this is not possible because we are paying miners once a day.

I am pretty sure that this issue also occurs with the most recent version because most users of Ocean are using the Start9/Umbrel package, so they are running either the latest version or just one version older.

Problem Statement

Ocean uses BOLT12 (without a blinded path) for every miner to dispatch the payouts. Some users have the following setup:

Ocean Node -> Exit Node (no LSP) -> ... <-> Random Node <-> Miner Node

In this case, the miner has a single channel and relies on it for liquidity. At this point, some miners (who have been in the network for a while and have been mining from Ocean and running a Lightning node since last year) are not able to receive payments anymore because our node cannot connect directly to the miner node (our gossip map does not contain the miner’s address because CLN drops it at some point).
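The failing check can be sketched in a few lines (this is not Ocean's actual tooling, just an illustration): given the JSON that `lightning-cli listnodes <id>` returns, decide whether the local gossip map still carries an address for the node. A node seen only via channel gossip, whose node_announcement has been dropped, has no `addresses` field.

```python
def has_known_address(listnodes_result: dict) -> bool:
    """Return True if the gossip map holds at least one address for the node."""
    for node in listnodes_result.get("nodes", []):
        # Without a retained node_announcement, "addresses" is absent or empty.
        if node.get("addresses"):
            return True
    return False

# Node known only from channel_announcements (node_announcement dropped):
dropped = {"nodes": [{"nodeid": "02" + "ab" * 32}]}
# Node whose node_announcement (and thus address) is still in the map:
kept = {"nodes": [{"nodeid": "02" + "ab" * 32,
                   "addresses": [{"type": "ipv4",
                                  "address": "192.0.2.1", "port": 9735}]}]}

assert not has_known_address(dropped)
assert has_known_address(kept)
```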

To solve this problem, we tell the miner to connect with our node, but this is a hack, and we think that this issue with gossip can also degrade the pay performance with some nodes that are not ‘well known’.

This issue has also been noted by other implementations, e.g. LDK: "Gossip never seems to get a full take from Eclair/CLN"

P.S.: I could not try a recent version of CLN because issue #7972 worries me about upgrading.

@grubles
Contributor

grubles commented Jan 17, 2025

I'm running master and Mempool is reporting the node's last update was 2 months ago (Nov. 2024).

@xNDTyf

xNDTyf commented Jan 18, 2025

@rustyrussell
Contributor

OK, so this "5 months" is because the node announcement hasn't changed in that long. That is consistent with what my node sees for your node:

$ lightning-cli listnodes 029ef2ce43571727104099576c633b2233bfeb8dc18b476f93540a32207da9e9a4
{
   "nodes": [
      {
         "nodeid": "029ef2ce43571727104099576c633b2233bfeb8dc18b476f93540a32207da9e9a4",
         "alias": "🌊 OCEAN MINING, SA de CV 🌊",
         "color": "02bf81",
         "last_timestamp": 1723818353,
         "features": "88a0800a8a59a1",
         "addresses": [
            {
               "type": "ipv4",
               "address": "16.63.81.71",
               "port": 9735
            },
            {
               "type": "torv3",
               "address": "voibgcjsapdylerigku4gdpmu6sdb5x32b4p3bddtzr52endivdacoad.onion",
               "port": 9735
            }
         ]
      }
   ]
}
$ date -d @1723818353
Fri Aug 16 14:25:53 UTC 2024

And that information seems to work:

$ lightning-cli connect 029ef2ce43571727104099576c633b2233bfeb8dc18b476f93540a32207da9e9a4
{
   "id": "029ef2ce43571727104099576c633b2233bfeb8dc18b476f93540a32207da9e9a4",
   "features": "08a0800a8a59a1",
   "direction": "out",
   "address": {
      "type": "ipv4",
      "address": "16.63.81.71",
      "port": 9735
   }
}

My own node showed a 2 month old update on mempool.space. I changed the rgb value a little, to test, and am waiting to see how long it takes to propagate.

@rustyrussell
Contributor

We have gotten increasingly aggressive about propagating gossip in each release. At this point, the only way to improve it is to literally nominate large nodes to connect to, exchange gossip with, and disconnect from. That's not a good direction :(

Gossip v2 sync actually fixes this, but it needs implementation and rollout of v2, then speccing of the sync. So it's not a quick fix.

@whitslack
Collaborator

whitslack commented Jan 20, 2025

For as long as I've been using Core Lightning (almost 6 years), I've had an (unsubstantiated) feeling that the gossip mechanism has never worked quite correctly. I don't know whether that's due to flaws inherent in the specification or compromises in CLN's implementation specifically — or maybe everything is working perfectly and the Lightning Network just sucks.

For a long time (maybe still?), CLN severely rate-limited self-generated announcements, which hobbled the ability to implement dynamic channel fees. I really never understood the extreme aversion to propagating gossip in a timely manner. Other unstructured peer-to-peer networks of the past had no problem with propagating thousands of gossip messages per second, so rate-limiting channel announcements to one or two per day seems plain ridiculous. If we had a billion channels in the network, then sure, but we don't have even a million.
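For context, the kind of rate limit being criticized here can be sketched as a simple token bucket. The numbers below are illustrative only, not CLN's actual parameters; the point is that a bucket refilled at two tokens per day swallows any fee update made within hours of the previous one.

```python
import time

class TokenBucket:
    """Illustrative rate limiter: `rate` tokens per second, capacity `burst`."""
    def __init__(self, rate: float, burst: float):
        self.rate, self.burst = rate, burst
        self.tokens = burst
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# Two updates per day is roughly one token every 12 hours:
per_day = TokenBucket(rate=2 / 86400, burst=1)
assert per_day.allow()      # the first update passes
assert not per_day.allow()  # an immediate second update is dropped
```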

Also for a long time, my gossip_store would be renamed to gossip_store.corrupt every time I restarted my node, suggesting that the gossip store was routinely being mishandled on disk. I never understood why CLN "rolled its own" on-disk data structure for the gossip store rather than using some existing, battle-tested, indexed object store library. The policy of "it grows forever until you restart your node to vacuum it" also seems like a symptom of not using an existing production-ready library. I haven't seen the gossip_store.corrupt issue in a while, so I think maybe that particular issue has been fixed, but it still left me lacking confidence in the implementation.

Still to this day, I sometimes experience very suspect behavior related to gossip. I use the Sling plugin to rebalance channels, and sometimes the sling-stats command will report that every rebalance job can find NoRoutes, which is simply wrong. Restarting the plugin doesn't fix it, but restarting my node does. To be fair, I don't believe I have seen this happen on 24.11.x yet, so perhaps the problem has been fixed. Still, what other gossip bugs are yet lurking? I do not believe they have all been found, and I do believe they are still causing my node to get less payment traffic than it would if its announcements were being propagated reliably.

It would be nice if there were a command that would query a specified peer to make sure it knows about each and every one of my node's own announcements. Then I could develop some automation that would gradually crawl the entire network (well, the nodes that can accept incoming connections anyway) to determine how thoroughly my announcements have propagated. This at least would allow getting a handle on the problem, if there even is a problem.
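The survey proposed above is straightforward bookkeeping once such a query exists. No RPC today asks a remote peer what it knows, so `query_peer` below is a purely hypothetical stand-in (stubbed with fake data); the rest just classifies peers by the node_announcement timestamp they hold for us.

```python
MY_TIMESTAMP = 1723818353  # our latest node_announcement timestamp

def query_peer(peer_id):
    """Hypothetical: return the node_announcement timestamp this peer has
    for our node, or None if it has none. Stubbed with fake data here."""
    fake_network = {"peerA": 1723818353, "peerB": 1700000000, "peerC": None}
    return fake_network.get(peer_id)

def survey(peers):
    """Classify each peer as current, stale, or missing our announcement."""
    report = {"current": [], "stale": [], "missing": []}
    for p in peers:
        ts = query_peer(p)
        if ts is None:
            report["missing"].append(p)
        elif ts < MY_TIMESTAMP:
            report["stale"].append(p)
        else:
            report["current"].append(p)
    return report

result = survey(["peerA", "peerB", "peerC"])
assert result == {"current": ["peerA"], "stale": ["peerB"], "missing": ["peerC"]}
```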

It would also be nice if CLN would provide some indication of when it is sitting on updated announcements, waiting to push them out. Maybe in the listpeerchannels output, add a field for pending_announcements that contains a count of how many self-generated announcements (node and channel) we are holding onto locally but know that the peer does not yet have (because we haven't yet told them). Frankly, I believe this count ought always to be zero (i.e., never sit on any updates), but at least being able to see it would conclusively confirm or deny whether announcement propagation is a potential problem.

Addendum: I forgot to mention, I have also had an experience where I had opened a new channel with a new peer, yet the peer was not returned by listnodes <peer-id>. Maybe this was not CLN's fault, but it was not as though the peer was some virgin new node on the network. Actually, this happens fairly often. [Edit: I'm mixing up anecdotes. What follows is a separate issue.]

Sometimes I start seeing forwards on a new channel, and yet listchannels does not return the channel (or returns only one half-channel). It's very conspicuous to me when this happens because I monitor all successful forwards with a script that prints out the SCIDs and node aliases of the incoming and outgoing channel of each successful forwarded payment as it happens, and sometimes the alias will be missing because my node fails to return the new channel in listchannels. It will work sometime later, but shouldn't it have already made the channel announcement available by the time the channel went live?

Image
Screenshot showing successful forwards on a new channel (highlighted in red) whose node alias could not be resolved because the channel announcement was not being returned by listchannels <scid> at the time the payments occurred.

@RCasatta
Contributor

I am using CLN 24.11, changing my channel fee rates at most once per day and by at most X%.

I noticed the channel fee rate values on mempool are very different from what my node has, but LightningNetwork+ seems to show the values correctly.
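For reference, the forwarding fee an explorer should display is derived entirely from the two gossiped channel_update fields, so a stale update directly misstates the fee. Per BOLT 7, the fee is the base fee plus a proportional part in parts per million:

```python
def channel_fee_msat(amount_msat: int, base_msat: int, ppm: int) -> int:
    """Forwarding fee per BOLT 7: fee_base_msat plus the proportional
    part, fee_proportional_millionths of the forwarded amount."""
    return base_msat + amount_msat * ppm // 1_000_000

# e.g. forwarding 1_000_000_000 msat (0.01 BTC) through a channel
# gossiping base=1000 msat and 200 ppm:
assert channel_fee_msat(1_000_000_000, 1000, 200) == 201_000
```

A stale channel_update with old base/ppm values plugged into this formula yields a fee that can differ wildly from what the node actually charges, which would explain the mismatch between explorers.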
