
Gossip propagation issues and gossip map uncompleted #7995

Open
vincenzopalazzo opened this issue Jan 17, 2025 · 6 comments

Comments

@vincenzopalazzo
Contributor

I am not sure why I am the first to report this, but on the Ocean node we are running an old but stable release of Core Lightning, v24.05, and gossip looks pretty unreliable (much worse than the well-known 'unreliable' behaviour of gossip in Lightning).

As you can see from the mempool site, our node's last update was 5 months ago, but this is not possible because we are paying miners once a day.

I am pretty sure that this issue also occurs with the most recent version because most users of Ocean are using the Start9/Umbrel package, so they are running either the latest version or just one version older.

Problem Statement

Ocean uses BOLT12 (without a blinded path) for every miner to dispatch the payouts. Some users have the following setup:

Ocean Node -> Exit Node (no LSP) -> ... <-> Random Node <-> Miner Node

In this case, the miner has a single channel and relies on it for liquidity. At this point, some miners (who have been in the network for a while and have been mining from Ocean and running a Lightning node since last year) are not able to receive payments anymore because our node cannot connect directly to the miner node (our gossip map does not contain the miner’s address because CLN drops it at some point).
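The failing check can be sketched in a few lines (this is not Ocean's actual tooling, just an illustration): given the JSON that `lightning-cli listnodes <id>` returns, decide whether the local gossip map still carries an address for the node. A node seen only via channel gossip, whose node_announcement has been dropped, has no `addresses` field.

```python
def has_known_address(listnodes_result: dict) -> bool:
    """Return True if the gossip map holds at least one address for the node."""
    for node in listnodes_result.get("nodes", []):
        # Without a retained node_announcement, "addresses" is absent or empty.
        if node.get("addresses"):
            return True
    return False

# Node known only from channel_announcements (node_announcement dropped):
dropped = {"nodes": [{"nodeid": "02" + "ab" * 32}]}
# Node whose node_announcement (and thus address) is still in the map:
kept = {"nodes": [{"nodeid": "02" + "ab" * 32,
                   "addresses": [{"type": "ipv4",
                                  "address": "192.0.2.1", "port": 9735}]}]}

assert not has_known_address(dropped)
assert has_known_address(kept)
```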

To solve this problem, we tell the miner to connect with our node, but this is a hack, and we think that this issue with gossip can also degrade the pay performance with some nodes that are not ‘well known’.

This issue has also been noted by other implementations, e.g. LDK: "Gossip never seems to get a full take from Eclair/CLN"

P.S.: I could not try a recent version of CLN because issue #7972 worries me about upgrading.

@grubles
Contributor

grubles commented Jan 17, 2025

I'm running master and Mempool is reporting the node's last update was 2 months ago (Nov. 2024).

@xNDTyf

xNDTyf commented Jan 18, 2025

@rustyrussell
Contributor

OK, so this "5 months" is because the node announcement hasn't changed in that long. That is consistent with what my node sees for your node:

$ lightning-cli listnodes 029ef2ce43571727104099576c633b2233bfeb8dc18b476f93540a32207da9e9a4
{
   "nodes": [
      {
         "nodeid": "029ef2ce43571727104099576c633b2233bfeb8dc18b476f93540a32207da9e9a4",
         "alias": "🌊 OCEAN MINING, SA de CV 🌊",
         "color": "02bf81",
         "last_timestamp": 1723818353,
         "features": "88a0800a8a59a1",
         "addresses": [
            {
               "type": "ipv4",
               "address": "16.63.81.71",
               "port": 9735
            },
            {
               "type": "torv3",
               "address": "voibgcjsapdylerigku4gdpmu6sdb5x32b4p3bddtzr52endivdacoad.onion",
               "port": 9735
            }
         ]
      }
   ]
}
$ date -d @1723818353
Fri Aug 16 14:25:53 UTC 2024

And that information seems to work:

$ lightning-cli connect 029ef2ce43571727104099576c633b2233bfeb8dc18b476f93540a32207da9e9a4
{
   "id": "029ef2ce43571727104099576c633b2233bfeb8dc18b476f93540a32207da9e9a4",
   "features": "08a0800a8a59a1",
   "direction": "out",
   "address": {
      "type": "ipv4",
      "address": "16.63.81.71",
      "port": 9735
   }
}

My own node showed a 2 month old update on mempool.space. I changed the rgb value a little, to test, and am waiting to see how long it takes to propagate.

@rustyrussell
Contributor

We have gotten increasingly aggressive about propagating gossip in each release. At this point, the only way to improve it is to literally nominate large nodes to connect to, exchange gossip with, and disconnect from. That's not a good direction :(

Gossip v2 sync actually fixes this, but it needs implementation and rollout of v2, then speccing of the sync. So it's not a quick fix.

@whitslack
Collaborator

whitslack commented Jan 20, 2025

For as long as I've been using Core Lightning (almost 6 years), I've had an (unsubstantiated) feeling that the gossip mechanism has never worked quite correctly. I don't know whether that's due to flaws inherent in the specification or compromises in CLN's implementation specifically — or maybe everything is working perfectly and the Lightning Network just sucks.

For a long time (maybe still?), CLN severely rate-limited self-generated announcements, which hobbled the ability to implement dynamic channel fees. I really never understood the extreme aversion to propagating gossip in a timely manner. Other unstructured peer-to-peer networks of the past had no problem with propagating thousands of gossip messages per second, so rate-limiting channel announcements to one or two per day seems plain ridiculous. If we had a billion channels in the network, then sure, but we don't have even a million.
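For context, the kind of rate limit being criticized here can be sketched as a simple token bucket. The numbers below are illustrative only, not CLN's actual parameters; the point is that a bucket refilled at two tokens per day swallows any fee update made within hours of the previous one.

```python
import time

class TokenBucket:
    """Illustrative rate limiter: `rate` tokens per second, capacity `burst`."""
    def __init__(self, rate: float, burst: float):
        self.rate, self.burst = rate, burst
        self.tokens = burst
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# Two updates per day is roughly one token every 12 hours:
per_day = TokenBucket(rate=2 / 86400, burst=1)
assert per_day.allow()      # the first update passes
assert not per_day.allow()  # an immediate second update is dropped
```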

Also for a long time, my gossip_store would be renamed to gossip_store.corrupt every time I restarted my node, suggesting that the gossip store was routinely being mishandled on disk. I never understood why CLN "rolled its own" on-disk data structure for the gossip store rather than using some existing, battle-tested, indexed object store library. The policy of "it grows forever until you restart your node to vacuum it" also seems like a symptom of not using an existing production-ready library. I haven't seen the gossip_store.corrupt issue in a while, so I think maybe that particular issue has been fixed, but it still left me lacking confidence in the implementation.

Still to this day, I sometimes experience very suspect behavior related to gossip. I use the Sling plugin to rebalance channels, and sometimes the sling-stats command will report that every rebalance job can find NoRoutes, which is simply wrong. Restarting the plugin doesn't fix it, but restarting my node does. To be fair, I don't believe I have seen this happen on 24.11.x yet, so perhaps the problem has been fixed. Still, what other gossip bugs are yet lurking? I do not believe they have all been found, and I do believe they are still causing my node to get less payment traffic than it would if its announcements were being propagated reliably.

It would be nice if there were a command that would query a specified peer to make sure it knows about each and every one of my node's own announcements. Then I could develop some automation that would gradually crawl the entire network (well, the nodes that can accept incoming connections anyway) to determine how thoroughly my announcements have propagated. This at least would allow getting a handle on the problem, if there even is a problem.
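The survey proposed above is straightforward bookkeeping once such a query exists. No RPC today asks a remote peer what it knows, so `query_peer` below is a purely hypothetical stand-in (stubbed with fake data); the rest just classifies peers by the node_announcement timestamp they hold for us.

```python
MY_TIMESTAMP = 1723818353  # our latest node_announcement timestamp

def query_peer(peer_id):
    """Hypothetical: return the node_announcement timestamp this peer has
    for our node, or None if it has none. Stubbed with fake data here."""
    fake_network = {"peerA": 1723818353, "peerB": 1700000000, "peerC": None}
    return fake_network.get(peer_id)

def survey(peers):
    """Classify each peer as current, stale, or missing our announcement."""
    report = {"current": [], "stale": [], "missing": []}
    for p in peers:
        ts = query_peer(p)
        if ts is None:
            report["missing"].append(p)
        elif ts < MY_TIMESTAMP:
            report["stale"].append(p)
        else:
            report["current"].append(p)
    return report

result = survey(["peerA", "peerB", "peerC"])
assert result == {"current": ["peerA"], "stale": ["peerB"], "missing": ["peerC"]}
```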

It would also be nice if CLN would provide some indication of when it is sitting on updated announcements, waiting to push them out. Maybe in the listpeerchannels output, add a field for pending_announcements that contains a count of how many self-generated announcements (node and channel) we are holding onto locally but know that the peer does not yet have (because we haven't yet told them). Frankly, I believe this count ought always to be zero (i.e., never sit on any updates), but at least being able to see it would conclusively confirm or deny whether announcement propagation is a potential problem.

Addendum: I forgot to mention, I have also had an experience where I had opened a new channel with a new peer, yet the peer was not returned by listnodes <peer-id>. Maybe this was not CLN's fault, but it was not as though the peer was some virgin new node on the network. Actually, this happens fairly often. [Edit: I'm mixing up anecdotes. What follows is a separate issue.]

Sometimes I start seeing forwards on a new channel, and yet listchannels does not return the channel (or returns only one half-channel). It's very conspicuous to me when this happens because I monitor all successful forwards with a script that prints out the SCIDs and node aliases of the incoming and outgoing channel of each successful forwarded payment as it happens, and sometimes the alias will be missing because my node fails to return the new channel in listchannels. It will work sometime later, but shouldn't it have already made the channel announcement available by the time the channel went live?

Image
Screenshot showing successful forwards on a new channel (highlighted in red) whose node alias could not be resolved because the channel announcement was not being returned by listchannels <scid> at the time the payments occurred.

@RCasatta
Contributor

I am using CLN 24.11, changing my channel fee rates at most once per day and by at most X%.

I noticed the channel fee rate values on mempool are very different from what my node has, but LightningNetwork+ seems to show the values correctly.
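For reference, the forwarding fee an explorer should display is derived entirely from the two gossiped channel_update fields, so a stale update directly misstates the fee. Per BOLT 7, the fee is the base fee plus a proportional part in parts per million:

```python
def channel_fee_msat(amount_msat: int, base_msat: int, ppm: int) -> int:
    """Forwarding fee per BOLT 7: fee_base_msat plus the proportional
    part, fee_proportional_millionths of the forwarded amount."""
    return base_msat + amount_msat * ppm // 1_000_000

# e.g. forwarding 1_000_000_000 msat (0.01 BTC) through a channel
# gossiping base=1000 msat and 200 ppm:
assert channel_fee_msat(1_000_000_000, 1000, 200) == 201_000
```

A stale channel_update with old base/ppm values plugged into this formula yields a fee that can differ wildly from what the node actually charges, which would explain the mismatch between explorers.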
