crash: excessive queue length (version v24.11.1) #7972

Open
vincenzopalazzo opened this issue Jan 3, 2025 · 3 comments

@vincenzopalazzo (Contributor)

Issue and Steps to Reproduce

➜  ~ cat .lightning/bitcoin/log.log | grep BROKEN
2025-01-01T15:19:23.414Z **BROKEN** connectd: excessive queue length (version v24.11.1)
2025-01-01T15:19:23.414Z **BROKEN** connectd: backtrace: common/daemon.c:38 (send_backtrace) 0x55880cd3663e
2025-01-01T15:19:23.414Z **BROKEN** connectd: backtrace: common/msg_queue.c:69 (do_enqueue) 0x55880cd3ddee
2025-01-01T15:19:23.414Z **BROKEN** connectd: backtrace: common/msg_queue.c:85 (msg_enqueue) 0x55880cd3de1b
2025-01-01T15:19:23.414Z **BROKEN** connectd: backtrace: common/daemon_conn.c:161 (daemon_conn_send) 0x55880cd36bdf
2025-01-01T15:19:23.415Z **BROKEN** connectd: backtrace: connectd/multiplex.c:685 (handle_gossip_in) 0x55880cd2f1fa
2025-01-01T15:19:23.415Z **BROKEN** connectd: backtrace: connectd/multiplex.c:824 (handle_message_locally) 0x55880cd301b2
2025-01-01T15:19:23.415Z **BROKEN** connectd: backtrace: connectd/multiplex.c:1173 (read_body_from_peer_done) 0x55880cd3024d
2025-01-01T15:19:23.415Z **BROKEN** connectd: backtrace: ccan/ccan/io/io.c:60 (next_plan) 0x55880cdd0c7e
2025-01-01T15:19:23.415Z **BROKEN** connectd: backtrace: ccan/ccan/io/io.c:422 (do_plan) 0x55880cdd1109
2025-01-01T15:19:23.415Z **BROKEN** connectd: backtrace: ccan/ccan/io/io.c:439 (io_ready) 0x55880cdd11c2
2025-01-01T15:19:23.415Z **BROKEN** connectd: backtrace: ccan/ccan/io/poll.c:455 (io_loop) 0x55880cdd2b0f
2025-01-01T15:19:23.415Z **BROKEN** connectd: backtrace: connectd/connectd.c:2564 (main) 0x55880cd2b084
2025-01-01T15:19:23.415Z **BROKEN** connectd: backtrace: ../csu/libc-start.c:308 (__libc_start_main) 0x7f9d3038dd79
2025-01-01T15:19:23.415Z **BROKEN** connectd: backtrace: (null):0 ((null)) 0x55880cd20eb9
2025-01-01T15:19:23.415Z **BROKEN** connectd: backtrace: (null):0 ((null)) 0xffffffffffffffff
➜  ~

Running the tagged version v24.11.1
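
For context, the backtrace points at the queue-length check in common/msg_queue.c (do_enqueue) firing while connectd forwards gossip to gossipd via daemon_conn_send. Below is a minimal sketch of that kind of one-shot check; the names and structure are illustrative only, not the actual CLN source, and the threshold uses the 250,000-message figure rustyrussell gives further down.

```c
/* Illustrative sketch only -- not the CLN implementation.  It mimics the
 * pattern visible in the backtrace: enqueueing a message and, the first
 * time the queue length crosses a limit, logging a warning with a
 * backtrace instead of aborting. */
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define QUEUE_WARN_THRESHOLD 250000	/* figure quoted by rustyrussell below */

struct msg_queue {
	size_t len;	/* number of messages currently queued */
	bool warned;	/* so the warning only fires once */
};

static void enqueue(struct msg_queue *q)
{
	q->len++;
	if (q->len == QUEUE_WARN_THRESHOLD && !q->warned) {
		q->warned = true;
		/* The real daemon logs **BROKEN** plus a backtrace here;
		 * the process keeps running, so this is not a crash. */
		fprintf(stderr, "**BROKEN** excessive queue length\n");
	}
}

int main(void)
{
	struct msg_queue q = { 0, false };

	/* Simulate a gossip burst larger than the threshold. */
	for (size_t i = 0; i < QUEUE_WARN_THRESHOLD + 1000; i++)
		enqueue(&q);
	return 0;
}
```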

@endothermicdev (Collaborator)

Interesting. It looks like there was a significant backlog of gossip messages queued up for gossipd to process and the peer was feeding them faster than we could handle. Is there anything else interesting about this config or machine that might be relevant?
Maybe with an increased number of gossipers, the connectd -> gossipd queue should also be increased. Still, with an average gossip message size of maybe 350 bytes, that would be about 700 unprocessed messages by the time this limit was triggered!
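
(Back-of-the-envelope, assuming the limit being hit is read as roughly 250,000 bytes of queued data: 250,000 / 350 ≈ 714, i.e. about 700 messages. As noted below, the limit is actually 250,000 messages, which would make the backlog far larger.)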

@vincenzopalazzo (Contributor, Author)

> Interesting. It looks like there was a significant backlog of gossip messages queued up for gossipd to process and the peer was feeding them faster than we could handle. Is there anything else interesting about this config or machine that might be relevant?

Not really, all defaults!

@rustyrussell (Contributor) commented Jan 20, 2025

Note: this is NOT a crash! This is just to notify us (it only gets fired once, and is harmless), so thanks for the report!

It literally means we have 250,000 gossip messages pending to gossipd. That's a lot of gossip! It could be that we asked many peers for all their gossip, but gossipd goes through it pretty fast. Is gossipd consuming a lot of CPU? Was the node just recently brought online?
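
For scale, a rough estimate using the ~350-byte average message size suggested above: 250,000 × 350 bytes ≈ 87.5 MB of gossip buffered in connectd by the time the notification fires.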
