Suggestion for improvement of documentation of Bridge Node [documentation] #1633

cryptomolot · 2024-06-27T11:16:52Z

Dear Celestia Foundation,

I'm Vladislav from cryptomolot and we want to provide transparency and clarify the reason for our downtime on 24/06/2024. This is the link to our node.

We are participating in the MOCHA-4 testnet and have applied for the delegation program.

We ran into the following problem: after the latest update, our system running a bridge node crashed when LimitNOFILE=1400000 was set.

Our server meets the recommended requirements. It runs Ubuntu 20.04.6 LTS (GNU/Linux 5.4.0-144-generic x86_64), and the hardware meets or exceeds the requirements: Memory: 16 GB RAM (we currently have 32 GB), CPU: 6 cores (we actually have 24 cores), Disk: 10 TB SSD storage, Bandwidth: 1 Gbps for download / 1 Gbps for upload.

We had to reduce this value to LimitNOFILE=65535 to ensure system stability after the latest update.

By setting the soft limit to 1400000 in the service file, we exceeded the hard limit of the system, which I believe caused the crash.

Log from the system

INFO: task ksmd:311 blocked for more than 120 seconds
INFO: task scsi_eh_0:537 blocked for more than 120 seconds
INFO: task jbd2/sda2-8:702 blocked for more than 120 seconds
INFO: task NetworkManager:120743 blocked for more than 120 seconds
INFO: task xfsaild/sdb1:123535 blocked for more than 120 seconds

We don't think the disks were the cause. According to our monitoring, neither the disks nor the rest of the hardware were overloaded, so the problem is more likely related to the processor or the operating system. The system's hard limit for file descriptors is set to 1048576. When we changed the soft limit in the service file, we exceeded the hard limit, and we think that is why the system crashed.
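The comparison described above, a requested LimitNOFILE value against the hard limit, can be checked before editing the service file. The following is a minimal sketch, assuming a Linux host with Python 3. The helper name `fits_hard_limit` is hypothetical, and the values 65535 and 1400000 are the ones from this report. It inspects the limits of the current process, which are not necessarily those applied to the systemd service.

```python
import resource


def fits_hard_limit(requested: int, hard: int) -> bool:
    """Return True if `requested` open files fits under the hard limit `hard`."""
    # RLIM_INFINITY means the hard limit is unlimited.
    if hard == resource.RLIM_INFINITY:
        return True
    return requested <= hard


if __name__ == "__main__":
    # Limits of the current process (not of the systemd service itself).
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"current soft limit: {soft}, hard limit: {hard}")
    for requested in (65535, 1400000):
        verdict = "fits" if fits_hard_limit(requested, hard) else "exceeds the hard limit"
        print(f"LimitNOFILE={requested}: {verdict}")
```

With a hard limit of 1048576, as reported here, the sketch would flag 1400000 as exceeding the hard limit and 65535 as fitting.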

What could help: It may be worth revising this value in the documentation to a lower setting or providing more detailed system requirements. https://docs.celestia.org/nodes/systemd#celestia-bridge-node

Btw, we checked issue 1390 and saw that this was already discussed, but it is probably worth mentioning again after our case. Thanks a lot for your time!

Best regards,

Vladislav.

Let's keep in touch! My telegram is t.me/tommmymlt

jcstein · 2024-07-01T14:08:50Z

gm @cryptomolot please feel free to make a PR to make updates for this issue yourself 🙌

Wondertan · 2024-08-01T13:37:09Z

Hey @cryptomolot, I don't think the soft limit of 1400000 should exceed the default system-wide limit unless you change the default. My system's default is 9223372036854775807, which is much more than we recommend for the per-process limit.

Did you change the system-wide default? If so, what for?

P.S. You can check your system-wide default via cat /proc/sys/fs/file-max
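For comparison, the system-wide value mentioned here can be read alongside fs.nr_open, a separate kernel setting that caps how high a per-process file descriptor limit can be raised. Its kernel default is 1048576. The following is a minimal sketch, assuming a Linux host with procfs mounted. The helper name `read_kernel_limit` is hypothetical.

```python
from pathlib import Path
from typing import Optional


def read_kernel_limit(path: str) -> Optional[int]:
    """Read an integer sysctl value from procfs; return None if unavailable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        # File missing, unreadable, or not a plain integer.
        return None


if __name__ == "__main__":
    for name in ("file-max", "nr_open"):
        value = read_kernel_limit("/proc/sys/fs/" + name)
        print(f"fs.{name} = {value}")
```

If fs.nr_open is lower than the requested LimitNOFILE while fs.file-max is much higher, the per-process ceiling is the value to compare against, not the system-wide one.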

cryptomolot changed the title ~~Suggestion for improvement of documentation of Bridge Node~~ Suggestion for improvement of documentation of Bridge Node [documentation] Jun 27, 2024

cryptomolot added a commit to cryptomolot/docs that referenced this issue Jul 2, 2024

Update systemd.md

f69f0d2

Modify the bridge node guide (LimitNOFILE) according to celestiaorg#1633

cryptomolot mentioned this issue Jul 2, 2024

Update systemd.md #1639

Open

coderabbitai bot mentioned this issue Aug 5, 2024

docs: update consensus-node page #1662

Merged
