I'm Vladislav from cryptomolot and we want to provide transparency and clarify the reason for our downtime on 24/06/2024. This is the link to our node.
We are participating in the MOCHA-4 testnet and have applied for a delegation program.
We ran into the following problem: our system, which has a bridge node installed, crashed when LimitNOFILE=1400000 was set following the latest update.
We are using the recommended server setup: Ubuntu 20.04.6 LTS (GNU/Linux 5.4.0-144-generic x86_64). The hardware also meets the requirements (Memory: 16 GB RAM, although we currently have 32 GB; CPU: 6 cores, although we actually have 24; Disk: 10 TB SSD storage; Bandwidth: 1 Gbps download / 1 Gbps upload).
We had to reduce this value to LimitNOFILE=65535 to ensure system stability after the latest update.
By setting the soft limit to 1400000 in the service file, we exceeded the hard limit of the system, which I believe caused the crash.
Log from the system
INFO: task ksmd:311 blocked for more than 120 seconds
INFO: task scsi_eh_0:537 blocked for more than 120 seconds
INFO: task jbd2/sda2-8:702 blocked for more than 120 seconds
INFO: task NetworkManager:120743 blocked for more than 120 seconds
INFO: task xfsaild/sdb1:123535 blocked for more than 120 seconds
We don't think the disks were the cause. According to our monitoring, neither the disks nor the hardware in general were overloaded. The crash is more likely related to the processor or the operating system. The system's hard limit for file descriptors is set to 1048576. When we changed the soft limit in the service file, we exceeded that hard limit, and we think that is why the system crashed.
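As a rough illustration of the comparison described here, the following Python sketch checks a requested file-descriptor limit against a hard limit. The function name check_nofile_request is hypothetical and only for illustration. The limits it prints belong to the process running the script, which may differ from those of the bridge node service, so treat it as a sanity check under that assumption rather than a diagnosis.

```python
import resource


def check_nofile_request(requested, hard_limit):
    """Return a short verdict on whether `requested` fits under `hard_limit`.

    Hypothetical helper for illustration; it only compares two numbers.
    """
    if hard_limit == resource.RLIM_INFINITY:
        return "ok: hard limit is unlimited"
    if requested <= hard_limit:
        return f"ok: {requested} <= hard limit {hard_limit}"
    return f"too high: {requested} > hard limit {hard_limit}"


if __name__ == "__main__":
    # Soft and hard RLIMIT_NOFILE of the current process (not of the service).
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"current soft={soft} hard={hard}")
    # The two values mentioned in this report.
    for value in (65535, 1400000):
        print(value, "->", check_nofile_request(value, hard))
```

Calling check_nofile_request(1400000, 1048576) with the two numbers from this report yields the "too high" verdict, which is the situation described above.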
Btw, we checked issue 1390 and saw that this was already discussed, but it is probably worth mentioning again after our case. Thanks a lot for your time!
Best regards,
Vladislav.
Let's keep in touch! My telegram is t.me/tommmymlt
cryptomolot changed the title from "Suggestion for improvement of documentation of Bridge Node" to "Suggestion for improvement of documentation of Bridge Node [documentation]" on Jun 27, 2024
Hey @cryptomolot, I don't think the soft limit of 1400000 should exceed the default system-wide limit unless you change the default. My system's default is 9223372036854775807, which is much more than we recommend for the per-process limit.
Did you change the system-wide default? If so, what for?
P.S. You can check your system-wide default via cat /proc/sys/fs/file-max
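The same check can be scripted. Below is a minimal Python sketch that reads the system-wide value from /proc/sys/fs/file-max and, as an addition not mentioned in this thread, also reads /proc/sys/fs/nr_open, the kernel's ceiling for a single process's open-file limit. The helper name read_kernel_int is hypothetical. The usual kernel default for fs.nr_open is 1048576, so it may be the "hard limit" referred to in the report, but that is an assumption.

```python
from pathlib import Path


def read_kernel_int(path):
    """Read a single integer from a /proc/sys file; return None if unavailable."""
    try:
        return int(Path(path).read_text().split()[0])
    except (OSError, ValueError, IndexError):
        # Missing file (e.g. non-Linux system) or unexpected content.
        return None


if __name__ == "__main__":
    file_max = read_kernel_int("/proc/sys/fs/file-max")
    nr_open = read_kernel_int("/proc/sys/fs/nr_open")
    print("fs.file-max (system-wide):", file_max)
    print("fs.nr_open (per-process ceiling):", nr_open)
```

To see the limits actually applied to a running service, the "Max open files" row of /proc/<pid>/limits for that process can be compared against these two values.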
What could help: It may be worth revising this value in the documentation to a lower setting or providing more detailed system requirements. https://docs.celestia.org/nodes/systemd#celestia-bridge-node