Installing firewalld on a system that uses cloud-init, which CyberPanel does by default, causes a race condition that, around 75% of the time for me, left the instance with a broken network stack that I could not SSH into.
Since I thought it was a problem with my custom netplan, I kept spinning up new instances with different init scripts trying to disable the netplan... and eventually got in! ... only to realize that my init scripts had never actually run, and netplan had never been changed. It wasn't netplan being problematic; it was something else. Further investigation led me to the following in journalctl:
Sep 05 07:37:40 X systemd[1]: network-pre.target: Found ordering cycle on firewalld.service/start
Sep 05 07:37:40 X systemd[1]: network-pre.target: Found dependency on basic.target/start
Sep 05 07:37:40 X systemd[1]: network-pre.target: Found dependency on sockets.target/start
Sep 05 07:37:40 X systemd[1]: network-pre.target: Found dependency on apport-forward.socket/start
Sep 05 07:37:40 X systemd[1]: network-pre.target: Found dependency on sysinit.target/start
Sep 05 07:37:40 X systemd[1]: network-pre.target: Found dependency on cloud-init.service/start
Sep 05 07:37:40 X systemd[1]: network-pre.target: Found dependency on systemd-networkd-wait-online.service/start
Sep 05 07:37:40 X systemd[1]: network-pre.target: Found dependency on systemd-networkd.service/start
Sep 05 07:37:40 X systemd[1]: network-pre.target: Found dependency on network-pre.target/start
Sep 05 07:37:40 X systemd[1]: network-pre.target: Job firewalld.service/start deleted to break ordering cycle starting with network-pre.target/start
The specific service "deleted to break [the] ordering cycle" varied between snapshots: sometimes cloud-init, sometimes dbus (which I'm pretty sure is what hit this poor fellow), etc. I kept restarting various snapshots to see which ones would let me in. About 25% of the time they did.
This strange behavior and these weird error logs eventually led me to this forum post from 2019, which led me to firewalld/firewalld#414. Quite the three-day time sink of a debugging adventure.
So: CP either shouldn't install firewalld on systems where cloud-init is present, or CP should replace the stock firewalld.service unit with its own (e.g. patch firewalld to run later, as sketched below). AWS already has an instance-level firewall, so firewalld isn't useful there. And fail2ban already works with iptables by default.
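For reference, here is what "patch firewalld to run later" could look like. This is a minimal sketch, assuming the cycle is triggered by the Before=network-pre.target / Wants=network-pre.target ordering in firewalld's stock unit (which is what firewalld/firewalld#414 discusses); a systemd drop-in can clear those settings so firewalld starts after the network instead of before it:

# /etc/systemd/system/firewalld.service.d/override.conf
# (create with: sudo systemctl edit firewalld, then run: sudo systemctl daemon-reload)
[Unit]
# Empty assignments reset the list-type settings inherited from the stock unit,
# removing firewalld's ordering against network-pre.target.
Before=
Wants=

The trade-off is a brief window at boot where the network is up before the firewall is; on AWS the security group covers that window anyway.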
For now, uninstalling firewalld will do.
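For anyone landing here with the same symptom, the uninstall is straightforward (commands assume a stock Ubuntu or CentOS image; adjust for your distro):

# Ubuntu/Debian
sudo systemctl disable --now firewalld
sudo apt-get purge -y firewalld

# CentOS/RHEL
sudo systemctl disable --now firewalld
sudo yum remove -y firewalld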
Enhancement 1: instead of firewalld as the backend, just use AWS CLI/API firewall controls and let Amazon deal with running the firewall.
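For example, opening a port would become a security-group call instead of a firewalld rule. Something like this, where the group ID is a placeholder and I'm assuming CP would shell out to the AWS CLI (or use the equivalent API):

# open TCP 8090 (CyberPanel's admin port) in the instance's security group
aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --protocol tcp --port 8090 --cidr 0.0.0.0/0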
Enhancement 2: the CP installer should ask whether firewalld should be installed, and should default to "no" on systems with cloud-init installed. It should also warn about installing firewalld on a cloud-init system (like AWS VPSes) and how it can break the network stack (so take a snapshot before continuing!). A sketch of what that check could look like is below.
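This is a hypothetical installer snippet, not CP's actual code; the variable names and prompt wording are mine, and the cloud-init check is just one plausible heuristic:

# default firewalld to "no" when cloud-init is present
if command -v cloud-init >/dev/null 2>&1 || [ -d /etc/cloud ]; then
    default_firewalld="no"
    echo "WARNING: cloud-init detected (common on cloud VPSes such as AWS)."
    echo "Installing firewalld here can create a systemd ordering cycle that"
    echo "breaks the network stack -- take a snapshot before continuing."
else
    default_firewalld="yes"
fi
printf 'Install firewalld? [yes/no] (default: %s) ' "$default_firewalld"
read -r answer
answer="${answer:-$default_firewalld}"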