Hosted Engine Deployment fails when 6900/tcp is already added to Firewalld #283

Closed
tim427 opened this issue May 31, 2021 · 16 comments
Labels: bug Something isn't working

tim427 commented May 31, 2021

SUMMARY

Hosted Engine Deployment fails when 6900/tcp is already added to Firewalld

COMPONENT NAME

05_add_host.yml -> Open a port on firewalld

STEPS TO REPRODUCE
  • Fresh install of Centos 8.3 Stream
  • Cockpit enabled
  • oVirt repos added
  • GlusterFS deployed
  • Hosted Engine Deployment wizard
EXPECTED RESULTS

Successful deployment of a Hosted Engine

ACTUAL RESULTS
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "ERROR: Exception caught: org.fedoraproject.FirewallD1.Exception: ALREADY_ENABLED: '6900:tcp' already in 'public' Non-permanent operation"}
tim427 added the bug label May 31, 2021
arachmani assigned arachmani and unassigned mnecas Jun 1, 2021
parthdhanjal pushed a commit to parthdhanjal/ovirt-ansible-collection that referenced this issue Jun 1, 2021
arachmani (Member) commented:

Hi @tim427, can you please share your Ansible version? I found a similar issue: ansible/ansible#74800

didib commented Jun 1, 2021

I do not understand something: Is it that we always (or for a long time) had 6900 already open, under certain conditions, and the failure started to happen only recently because the ansible firewalld module now started failing (ansible/ansible#74800)?

If so, do we know what/who has it "already open", and under which conditions? Perhaps the correct fix is simply to not add it in hosted-engine, rather than to remove and then re-add it.

That said, there is obviously a deeper issue here: HE deploy uses 6900 temporarily during deployment, for providing access to the internal engine. If something else also uses it for some other function, we risk a real conflict. Perhaps we can check if it's in use (or in firewalld), and if so, use some other port.
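
A minimal sketch of that "check if it's already in firewalld" idea, assuming the public zone; the task and variable names below are illustrative, not the actual hosted-engine role code:

- name: Check whether 6900/tcp is already open in firewalld
  command: firewall-cmd --zone=public --query-port=6900/tcp
  register: he_port_query  # hypothetical variable name
  changed_when: false
  failed_when: he_port_query.rc not in [0, 1]  # rc 0 = already open, rc 1 = not open

- name: Open 6900/tcp only when it is not already open
  firewalld:
    port: 6900/tcp
    state: enabled
    immediate: yes
    permanent: no
  when: he_port_query.rc != 0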

arachmani (Member) commented:

> I do not understand something: Is it that we always (or for a long time) had 6900 already open, under certain conditions, and the failure started to happen only recently because the ansible firewalld module now started failing (ansible/ansible#74800)?
>
> If so, do we know what/who has it "already open", and under which conditions? Perhaps the correct fix is simply to not add it in hosted-engine, rather than to remove and then re-add it.
>
> That said, there is obviously a deeper issue here: HE deploy uses 6900 temporarily during deployment, for providing access to the internal engine. If something else also uses it for some other function, we risk a real conflict. Perhaps we can check if it's in use (or in firewalld), and if so, use some other port.

AFAIK gluster uses this port; the failure occurs in HC deployment with (my guess) ansible 2.9.21.

mwperina commented Jun 1, 2021

Are we sure this is an ansible version issue? Or has something changed in firewalld in EL8.4/CS8?

arachmani (Member) commented:

> Are we sure this is an ansible version issue? Or has something changed in firewalld in EL8.4/CS8?

Not sure yet, I need to check.

tim427 commented Jun 1, 2021

> Hi @tim427, can you please share your Ansible version? I found a similar issue: ansible/ansible#74800

2.9.21-1.el8, fresh installation of CentOS 8 Stream

>> I do not understand something: Is it that we always (or for a long time) had 6900 already open, under certain conditions, and the failure started to happen only recently because the ansible firewalld module now started failing (ansible/ansible#74800)?
>>
>> If so, do we know what/who has it "already open", and under which conditions? Perhaps the correct fix is simply to not add it in hosted-engine, rather than to remove and then re-add it.
>>
>> That said, there is obviously a deeper issue here: HE deploy uses 6900 temporarily during deployment, for providing access to the internal engine. If something else also uses it for some other function, we risk a real conflict. Perhaps we can check if it's in use (or in firewalld), and if so, use some other port.
>
> AFAIK gluster uses this port; the failure occurs in HC deployment with (my guess) ansible 2.9.21.

I think you're absolutely right! This only happens after the GlusterFS deployment. A fresh install of CentOS followed by a Hosted Engine setup with NFS doesn't face this problem.

arachmani commented Jun 2, 2021

$ ansible-playbook a.yml
[WARNING]: provided hosts list is empty, only localhost is available. Note that the implicit localhost does not match 'all'

PLAY [test] **********************************************************************************************************************************************************************************************************************************

TASK [Gathering Facts] ***********************************************************************************************************************************************************************************************************************
ok: [localhost]

TASK [Open a port on firewalld] **************************************************************************************************************************************************************************************************************
fatal: [localhost]: FAILED! => {"changed": false, "msg": "ERROR: Exception caught: org.fedoraproject.FirewallD1.Exception: ALREADY_ENABLED: '6900:tcp' already in 'public' Non-permanent operation"}

PLAY RECAP ***********************************************************************************************************************************************************************************************************************************
localhost : ok=1 changed=0 unreachable=0 failed=1 skipped=0 rescued=0 ignored=0

$ sudo firewall-cmd --zone=public --add-port=6900/tcp
Warning: ALREADY_ENABLED: '6900:tcp' already in 'public'
success
$ echo $?
0
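
The playbook itself is not shown in the thread; judging from the play name and the module arguments in the verbose run below, a.yml was presumably along these lines (a reconstruction, not the actual file):

- name: test
  hosts: localhost
  tasks:
    - name: Open a port on firewalld
      firewalld:
        port: 6900/tcp
        state: enabled
        immediate: yes
        permanent: no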

arachmani (Member) commented:

$ ansible-playbook a.yml -vvv

TASK [Open a port on firewalld] **************************************************************************************************************************************************************************************************************
task path: /root/a.yml:7
<127.0.0.1> ESTABLISH LOCAL CONNECTION FOR USER: root
<127.0.0.1> EXEC /bin/sh -c 'echo ~root && sleep 0'
<127.0.0.1> EXEC /bin/sh -c '( umask 77 && mkdir -p "`echo /root/.ansible/tmp`"&& mkdir "`echo /root/.ansible/tmp/ansible-tmp-1622619036.8000424-80856-143475961210482`" && echo ansible-tmp-1622619036.8000424-80856-143475961210482="`echo /root/.ansible/tmp/ansible-tmp-1622619036.8000424-80856-143475961210482`" ) && sleep 0'
Using module file /usr/lib/python3.6/site-packages/ansible/modules/system/firewalld.py
<127.0.0.1> PUT /root/.ansible/tmp/ansible-local-8078494wrl1ay/tmp07yksi50 TO /root/.ansible/tmp/ansible-tmp-1622619036.8000424-80856-143475961210482/AnsiballZ_firewalld.py
<127.0.0.1> EXEC /bin/sh -c 'chmod u+x /root/.ansible/tmp/ansible-tmp-1622619036.8000424-80856-143475961210482/ /root/.ansible/tmp/ansible-tmp-1622619036.8000424-80856-143475961210482/AnsiballZ_firewalld.py && sleep 0'
<127.0.0.1> EXEC /bin/sh -c '/usr/libexec/platform-python /root/.ansible/tmp/ansible-tmp-1622619036.8000424-80856-143475961210482/AnsiballZ_firewalld.py && sleep 0'
<127.0.0.1> EXEC /bin/sh -c 'rm -f -r /root/.ansible/tmp/ansible-tmp-1622619036.8000424-80856-143475961210482/ > /dev/null 2>&1 && sleep 0'
The full traceback is:
  File "/tmp/ansible_firewalld_payload_9vtu_w43/ansible_firewalld_payload.zip/ansible/module_utils/firewalld.py", line 108, in action_handler
    return action_func(*action_func_args)
  File "/tmp/ansible_firewalld_payload_9vtu_w43/ansible_firewalld_payload.zip/ansible/modules/system/firewalld.py", line 388, in set_enabled_immediate
  File "", line 2, in addPort
  File "/usr/lib/python3.6/site-packages/slip/dbus/polkit.py", line 121, in _enable_proxy
    return func(*p, **k)
  File "", line 2, in addPort
  File "/usr/lib/python3.6/site-packages/firewall/client.py", line 53, in handle_exceptions
    return func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/firewall/client.py", line 3737, in addPort
    return dbus_to_python(self.fw_zone.addPort(zone, port, protocol, timeout))
  File "/usr/lib/python3.6/site-packages/slip/dbus/proxies.py", line 51, in __call__
    return dbus.proxies._ProxyMethod.__call__(self, *args, **kwargs)
  File "/usr/lib64/python3.6/site-packages/dbus/proxies.py", line 145, in __call__
    **keywords)
  File "/usr/lib64/python3.6/site-packages/dbus/connection.py", line 651, in call_blocking
    message, timeout)
fatal: [localhost]: FAILED! => {
    "changed": false,
    "invocation": {
        "module_args": {
            "icmp_block": null,
            "icmp_block_inversion": null,
            "immediate": true,
            "interface": null,
            "masquerade": null,
            "offline": null,
            "permanent": false,
            "port": "6900/tcp",
            "rich_rule": null,
            "service": null,
            "source": null,
            "state": "enabled",
            "timeout": 0,
            "zone": null
        }
    },
    "msg": "ERROR: Exception caught: org.fedoraproject.FirewallD1.Exception: ALREADY_ENABLED: '6900:tcp' already in 'public' Non-permanent operation"
}

arachmani (Member) commented:

I see the same issue after downgrading the ansible version from ansible-2.9.21 to ansible-2.9.18.

arachmani (Member) commented:

It seems the issue occurs when trying to open a port using ansible together with firewalld-0.9.3-1.

Akasurde commented Jun 2, 2021

I am working on a fix (in the way we get information about ports from firewalld) in the ansible.posix repo.

Akasurde commented Jun 3, 2021

@tim427 @arachmani @michalskrivanek @mwperina Could you please check if ansible-collections/ansible.posix#199 works for you and let me know? Thanks

Akasurde commented Jun 3, 2021

@parthdhanjal ^^

arachmani (Member) commented:

> @tim427 @arachmani @michalskrivanek @mwperina Could you please check if ansible-collections/ansible.posix#199 works for you and let me know? Thanks

@Akasurde, ansible-collections/ansible.posix#199 works fine for me, thanks!

arachmani (Member) commented:

Closing, as this was fixed in ansible-collections/ansible.posix#179.
The Gluster team will also remove port 6900/tcp on their side: gluster/gluster-ansible#131.
Workaround: execute firewall-cmd --zone=public --remove-port=6900/tcp before the hosted-engine deployment.
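
For automated HC setups, the same workaround can be expressed as an Ansible pre-task; a sketch, assuming the public zone as in the error message above:

- name: Remove 6900/tcp from the firewalld runtime configuration before HE deployment
  command: firewall-cmd --zone=public --remove-port=6900/tcp
  changed_when: true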
