Move SSH key generation script from pam.d to /etc/profile (3.x) #1545
554f8df to 20cc762
Hi @QuentinM-Hilbtec, thanks for the contribution. You brought up a valid point and proposed an interesting solution. My concern is that by moving the logic to the profile script, you only trigger it when opening a login shell. What if we keep the trigger at the PAM level and switch user before creating the necessary directories and files? Would that work?
Dear @demartinofra, thanks for your review.
Your point makes sense; I expected it might be a requirement - we could always move it to
I imagined doing that initially, and tried implementing it for about an hour before giving up. Switching from root to
Thanks @QuentinM-Hilbtec for the additional info. We are looking into what it takes to support this scenario without breaking the use cases that rely on the key being generated as part of the PAM routine. To get you unblocked in the short term, my suggestion (if you haven't done it already) is to use an OnNodeConfigured custom script to disable the PAM action and move it into the shell profile.
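As a rough illustration of the workaround suggested above, an OnNodeConfigured script could comment out the relevant pam_exec.so line. This is only a sketch: `disable_pam_exec` is a hypothetical helper, and the PAM file path and script name shown in the usage comment are placeholders that would need to be checked against the actual node configuration.

```shell
#!/bin/sh
# Sketch only: comment out pam_exec.so entries that invoke a given script.
# disable_pam_exec is a hypothetical helper, not part of ParallelCluster.
disable_pam_exec() {
    pam_file="$1"      # PAM config file to edit (assumed path, e.g. under /etc/pam.d)
    script_name="$2"   # name of the key generation script (assumed)
    tmp_file="$(mktemp)" || return 1

    # Prefix every active line mentioning both pam_exec.so and the script with "# ".
    # Lines that are already commented out are left untouched.
    awk -v s="$script_name" '
        index($0, "pam_exec.so") && index($0, s) && $0 !~ /^[[:space:]]*#/ {
            print "# " $0
            next
        }
        { print }
    ' "$pam_file" > "$tmp_file" && cat "$tmp_file" > "$pam_file"
    status=$?

    rm -f "$tmp_file"
    return "$status"
}

# Hypothetical usage from an OnNodeConfigured script (both arguments are placeholders):
# disable_pam_exec /etc/pam.d/sshd hypothetical_keygen.sh
```

Writing back with `cat` rather than `mv` keeps the original file's ownership and permissions, which matters for files under /etc/pam.d.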
Description of changes
When FSx Lustre is configured with the new root_squash feature, and ParallelCluster is configured with Active Directory with home folders inside the FSx mount, pam_exec.so is unable to run the SSH key generation script properly. pam_exec.so runs the script as root, but root is treated as nobody/nogroup within the root_squash'd FSx mount point, so it has no access to the home folders and cannot manipulate the files.
Using su in the generation script to impersonate the user does not work around the problem: su itself would trigger pam_exec.so again and cause a loop, which does not look trivial to avoid.
Instead, I suggest moving the key generation to /etc/profile, which is executed by default for every interactive login shell, as the connecting user, and serves the purpose.
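To make the idea concrete, here is a minimal sketch of what a profile-based key generation could look like. It is not the cookbook's actual script: `ensure_user_ssh_key` is a hypothetical helper, and the key type, file names, and the step that appends to authorized_keys are assumptions.

```shell
#!/bin/sh
# Sketch only: generate an SSH key pair for the connecting user on first login.
# ensure_user_ssh_key is a hypothetical helper; key type and file names are assumptions.
ensure_user_ssh_key() {
    ssh_dir="${1:-$HOME/.ssh}"
    key_file="$ssh_dir/id_rsa"

    # Nothing to do if a key already exists, so repeated logins are cheap.
    [ -f "$key_file" ] && return 0

    # This runs as the connecting user rather than root, so root_squash
    # on the filesystem holding the home folder does not get in the way.
    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir" || return 1

    # Create a passphrase-less key pair without prompting.
    ssh-keygen -q -t rsa -N '' -f "$key_file" || return 1

    # Authorize the new public key for the same user (assumed requirement).
    cat "$key_file.pub" >> "$ssh_dir/authorized_keys" || return 1
    chmod 600 "$ssh_dir/authorized_keys"
}

# Hypothetical usage from /etc/profile or a file under /etc/profile.d:
# ensure_user_ssh_key
```

Because the function returns early when the key file exists, it only does work on the first login shell a user opens.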
Tests
I initialized a ParallelCluster cluster and successfully logged in as an AD user; the user's SSH key material was properly generated upon login.
References
Checklist
This is my first interaction with this repository, and my first few days with ParallelCluster as a whole. Testing has been a bit of a journey between building/uploading a new pcluster, node tool, image, and cookbooks. I kept bumping into the version checks in various places for quite a while, as the versions feeding into the checks appear to come from different places: the AMI-baked "bootstrap" version, the userdata generated by pcluster, b1 not being tolerated by Berks, ... But at last, the solution works for my cluster. I actually wrote the change purely in Ansible at first, but I was hitting another provisioning issue that broke CloudFormation's initial rollout (see Other issue below), so I decided to start editing the cookbook anyway.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
Other issue
With ParallelCluster v3.2, root_squash was also broken during provisioning due to something else (see case / see logs below), but that issue seems to have already been resolved, although I am not 100% sure how from just glancing over the code.