-
Notifications
You must be signed in to change notification settings - Fork 661
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FSN-35] Propagate Azure credentials to Fusion #5522
Conversation
Signed-off-by: Alberto Miranda <[email protected]>
✅ Deploy Preview for nextflow-docs-staging canceled.
|
@pditommaso I manually executed the tests with multiple containers described in #5444 and they passed, but the problem is that they rely on a fusion development snapshot. It would be nice to include those tests in the nextflow test suite, how should I go about it? |
Is there any change in the docs? as for testing, nextflow build can only version pinned in the config. In which Fusion version this feature is going to be added? |
No change to the docs. This just extends the auth information sent from Nextflow to Fusion.
Unsure, but possibly for the next bug fix release. Care to chime in @jordeu? |
Yes, we can add this to the next v2.4.7 release (that we can release next week) |
Being a new feature should not go into a patch release |
I agree in principle, but this is a feature that was already released in Nextflow (though not officially in Fusion). I'm not sure what's the best way to manage this. |
if (cfg.managedIdentity().isConfigured()) { | ||
// User-assigned Managed Identity | ||
if (cfg.managedIdentity().clientId) { | ||
result.AZURE_CLIENT_ID = cfg.managedIdentity().clientId | ||
} | ||
// System Managed Identity | ||
return result |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure we can guarantee the Azure worker nodes will have the same managed identity as the Nextflow head node. In AWS we use the configuration item aws.batch.jobRole = 'JOB ROLE ARN'
to specify this variable, perhaps we should do the same here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See here for an explanation on how to do this with azcopy (non Fusion): #3314 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See here for an explanation on how to do this with azcopy (non Fusion): #3314 (comment)
This approach would work for a system-wide managed identity, but AFAIK azcopy
requires the AZURE_CLIENT_ID
environment variable for a user-assigned managed identity (or the AZURE_CLIENT_ID
, AZURE_SECRET_ID
, and AZURE_TENANT_ID
for a service principal).
I'm sure I'm missing something, but since Fusion support for this (wip) uses the same underlying mechanism (DefaultAzureCredential
), it requires the same environment variables, so I'm not sure what we gain here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure we can guarantee the Azure worker nodes will have the same managed identity as the Nextflow head node. In AWS we use the configuration item
aws.batch.jobRole = 'JOB ROLE ARN'
to specify this variable, perhaps we should do the same here?
No idea about this, sorry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is the issue: #5232
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is the issue: #5232
Ok, I understand what you mean now regarding the head vs worker nodes MIs. I think it makes sense, but supporting this would require changes to the configuration file format, which is probably out of the scope of this PR? Or we could discuss whether we want to do this here, but it was not the original purpose of the PR 🤔
env.AZURE_TENANT_ID == TENANT_ID | ||
env.AZURE_CLIENT_ID == CLIENT_ID | ||
env.AZURE_CLIENT_SECRET == CLIENT_SECRET | ||
env.AZURE_STORAGE_SAS_TOKEN == null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't passing the creds to the worker nodes something we should avoid?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a valid concern, but I don't know enough about Nextflow to answer it (cc @pditommaso @jordeu). If this is indeed the case, the only way forward is to generate multiple SAS tokens (one for each container accessed).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's the benefits of using SAS token over secrets, and I would add Azure security is totally crap.
What do you think @arnaualcazar, it's safe exposing the client secret as an env variable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exposing secrets through environment variables should be avoided whenever possible. But what kind of credentials are we passing? If these credentials come from Managed Identities, then it is less of a problem since these are only valid for some time.
Instead of a SAS token, can we use a Managed Identity assigned to the container where Fusion is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could use a managed identity but we would need to implement #5232 which is a much larger job.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Essentially you are saying that if the batch node pool is aware of the managed identity there's no need for propagate the secret. Think that's the way to go
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, exactly. This is the same we do on AWS afaik.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One issue remains - how to attach a managed identity to the worker machines. Currently, we don't implement this at all but we need to do it to make this work. I think we can do it with this: https://learn.microsoft.com/en-en/java/api/com.microsoft.azure.batch.protocol.models.userassignedidentity
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, we rely on the user to attach a managed identity to each Azure Batch node pool.
I looked into writing some validation tests for this but since the failing behavior only happens with Fusion, I feel it would be far simpler to test it in the Fusion CI. |
Ready for review. |
After internally discussing it, it makes more sense to pursue a solution where we don't need to propagate secrets (or SAS) to the containers. As such, I'm closing this PR in favor of solving #5232. |
Description
This PR ensures that sufficient authentication information is propagated between Nextflow and Fusion so that Fusion can authenticate on behalf of the user. This fixes some authentication problems with the current SAS token-based approach, which was limiting.
Warning
This depends on https://github.com/seqeralabs/fusion/pull/622 to work as expected.