Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FSN-35] Propagate Azure credentials to Fusion #5522

Closed
wants to merge 1 commit into from

Conversation

alberto-miranda
Copy link
Contributor

@alberto-miranda alberto-miranda commented Nov 19, 2024

Description

This PR ensures that sufficient authentication information is propagated between Nextflow and Fusion so that Fusion can authenticate on behalf of the user. This fixes some authentication problems with the current SAS token-based approach, which was limiting.

Warning

This depends on https://github.com/seqeralabs/fusion/pull/622 to work as expected.

Copy link

netlify bot commented Nov 19, 2024

Deploy Preview for nextflow-docs-staging canceled.

Name Link
🔨 Latest commit 498b431
🔍 Latest deploy log https://app.netlify.com/sites/nextflow-docs-staging/deploys/673c8bd62013390008ce0a87

@alberto-miranda
Copy link
Contributor Author

@pditommaso I manually executed the tests with multiple containers described in #5444 and they passed, but the problem is that they rely on a fusion development snapshot. It would be nice to include those tests in the nextflow test suite, how should I go about it?

@pditommaso
Copy link
Member

Is there any change in the docs? as for testing, nextflow build can only version pinned in the config.

In which Fusion version this feature is going to be added?

@alberto-miranda
Copy link
Contributor Author

Is there any change in the docs? as for testing, nextflow build can only version pinned in the config.

No change to the docs. This just extends the auth information sent from Nextflow to Fusion.

In which Fusion version this feature is going to be added?

Unsure, but possibly for the next bug fix release. Care to chime in @jordeu?

@alberto-miranda alberto-miranda changed the title Propagate Azure credentials to Fusion [FSN-35] Propagate Azure credentials to Fusion Nov 19, 2024
@jordeu
Copy link
Collaborator

jordeu commented Nov 19, 2024

Yes, we can add this to the next v2.4.7 release (that we can release next week)

@pditommaso
Copy link
Member

Being a new feature should not go into a patch release

@alberto-miranda
Copy link
Contributor Author

Being a new feature should not go into a patch release

I agree in principle, but this is a feature that was already released in Nextflow (though not officially in Fusion). I'm not sure what's the best way to manage this.

Comment on lines +61 to +67
if (cfg.managedIdentity().isConfigured()) {
// User-assigned Managed Identity
if (cfg.managedIdentity().clientId) {
result.AZURE_CLIENT_ID = cfg.managedIdentity().clientId
}
// System Managed Identity
return result
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we can guarantee the Azure worker nodes will have the same managed identity as the Nextflow head node. In AWS we use the configuration item aws.batch.jobRole = 'JOB ROLE ARN' to specify this variable, perhaps we should do the same here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See here for an explanation on how to do this with azcopy (non Fusion): #3314 (comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See here for an explanation on how to do this with azcopy (non Fusion): #3314 (comment)

This approach would work for a system-wide managed identity, but AFAIK azcopy requires the AZURE_CLIENT_ID environment variable for a user-assigned managed identity (or the AZURE_CLIENT_ID, AZURE_SECRET_ID, and AZURE_TENANT_ID for a service principal).

I'm sure I'm missing something, but since Fusion support for this (wip) uses the same underlying mechanism (DefaultAzureCredential), it requires the same environment variables, so I'm not sure what we gain here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we can guarantee the Azure worker nodes will have the same managed identity as the Nextflow head node. In AWS we use the configuration item aws.batch.jobRole = 'JOB ROLE ARN' to specify this variable, perhaps we should do the same here?

No idea about this, sorry.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the issue: #5232

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the issue: #5232

Ok, I understand what you mean now regarding the head vs worker nodes MIs. I think it makes sense, but supporting this would require changes to the configuration file format, which is probably out of the scope of this PR? Or we could discuss whether we want to do this here, but it was not the original purpose of the PR 🤔

Comment on lines +97 to +100
env.AZURE_TENANT_ID == TENANT_ID
env.AZURE_CLIENT_ID == CLIENT_ID
env.AZURE_CLIENT_SECRET == CLIENT_SECRET
env.AZURE_STORAGE_SAS_TOKEN == null
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't passing the creds to the worker nodes something we should avoid?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a valid concern, but I don't know enough about Nextflow to answer it (cc @pditommaso @jordeu). If this is indeed the case, the only way forward is to generate multiple SAS tokens (one for each container accessed).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's the benefits of using SAS token over secrets, and I would add Azure security is totally crap.

What do you think @arnaualcazar, it's safe exposing the client secret as an env variable?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exposing secrets through environment variables should be avoided whenever possible. But what kind of credentials are we passing? If these credentials come from Managed Identities, then it is less of a problem since these are only valid for some time.

Instead of a SAS token, can we use a Managed Identity assigned to the container where Fusion is?

Copy link
Collaborator

@adamrtalbot adamrtalbot Nov 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could use a managed identity but we would need to implement #5232 which is a much larger job.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Essentially you are saying that if the batch node pool is aware of the managed identity there's no need for propagate the secret. Think that's the way to go

Copy link
Collaborator

@arnaualcazar arnaualcazar Nov 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, exactly. This is the same we do on AWS afaik.

Copy link
Collaborator

@adamrtalbot adamrtalbot Nov 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One issue remains - how to attach a managed identity to the worker machines. Currently, we don't implement this at all but we need to do it to make this work. I think we can do it with this: https://learn.microsoft.com/en-en/java/api/com.microsoft.azure.batch.protocol.models.userassignedidentity

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, we rely on the user to attach a managed identity to each Azure Batch node pool.

@alberto-miranda
Copy link
Contributor Author

I looked into writing some validation tests for this but since the failing behavior only happens with Fusion, I feel it would be far simpler to test it in the Fusion CI.

@alberto-miranda
Copy link
Contributor Author

Ready for review.

@alberto-miranda
Copy link
Contributor Author

After internally discussing it, it makes more sense to pursue a solution where we don't need to propagate secrets (or SAS) to the containers. As such, I'm closing this PR in favor of solving #5232.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants