[GCP] Activate service account for storage and controller #4529

Merged: 7 commits, Jan 6, 2025


@Michaelvll Michaelvll commented Jan 4, 2025

Fixes #4528 and #4512

When a service account is used locally, the service account key is uploaded, but the cloud storage code cannot use it. This is because a service account key requires gcloud auth activate-service-account, which a normal user application key does not need. We now always run the activation, to make sure the service account works.
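The idea can be sketched as a shell prefix that exports the credential path and activates the key before gsutil runs. This is a minimal illustration rather than the code merged in this PR: gsutil_prefix_with_activation and its parameters are hypothetical names, and chaining with `;` instead of `&&` is an assumption made so that gsutil still runs when activation fails (for example, with a normal user key).

```python
import shlex


def gsutil_prefix_with_activation(credential_path: str,
                                  gsutil_alias: str = 'gsutil') -> str:
    """Build a shell prefix that activates the key before gsutil runs.

    Hypothetical helper for illustration; not SkyPilot's implementation.
    """
    # Quote the path so that spaces or shell metacharacters are safe.
    quoted = shlex.quote(credential_path)
    # ';' (not '&&') keeps gsutil running even if activation fails.
    return (f'export GOOGLE_APPLICATION_CREDENTIALS={quoted}; '
            f'gcloud auth activate-service-account --key-file={quoted}; '
            f'{gsutil_alias}')


if __name__ == '__main__':
    # The caller appends the actual gsutil arguments to the prefix.
    print(gsutil_prefix_with_activation('/tmp/key.json') + ' ls gs://my-bucket')
```

The returned string is only a prefix; the caller appends the gsutil subcommand and runs the whole line in a shell on the remote machine.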

To reproduce #4528:

  1. Set a ~/.sky/config.yaml:
jobs:
  controller:
    resources:
      cpus: 2
      cloud: kubernetes
  2. Activate the service account for GCP locally: gcloud auth activate-service-account --key-file=$GOOGLE_APPLICATION_CREDENTIALS
  3. Run a managed job with sky jobs launch test.yaml, where test.yaml contains:
resources:
  cloud: kubernetes
  cpus: 2

workdir: examples

run: ls
  4. The current master fails to download the workdir from the bucket to the job cluster because gsutil cannot find the credential.

To reproduce #4512:

  1. Set a ~/.sky/config.yaml:
jobs:
  controller:
    resources:
      cpus: 2
      cloud: kubernetes
  2. Activate the service account for GCP locally: gcloud auth activate-service-account --key-file=$GOOGLE_APPLICATION_CREDENTIALS
  3. Run sky jobs launch -n test --cloud gcp --cpus 2 echo hi

Actually, after testing, the controller can correctly launch clusters on GCP even without activating the service account. It seems that the only GCP-related commands that require an activated service account are the storage commands, so we don't have to activate the service account for the controller. (Note: the Kubernetes cluster is started with sky local up, so there should be no default GCP credential.)
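Since activation only matters when the uploaded key is a service account key, a caller could check the key type before running the activation command. The sketch below assumes the key is a standard Google credential JSON file, where service account keys carry "type": "service_account" and user application credentials carry "type": "authorized_user". is_service_account_key is a hypothetical helper, not a function from this PR.

```python
import json
import os
import tempfile


def is_service_account_key(path: str) -> bool:
    """Return True if the JSON credential at `path` is a service account key.

    Hypothetical helper for illustration; unreadable or malformed files
    are treated as "not a service account key".
    """
    try:
        with open(path, encoding='utf-8') as f:
            data = json.load(f)
    except (OSError, json.JSONDecodeError):
        return False
    return isinstance(data, dict) and data.get('type') == 'service_account'


if __name__ == '__main__':
    # Demo with a throwaway file that mimics a user application credential.
    with tempfile.TemporaryDirectory() as tmp:
        user_key = os.path.join(tmp, 'user.json')
        with open(user_key, 'w', encoding='utf-8') as f:
            json.dump({'type': 'authorized_user'}, f)
        print(is_service_account_key(user_key))
```

A check like this would let the storage command skip gcloud auth activate-service-account for normal user credentials instead of running it unconditionally.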

Tested (run the relevant ones):

  • Code formatting: bash format.sh
  • Any manual or new tests for this PR (please specify below)
    • The reproduction steps above.
    • With a normal user account.
  • All smoke tests: pytest tests/test_smoke.py
    • pytest tests/test_smoke.py --managed-jobs --gcp with a service account activated and controller on k8s.
  • Relevant individual smoke tests: pytest tests/test_smoke.py::test_fill_in_the_name
  • Backward compatibility tests: conda deactivate; bash -i tests/backward_compatibility_tests.sh

@Michaelvll

/quicktest-core

@romilbhardwaj romilbhardwaj left a comment

Awesome @Michaelvll!

@@ -114,7 +114,11 @@ class GcsCloudStorage(CloudStorage):
     def _gsutil_command(self):
         gsutil_alias, alias_gen = data_utils.get_gsutil_command()
         return (f'{alias_gen}; GOOGLE_APPLICATION_CREDENTIALS='
-                f'{gcp.DEFAULT_GCP_APPLICATION_CREDENTIAL_PATH} {gsutil_alias}')
+                f'{gcp.DEFAULT_GCP_APPLICATION_CREDENTIAL_PATH}; '
+                'gcloud auth activate-service-account '

Does something similar need to be done for our VM provisioning pipeline on the jobs/serve controller?

For example, if the user authenticates with a service account, runs a controller on k8s, and wants to launch a managed job on GCP, will they run into the same issue (see #4512)?

@Michaelvll

Good point! I changed the PR to fix the issue.

@Michaelvll changed the title from "[GCP] Activate service account for storage" to "[GCP] Activate service account for storage and controller" on Jan 5, 2025
@romilbhardwaj romilbhardwaj left a comment

Awesome, thanks for the quick turnaround @Michaelvll!

@Michaelvll

Actually, after testing, the controller can correctly launch clusters on GCP even without activating the service account. It seems that the only GCP-related commands that require an activated service account are the storage commands, so we don't have to activate the service account for the controller. (Note: the Kubernetes cluster is started with sky local up, so there should be no default GCP credential.) I reverted the changes for the controller.

PTAL @romilbhardwaj

@romilbhardwaj romilbhardwaj left a comment

Thanks for verifying @Michaelvll!

@Michaelvll Michaelvll merged commit 38a822a into master Jan 6, 2025
19 checks passed
@Michaelvll Michaelvll deleted the activate-service-account-if-needed branch January 6, 2025 06:51
@zpoint zpoint mentioned this pull request Jan 6, 2025
5 tasks
Linked issue: [GCP] Support service account based auth for jobs/serve controller