Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[aws] cache user identity by 'aws configure list' #4507

Merged
merged 3 commits into from
Jan 3, 2025

Conversation

aylei
Copy link
Contributor

@aylei aylei commented Dec 25, 2024

Cache aws sts get-caller-identity result, indexed by hashing the output of aws configure list. This will speed up sky launch by ~1s in subsequent runs.

original discussion: #3159 (comment)

A simple benchmark (enabled clouds: [aws, gcp, azure, kubernetes, runpod, cudo, paperspace], the code is slightly modified to return before the confirmation is prompted):

# master, avg: 2.558s 
$ /usr/bin/time bash -c 'for i in {1..10}; do SKYPILOT_DISABLE_USAGE_COLLECTION=1 python -m sky.cli launch --gpus H100:8> /dev/null; done;'
       25,58 real        16,57 user        25,29 sys

# PR branch, avg: 1.596s
$ /usr/bin/time bash -c 'for i in {1..10}; do SKYPILOT_DISABLE_USAGE_COLLECTION=1 python -m sky.cli launch --gpus H100:8> /dev/null; done;'
       15,96 real        16,84 user        24,06 sys

# PR branch with https://github.com/skypilot-org/skypilot/pull/4483 picked, avg: 1.321s
$ /usr/bin/time bash -c 'for i in {1..10}; do SKYPILOT_DISABLE_USAGE_COLLECTION=1 python -m sky.cli launch --gpus H100:8> /dev/null; done;'
       13,21 real        16,28 user        11,18 sys

@Michaelvll Please kindly take a look

Tested (run the relevant ones):

  • Code formatting: bash format.sh
  • Any manual or new tests for this PR (see above benchmark)
  • All smoke tests: pytest tests/test_smoke.py
  • Relevant individual smoke tests: pytest tests/test_smoke.py::test_fill_in_the_name
  • Backward compatibility tests: conda deactivate; bash -i tests/backward_compatibility_tests.sh

Copy link
Collaborator

@Michaelvll Michaelvll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @aylei! This PR looks good to me!

@@ -773,6 +784,38 @@ def get_user_identities(cls) -> Optional[List[List[str]]]:
# automatic switching for AWS. Currently we only support one identity.
return [user_ids]

@classmethod
@functools.lru_cache(maxsize=1) # Cache since getting identity is slow.
def get_user_identities(cls) -> Optional[List[List[str]]]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a docstr for this function for the behavior of caching and returning identity. It might also be good to move the docstr from _sts_get_caller_identity to here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch! just moved the docstring here and added the description of caching behavior

@Michaelvll
Copy link
Collaborator

I assume the time tests listed in the PR description has the launch to return immediately after the optimizer has been run, i.e. not going to the confirmation line.

@aylei
Copy link
Contributor Author

aylei commented Jan 2, 2025

I assume the time tests listed in the PR description has the launch to return immediately after the optimizer has been run, i.e. not going to the confirmation line.

Yes, I updated the PR description to make the result clearer~

Copy link
Collaborator

@Michaelvll Michaelvll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again for the quick fix @aylei! Just two minor comments before we merge it. : )

Comment on lines 811 to 812
cache_path = catalog_common.get_catalog_path(
f'aws/user-identity-{config_hash}.txt')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about we move it to aws/.cache/user-identity-{config_hash}.txt

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good!

Comment on lines 761 to 766
"""Returns a [UserId, Account] list that uniquely identifies current AWS
principal (user, role or federated identity) whose credentials are used
to run current `sky` process. These identities are assumed to be stable
for the duration of the `sky` process. Modifying the credentials while
the `sky` process is running will not affect the identity returned by
this function.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style: keep the first line of the docstr to be single line (within 80 characters), we can have detailed descriptions below in a different paragraph.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@Michaelvll Michaelvll merged commit 2ccbbff into skypilot-org:master Jan 3, 2025
19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants