
locate_credentials() is slow #33

Open · leeper opened this issue Jul 28, 2018 · 7 comments


leeper commented Jul 28, 2018

Example from cloudyr/aws.s3#166

Report from Vitalina Komashko:

system.time(aws.s3::copy_object(from_object = from_object[1], to_object = to_object[1],
from_bucket = from_bucket, to_bucket = to_bucket, verbose = TRUE))
Using Environment Variable 'AWS_ACCESS_KEY_ID' for AWS Access Key ID
Using Environment Variable 'AWS_SECRET_ACCESS_KEY' for AWS Secret Access Key

Checking bucket region using get_location('<REDACTED>')
Executing request using bucket region us-west-1
S3 Request URL: https://s3-us-west-1.amazonaws.com/<REDACTED>
Executing request with AWS credentials
Parsing AWS API response
Success: (200) OK
    user   system  elapsed
   0.836    0.674 2437.777

This runs about 10x slower than the AWS CLI. The issue may be with locate_credentials():

/usr/bin/time -l aws s3 cp s3://<REDACTED> --profile corporate
copy: s3://<REDACTED> to s3://<REDACTED>

      263.67 real         2.24 user         0.65 sys

leeper commented Aug 7, 2018

Ideas for refactoring:

  • convert locate_credentials() into a set of smaller functions that are called sequentially, each tried only if the previous one returns NULL (see the sketch after this list)
  • locate_credentials() is called twice in a lot of packages, once explicitly and once implicitly. Disabling the second call (by adding a locate = FALSE argument to signature_v4_auth()) will prevent the second search
  • add "requests" functionality that builds GET(), POST(), etc. calls and signs them automatically.
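
As a rough illustration of the first idea, a minimal sketch of the sequential-fallback structure; the helper names (credentials_from_env(), locate_credentials_sequential()) are hypothetical, not part of aws.signature:

# Each helper returns a credentials list, or NULL when its source has nothing to offer.
credentials_from_env <- function() {
  key <- Sys.getenv("AWS_ACCESS_KEY_ID", unset = "")
  secret <- Sys.getenv("AWS_SECRET_ACCESS_KEY", unset = "")
  if (!nzchar(key) || !nzchar(secret)) return(NULL)
  list(key = key, secret = secret,
       session_token = Sys.getenv("AWS_SESSION_TOKEN", unset = ""))
}

# The locator walks the providers in order and stops at the first non-NULL result,
# so cheap sources (environment variables) short-circuit expensive ones (metadata calls).
locate_credentials_sequential <- function(providers = list(credentials_from_env)) {
  for (provider in providers) {
    creds <- provider()
    if (!is.null(creds)) return(creds)
  }
  NULL
}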


leeper commented Aug 7, 2018

We also get a decent improvement on signature_v4_auth() from v0.4.4:

> bench::mark(
+   is.list(signature_v4_auth("foo", "bar", verb = "GET", service = "s3", action = "/", request_body = "", canonical_headers = list("x-amz-region" = "foo")))
+ )
# A tibble: 1 x 14
  expression    min   mean median    max `itr/sec` mem_alloc  n_gc n_itr total_time result memory time  gc   
  <chr>      <bch:> <bch:> <bch:> <bch:>     <dbl> <bch:byt> <dbl> <int>   <bch:tm> <list> <list> <lis> <lis>
1 "is.list(~ 15.1ms 23.9ms 25.5ms 35.6ms      41.8    35.1KB     1    20      478ms <lgl ~ <Rpro~ <bch~ <tib~

to v0.4.5:

> bench::mark(
+   is.list(signature_v4_auth("foo", "bar", verb = "GET", service = "s3", action = "/", request_body = "", canonical_headers = list("x-amz-region" = "foo"))),
+   is.list(signature_v4_auth("foo", "bar", verb = "GET", service = "s3", action = "/", request_body = "", canonical_headers = list("x-amz-region" = "foo"), force_credentials = TRUE))
+ )
# A tibble: 2 x 14
  expression    min   mean median    max `itr/sec` mem_alloc  n_gc n_itr total_time result memory time  gc   
  <chr>      <bch:> <bch:> <bch:> <bch:>     <dbl> <bch:byt> <dbl> <int>   <bch:tm> <list> <list> <lis> <lis>
1 "is.list(~ 2.52ms 2.85ms 2.79ms 3.85ms      350.    1.38MB     9   151      431ms <lgl ~ <Rpro~ <bch~ <tib~
2 "is.list(~ 2.48ms 2.84ms 2.73ms 8.25ms      352.   19.88KB     9   149      423ms <lgl ~ <Rpro~ <bch~ <tib~

@colman-humphrey

I had an issue with very slow loads, affecting even library(aws.s3), since that loads aws.signature. On my end, the bottleneck was check_ec2(), which makes a URL call that I guess gets retried a bunch of times, and it is called in .onLoad. Either way, it was taking a long time. I'm not sure if other people have this issue, as I only hit it while aws.ec2metadata was installed on a non-EC2 instance (for testing), which was of course a terrible idea.

However, removing aws.ec2metadata didn't solve all my issues. I'll open a new issue outlining what I think is happening, at least in my case.


micwalk commented May 3, 2019

I opened a ticket in the aws.ec2metadata repo for the slow is_ec2() call: cloudyr/aws.ec2metadata#7

@jon-mago

Hopefully this should be improved with the 0.2.0 release of aws.ec2metadata. The timeout for checking the metadata endpoint has been decreased from the curl default of 10s to 1s (to bring it in line with boto3).
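
For context, a minimal sketch of the kind of short-timeout probe being described, assuming the standard EC2 instance metadata endpoint; this is illustrative only, not the aws.ec2metadata implementation:

# Probe the instance metadata endpoint with a 1-second timeout so that
# non-EC2 machines fail fast instead of waiting for the ~10s curl default.
is_probably_ec2 <- function(timeout_sec = 1) {
  h <- curl::new_handle(connecttimeout = timeout_sec, timeout = timeout_sec)
  tryCatch({
    res <- curl::curl_fetch_memory("http://169.254.169.254/latest/meta-data/", handle = h)
    identical(res$status_code, 200L)
  }, error = function(e) FALSE)
}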


jyoungs commented Jul 31, 2019

Hey @jon-mago - that timeout will help. What would be better (for us) is if aws.signature followed the typical credential precedence structure (check .aws/credentials before Instance/Container IAM Roles).

That precedence seems pretty consistent across the AWS CLI and the official SDKs.

Any thoughts on why CloudyR is different, and maybe on offering the regular pattern as an option?
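
To make the requested ordering concrete, a standalone sketch of the standard precedence; the logic is illustrative only and not aws.signature's code:

# Standard precedence: 1. environment variables, 2. shared credentials file,
# 3. instance/container role. Only the last step requires a network call,
# so checking ~/.aws/credentials first avoids touching 169.254.169.254.
have_env_creds  <- nzchar(Sys.getenv("AWS_ACCESS_KEY_ID")) &&
                   nzchar(Sys.getenv("AWS_SECRET_ACCESS_KEY"))
have_file_creds <- file.exists(path.expand("~/.aws/credentials"))

if (have_env_creds) {
  source_used <- "environment variables"
} else if (have_file_creds) {
  source_used <- "shared credentials file"      # parse the requested [profile] section here
} else {
  source_used <- "instance/container IAM role"  # only now query the metadata endpoint
}
source_used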


jon-mago commented Aug 1, 2019

I'm not sure why it's different. I suspect the answer is merely "because it was written that way quite a while ago".

While it would be a kind of breaking change to "put the order right", I'm not averse to it, as I suspect it would break few things in practice. I will have a think about how to offer the standard order as an option, so the transition can be more gradual.
