Bucket name sometimes missing in the S3 URL - causing failures for the same bucket and code that was previously successful for operations like get object and put object. #4187

Open
mjoeydba opened this issue Jul 2, 2024 · 1 comment
Labels: bug, p2, response-requested, s3

mjoeydba commented Jul 2, 2024

Describe the bug

S3 access is failing for the same bucket and code that previously succeeded. The debug trace shows that the URL used during the failure does not include the bucket name, either as the host or in the path.

Success

2024-06-27 21:21:12,116 botocore.regions [DEBUG] Calling endpoint provider with parameters: {'Bucket': 'xxxx', 'Region': 'us-east-1', 'UseFIPS': False, 'UseDualStack': False, 'ForcePathStyle': False, 'Accelerate': False, 'UseGlobalEndpoint': True, 'Key': 'xxxx/xxxx.xlsx', 'DisableMultiRegionAccessPoints': False, 'UseArnRegion': True}
2024-06-27 21:21:12,116 botocore.regions [DEBUG] Endpoint provider result: https://xxxx.s3.amazonaws.com

Failure

2024-07-02 18:22:11,094 botocore.regions [DEBUG] Calling endpoint provider with parameters: {'Region': 'us-east-1', 'UseFIPS': False, 'UseDualStack': False, 'ForcePathStyle': False, 'Accelerate': False, 'UseGlobalEndpoint': True, 'DisableMultiRegionAccessPoints': False, 'UseArnRegion': True}
2024-07-02 18:22:11,095 botocore.regions [DEBUG] Endpoint provider result: https://s3.amazonaws.com

The URL in the get_object call shows the same behavior, which seems to be what causes the Access Denied error.
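
To find affected calls in a captured debug log, something like the following could help. It is only a sketch: it assumes the log keeps the `Calling endpoint provider with parameters:` wording shown above, and `find_calls_missing_bucket` is a hypothetical helper name, not part of boto3 or botocore. Note that operations such as list_buckets legitimately have no bucket, so a hit is only suspicious when it belongs to a get_object or put_object call.

```python
import ast
import re

# Matches the botocore.regions DEBUG line quoted in this report.
PARAMS_RE = re.compile(r"Calling endpoint provider with parameters: (\{.*\})")


def find_calls_missing_bucket(log_lines):
    """Return (line_number, params) for endpoint lookups that carry no 'Bucket'.

    Hypothetical helper for illustration; not a boto3/botocore API.
    """
    missing = []
    for number, line in enumerate(log_lines, start=1):
        match = PARAMS_RE.search(line)
        if match is None:
            continue
        try:
            # The parameters are logged as a Python dict repr.
            params = ast.literal_eval(match.group(1))
        except (ValueError, SyntaxError):
            # Skip lines whose parameters are not plain literals.
            continue
        if "Bucket" not in params:
            missing.append((number, params))
    return missing


# Shortened versions of the two log lines from this report.
sample = [
    "2024-06-27 21:21:12,116 botocore.regions [DEBUG] Calling endpoint provider "
    "with parameters: {'Bucket': 'xxxx', 'Region': 'us-east-1', 'Key': 'xxxx/xxxx.xlsx'}",
    "2024-07-02 18:22:11,094 botocore.regions [DEBUG] Calling endpoint provider "
    "with parameters: {'Region': 'us-east-1', 'UseFIPS': False}",
]

for number, params in find_calls_missing_bucket(sample):
    print(f"line {number}: no Bucket in {params}")
```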

Expected Behavior

Successfully download object.

Current Behavior

The call fails with Access Denied, even though the same code previously worked.

Reproduction Steps

# Note: the issue occurs unpredictably.
import logging
from io import BytesIO

import boto3
import pandas as pd
from botocore.config import Config
from pyspark.sql.functions import upper

# Send DEBUG output from all loggers to the console.
boto3.set_stream_logger('', logging.DEBUG)
# boto3.set_stream_logger('')

# Initialize S3 client
s3 = boto3.client('s3')
INBOUND_S3_BUCKET = "xxxx"
INBOUND_FILE_PATH = 'xxx/xxxx.xlsx'
obj = s3.get_object(Bucket=INBOUND_S3_BUCKET, Key=INBOUND_FILE_PATH)
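
Because the failure is intermittent, full DEBUG logging on the root logger can be noisy. A narrower option, sketched below, is to attach a handler to the `botocore.regions` logger only and record the endpoint lookups that carry no bucket. This is an assumption-laden sketch: it relies on the message text seen in the trace above, and `BucketlessEndpointHandler` and `watch_endpoint_resolution` are hypothetical names introduced here, not boto3 or botocore APIs.

```python
import logging


class BucketlessEndpointHandler(logging.Handler):
    """Collect botocore.regions messages whose endpoint parameters lack a bucket.

    Hypothetical helper for illustration; not a boto3/botocore API.
    """

    def __init__(self):
        super().__init__(level=logging.DEBUG)
        self.hits = []

    def emit(self, record):
        message = record.getMessage()
        # Assumes the message wording shown in the debug trace of this report.
        if ("Calling endpoint provider with parameters" in message
                and "'Bucket'" not in message):
            self.hits.append(message)


def watch_endpoint_resolution():
    """Attach the handler to the botocore.regions logger only and return it."""
    handler = BucketlessEndpointHandler()
    logger = logging.getLogger("botocore.regions")
    logger.setLevel(logging.DEBUG)
    logger.addHandler(handler)
    return handler
```

Calling `watcher = watch_endpoint_resolution()` before creating the client and inspecting `watcher.hits` after a failed get_object would show whether the bucket was already missing when the endpoint was resolved.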

Possible Solution

Unknown

Additional Information/Context

No response

SDK version used

1.34.137

Environment details (OS name and version, etc.)

Linux, Databricks

mjoeydba added the bug and needs-triage labels on Jul 2, 2024
mjoeydba changed the title from "S3 access failing for the same bucket and code that was previously successful" to "Bucket name sometimes missing in S3 operations - access failing for the same bucket and code that was previously successful" on Jul 2, 2024
mjoeydba changed the title from "Bucket name sometimes missing in S3 operations - access failing for the same bucket and code that was previously successful" to "Bucket name sometimes missing in the S3 URL - access failing for the same bucket and code that was previously successful" on Jul 3, 2024
mjoeydba changed the title from "Bucket name sometimes missing in the S3 URL - access failing for the same bucket and code that was previously successful" to "Bucket name sometimes missing in the S3 URL - causing failures for the same bucket and code that was previously successful for operations like get object and put object." on Jul 3, 2024
tim-finnigan self-assigned this on Jul 3, 2024
tim-finnigan (Contributor) commented

Hi @mjoeydba, thanks for reaching out. Here is a guide to troubleshooting Access Denied errors in S3: https://docs.aws.amazon.com/AmazonS3/latest/userguide/troubleshoot-403-errors.html

That error is most likely caused by your settings, policies, permissions, or profile configuration. If you'd like us to investigate further on the SDK side, please share a complete code snippet that reproduces the issue, along with debug logs (with any sensitive information redacted), which you can enable by adding boto3.set_stream_logger('') to your script.

tim-finnigan added the response-requested, s3, and p2 labels and removed the needs-triage label on Jul 3, 2024