inconsistent usage of AWS endpoint URL #21784

Open
2 tasks done
hutch3232 opened this issue Mar 16, 2025 · 0 comments
Labels
A-io-cloud Area: reading/writing to cloud storage bug Something isn't working P-low Priority: low python Related to Python Polars

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

# /// script
# requires-python = ">=3.13"
# dependencies = [
#     "boto3",
#     "polars==1.25.2",
# ]
# ///

import os
import polars as pl

pl.Config.set_verbose(True) 

# WORKS
# os.environ["AWS_PROFILE"] = "my-role"
# os.environ["AWS_ENDPOINT_URL"] = "https://my-endpoint.com/"
# pl.read_parquet("s3://my-bucket/my-prefix/temp_prq/*.parquet")

# WORKS
# os.environ["AWS_ENDPOINT_URL"] = "https://my-endpoint.com/"
# pl.read_parquet("s3://my-bucket/my-prefix/temp_prq/*.parquet",
#                 storage_options={
#                     "profile": "my-role",
#                 })

# FAILS (ValueError): profile and endpoint_url both passed in storage_options
# pl.read_parquet("s3://my-bucket/my-prefix/temp_prq/*.parquet",
#                 storage_options={
#                     "profile": "my-role",
#                     "endpoint_url": "https://my-endpoint.com/"
#                 })
# ValueError: unsupported: cannot combine aws_profile with endpoint_url in storage_options

# STALLS: profile in storage_options, AWS_ENDPOINT_URL not set
# pl.read_parquet("s3://my-bucket/my-prefix/temp_prq/*.parquet",
#                 storage_options={
#                     "profile": "my-role",
#                 })
# _init_credential_provider_builder(): credential_provider_init = CredentialProviderBuilder(CredentialProviderAWS @ AutoInitAWS)
# async thread count: 4
# [CredentialProviderBuilder]: Begin initialize CredentialProviderAWS @ AutoInitAWS
# [CredentialProviderBuilder]: Initialized <polars.io.cloud.credential_provider._providers.CredentialProviderAWS object at 0x7f69e1e06660> from CredentialProviderAWS @ AutoInitAWS
# [FetchedCredentialsCache]: Call update_func: current_time = 1742136751, last_fetched_expiry = 0
# [FetchedCredentialsCache]: Finish update_func: new expiry = (never expires)

# stalls and does not allow interrupt - had to kill terminal

# WORKS
# os.environ["AWS_ENDPOINT_URL"] = "https://my-endpoint.com/"
# pl.read_parquet("s3://my-bucket/my-prefix/temp_prq/*.parquet",
#                 credential_provider=pl.CredentialProviderAWS(profile_name="my-role"))

# STALLS: credential_provider with profile_name, AWS_ENDPOINT_URL not set
# pl.read_parquet("s3://my-bucket/my-prefix/temp_prq/*.parquet",
#                 credential_provider=pl.CredentialProviderAWS(profile_name="my-role"))
# _init_credential_provider_builder(): credential_provider_init = CredentialProviderBuilder(<polars.io.cloud.credential_provider._providers.CredentialProviderAWS object at 0x7fc5bb46cec0> @ InitializedCredentialProvider)
# async thread count: 4
# [CredentialProviderBuilder]: Begin initialize <polars.io.cloud.credential_provider._providers.CredentialProviderAWS object at 0x7fc5bb46cec0> @ InitializedCredentialProvider
# [CredentialProviderBuilder]: Initialized <polars.io.cloud.credential_provider._providers.CredentialProviderAWS object at 0x7fc5bb46cec0> from <polars.io.cloud.credential_provider._providers.CredentialProviderAWS object at 0x7fc5bb46cec0> @ InitializedCredentialProvider
# [FetchedCredentialsCache]: Call update_func: current_time = 1741387178, last_fetched_expiry = 0
# [FetchedCredentialsCache]: Finish update_func: new expiry = (never expires)

# stalls and does not allow interrupt - had to kill terminal

I ran the above script using uv run <script>, uncommenting one block at a time.

Log output

Issue description

I am using S3-compatible storage. I have endpoint_url defined in my ~/.aws/config under the relevant profile, so it should be picked up automatically, without my specifying it explicitly or via an environment variable. s3fs and boto3 can do this when given the relevant profile argument.

I'm also behind a firewall, so if Polars tries to reach AWS by default, it won't be able to (which is probably why it stalls).

Expected behavior

I think the following order of precedence should apply:

  1. endpoint_url supplied in storage_options
  2. the AWS_ENDPOINT_URL environment variable
  3. endpoint_url picked up from ~/.aws/config (like boto3)

It would be nice to not have to manually specify the endpoint URL if it can be inferred from the config file.
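
To make the proposed order concrete, here is a minimal sketch of the lookup in plain Python. It is illustrative only: resolve_endpoint_url is a hypothetical helper, not part of Polars or boto3, and it assumes the config file uses the standard "[profile <name>]" sections and carries a top-level endpoint_url key (service-specific "services" sections are not handled).

```python
import configparser
import os
from pathlib import Path


def resolve_endpoint_url(storage_options=None, profile=None, config_path=None):
    """Hypothetical helper: return the endpoint URL using the proposed precedence, or None."""
    storage_options = storage_options or {}

    # 1. An explicit endpoint_url in storage_options wins.
    explicit = storage_options.get("endpoint_url")
    if explicit:
        return explicit

    # 2. Next, the AWS_ENDPOINT_URL environment variable.
    from_env = os.environ.get("AWS_ENDPOINT_URL")
    if from_env:
        return from_env

    # 3. Finally, the profile's section in the shared AWS config file.
    path = Path(config_path or os.environ.get("AWS_CONFIG_FILE", "~/.aws/config")).expanduser()
    profile = profile or storage_options.get("profile") or os.environ.get("AWS_PROFILE", "default")
    section = "default" if profile == "default" else f"profile {profile}"

    parser = configparser.ConfigParser(interpolation=None)
    parser.read(path)  # a missing file is silently ignored
    return parser.get(section, "endpoint_url", fallback=None)
```

Until something like this happens automatically, the resolved value can be assigned to os.environ["AWS_ENDPOINT_URL"] before calling pl.read_parquet with storage_options={"profile": "my-role"}, which is one of the combinations that works in the script above.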

Installed versions

--------Version info---------
Polars:              1.25.2
Index type:          UInt32
Platform:            Linux-4.18.0-553.34.1.el8_10.x86_64-x86_64-with-glibc2.31
Python:              3.13.1 (main, Jan 14 2025, 22:47:35)
LTS CPU:             False

----Optional dependencies----
Azure CLI            <not installed>
adbc_driver_manager  <not installed>
altair               <not installed>
azure.identity       <not installed>
boto3                1.37.13
cloudpickle          <not installed>
connectorx           <not installed>
deltalake            <not installed>
fastexcel            <not installed>
fsspec               <not installed>
gevent               <not installed>
google.auth          <not installed>
great_tables         <not installed>
matplotlib           <not installed>
nest_asyncio         <not installed>
numpy                <not installed>
openpyxl             <not installed>
pandas               <not installed>
pyarrow              <not installed>
pydantic             <not installed>
pyiceberg            <not installed>
sqlalchemy           <not installed>
torch                <not installed>
xlsx2csv             <not installed>
xlsxwriter           <not installed>
@hutch3232 hutch3232 added bug Something isn't working needs triage Awaiting prioritization by a maintainer python Related to Python Polars labels Mar 16, 2025
@nameexhaustion nameexhaustion self-assigned this Mar 17, 2025
@nameexhaustion nameexhaustion added P-low Priority: low A-io-cloud Area: reading/writing to cloud storage and removed needs triage Awaiting prioritization by a maintainer labels Mar 17, 2025