AWS credentials as environment variables not working as expected #191

Open
rabernat opened this issue Aug 1, 2024 · 3 comments

@rabernat
Contributor

rabernat commented Aug 1, 2024

I'm trying to load private data from S3 in a fused UDF, and I want to make sure I'm doing it the "right" way.

I'm trying to follow these instructions: https://docs.fused.io/basics/utilities/#environment-variables
In one UDF, I've got this:

env_vars = """
AWS_ACCESS_KEY_ID=AK...
AWS_SECRET_ACCESS_KEY=Gt...
"""

# Path to your .env file
env_file_path = '/mnt/cache/.env'

@fused.udf
def udf(bbox=None, n=10):
    # Writing the environment variables to the .env file
    with open(env_file_path, 'w') as file:
        file.write(env_vars)

In the second UDF, I've got this:

@fused.udf
def udf():
    import os

    import boto3
    from dotenv import load_dotenv

    # Load environment variable
    env_file_path = '/mnt/cache/.env'
    load_dotenv(env_file_path, override=True)
    
    # these are being set correctly
    assert os.environ['AWS_ACCESS_KEY_ID'] == 'AK...'
    assert os.environ['AWS_SECRET_ACCESS_KEY'] == 'Gt...'

    # doesn't work
    # botocore.exceptions.ClientError: An error occurred (InvalidToken) when calling the GetObject operation: The provided token is malformed or otherwise invalid.
    # s3 credentials not detected correctly from environment
    # s3 = boto3.client('s3')

    # does work if I explicitly pass the credentials
    s3 = boto3.client(
        's3',
        aws_access_key_id=os.environ['AWS_ACCESS_KEY_ID'],
        aws_secret_access_key=os.environ['AWS_SECRET_ACCESS_KEY']
    )

    bucket="arraylake-earthmover-production"
    key="6462e90c27af040cabc066e8/chunks/0081af97634c03fc1c3fcd16b1f3c196558c15c096674f5a0052bf25479d0e8b.00000000000000000000000000000000"
    obj = s3.get_object(Bucket=bucket, Key=key)
    print(obj)

In most normal Python environments, boto3 will automatically get the credentials from the environment variables without having to pass them explicitly (see https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html#environment-variables). However, in the fused UDF, this is not working for some reason, and if I don't pass the credentials explicitly, I get the "The provided token is malformed or otherwise invalid" error.

This is obviously not a huge problem. The workaround--explicitly passing the credentials--is easy enough. But I thought I would open this issue to try to understand better what is going on here.

@isaacbrodsky
Contributor

I think this is due to our default credentials somehow conflicting with credentials loaded through dotenv. Thanks for reporting that a workaround was needed!
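One way such a conflict could arise, offered as an unconfirmed guess rather than a diagnosis: besides `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`, boto3 also reads `AWS_SESSION_TOKEN` from the environment. If the default credentials include a session token and the `.env` file only overrides the two key variables, boto3 would pair the new keys with the old token, which would be consistent with an `InvalidToken` error. Nothing in this thread confirms that this variable is set in Fused environments. The helper names below are hypothetical; this is a minimal sketch for checking the idea.

```python
import os

# Variables that boto3's environment-variable provider reads.
KEY_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
SESSION_TOKEN_VAR = "AWS_SESSION_TOKEN"


def stale_session_token(environ, loaded_names):
    """Return True if both keys were overridden but a session token was left behind.

    `environ` is any mapping (for example os.environ); `loaded_names` lists the
    variable names that the .env file actually defined.
    """
    loaded = set(loaded_names)
    keys_overridden = all(name in loaded for name in KEY_VARS)
    token_left_over = SESSION_TOKEN_VAR in environ and SESSION_TOKEN_VAR not in loaded
    return keys_overridden and token_left_over


def drop_stale_session_token(environ, loaded_names):
    """Remove a leftover session token in place; return True if one was removed."""
    if stale_session_token(environ, loaded_names):
        del environ[SESSION_TOKEN_VAR]
        return True
    return False
```

Inside the UDF this could be called as `drop_stale_session_token(os.environ, KEY_VARS)` right after `load_dotenv(...)` and before `boto3.client('s3')`. Removing the token affects anything else in the same process that relies on the default credentials, so passing the keys explicitly remains the safer workaround.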

@rabernat
Contributor Author

rabernat commented Aug 6, 2024

What are the "default credentials"? Are you talking about the AWS credentials that are already associated with the environment?

FWIW, I experienced basically the same problem with our Arraylake token environment variable, which couldn't possibly be part of your default credentials.

@pgzmnk
Contributor

pgzmnk commented Aug 20, 2024

That's correct. Fused environments have a set of credentials associated with them by default. It would indeed make sense to use different variable names to avoid conflicts.

If you share a reproducible example of how you intended to use the Arraylake token, we can take a look to ensure there's a path forward for all users.
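The suggestion to use different variable names could look roughly like the sketch below. It assumes the `.env` file stores the keys under a made-up `MYAPP_` prefix, reads the file into a plain dict instead of `os.environ`, and builds keyword arguments for `boto3.client`. The prefix and both helper names are hypothetical, and the small parser only handles simple `KEY=VALUE` lines.

```python
def read_env_file(path):
    """Parse simple KEY=VALUE lines into a dict without touching os.environ."""
    values = {}
    with open(path) as handle:
        for raw in handle:
            line = raw.strip()
            # Skip blank lines, comments, and anything that is not an assignment.
            if not line or line.startswith("#") or "=" not in line:
                continue
            name, _, value = line.partition("=")
            values[name.strip()] = value.strip().strip('"').strip("'")
    return values


def s3_client_kwargs(values, prefix="MYAPP_"):
    """Map prefixed entries to the credential keyword arguments of boto3.client."""
    return {
        "aws_access_key_id": values[prefix + "AWS_ACCESS_KEY_ID"],
        "aws_secret_access_key": values[prefix + "AWS_SECRET_ACCESS_KEY"],
    }
```

In the UDF, the client would then be created with `boto3.client('s3', **s3_client_kwargs(read_env_file('/mnt/cache/.env')))`. Because the prefixed names never enter the process environment, they cannot collide with whatever credentials the environment sets by default.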
