assumerolespark-s3

Usecase:

You can pass credential to assume role and read/write the file for specific bucket and use instance profile for other buckets. Example, if you want to read the logs from master account and write to test account then analysis it.

Steps: (tested on emr 5.x) 1. Build the jar and upload to s3 gradle build

2. Configure EMR to add above jar in emrfs 
 
 https://aws.amazon.com/blogs/big-data/securely-analyze-data-from-another-aws-account-with-emrfs/
 
 [{"classification":"emrfs-site", "properties":{"fs.s3.customAWSCredentialsProvider":"software.zip.s3.RoleBasedAWSCredentialProvider"}, "configurations":[]}]

3. Configure spark context with required valued

spark.sparkContext.hadoopConfiguration.set("AWS_ACCESS_KEY","") spark.sparkContext.hadoopConfiguration.set("AWS_SECRET_KEY_ID",") spark.sparkContext.hadoopConfiguration.set("AWS_SESSION_TOKEN","") spark.sparkContext.hadoopConfiguration.set("amz-assume-role-arn","") spark.sparkContext.hadoopConfiguration.set("s3_bucket_uri","")

All set, you can run spark code which uses assume role for bucket uri and instance profile for others

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

assumerolespark-s3

Files

README.md

Latest commit

History

README.md

File metadata and controls

assumerolespark-s3