assumerolespark-s3


Use case:

You can pass credentials to assume a role and read/write files in one specific bucket, while the instance profile is used for all other buckets. For example, you can read logs from a master account, write them to a test account, and then analyze them.

Steps (tested on EMR 5.x):

1. Build the jar and upload it to S3:

    gradle build

2. Configure EMR to add the above jar to EMRFS (see the linked post) and set the custom credentials provider:
 
 https://aws.amazon.com/blogs/big-data/securely-analyze-data-from-another-aws-account-with-emrfs/
 
 [{"classification":"emrfs-site", "properties":{"fs.s3.customAWSCredentialsProvider":"software.zip.s3.RoleBasedAWSCredentialProvider"}, "configurations":[]}]

3. Configure the Spark context with the required values:

    spark.sparkContext.hadoopConfiguration.set("AWS_ACCESS_KEY","")
    spark.sparkContext.hadoopConfiguration.set("AWS_SECRET_KEY_ID","")
    spark.sparkContext.hadoopConfiguration.set("AWS_SESSION_TOKEN","")
    spark.sparkContext.hadoopConfiguration.set("amz-assume-role-arn","")
    spark.sparkContext.hadoopConfiguration.set("s3_bucket_uri","")

4. All set. You can now run Spark code that uses the assumed role for the configured bucket URI and the instance profile for all other buckets.
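
The steps above can be sketched end to end as a small Spark job. This is only an illustration under assumptions, not code from this repository: the object name `CrossAccountLogs`, the environment variable names, the role ARN, the bucket names, and the paths are all hypothetical, and the `s3://bucket` form for `s3_bucket_uri` is assumed rather than documented here. Only the five Hadoop configuration keys come from step 3.

```scala
import org.apache.spark.sql.SparkSession

object CrossAccountLogs {

  // Builds the five settings from step 3. The keys on the left are the ones
  // listed in this README; the environment variable names are hypothetical.
  // env(...) throws NoSuchElementException if a variable is missing.
  def assumeRoleSettings(
      env: Map[String, String],
      roleArn: String,
      bucketUri: String): Map[String, String] = Map(
    "AWS_ACCESS_KEY"      -> env("SRC_ACCESS_KEY"),
    "AWS_SECRET_KEY_ID"   -> env("SRC_SECRET_KEY"),
    "AWS_SESSION_TOKEN"   -> env("SRC_SESSION_TOKEN"),
    "amz-assume-role-arn" -> roleArn,
    "s3_bucket_uri"       -> bucketUri
  )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("cross-account-logs").getOrCreate()
    val hadoopConf = spark.sparkContext.hadoopConfiguration

    // Hypothetical role and bucket; replace with your own values.
    val roleArn    = "arn:aws:iam::123456789012:role/example-log-reader"
    val roleBucket = "s3://example-master-logs"

    assumeRoleSettings(sys.env, roleArn, roleBucket)
      .foreach { case (key, value) => hadoopConf.set(key, value) }

    // Read from the bucket named in s3_bucket_uri (expected to use the assumed role).
    val logs = spark.read.text(s"$roleBucket/logs/")

    // Write to a different bucket (expected to fall back to the instance profile).
    logs.filter(logs("value").contains("ERROR"))
      .write.mode("overwrite")
      .text("s3://example-test-account-bucket/errors/")

    spark.stop()
  }
}
```

Reading the credentials from the environment rather than hard-coding them keeps secrets out of the job source; any other secret store would work equally well for this sketch.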